@@ -1,148 +1,3 @@
 ## 1. Shared library plugins for Polars
 
-<a href="https://crates.io/crates/pyo3-polars">
-    <img src="https://img.shields.io/crates/v/pyo3-polars.svg"/>
-</a>
-
-Documentation for this functionality may also be found in the [Polars User Guide](https://docs.pola.rs/user-guide/plugins/).
-This is new functionality and should be preferred over `2.` as this
-will circumvent the GIL and will be the way we want to support extending polars.
-
-Parallelism and optimizations are managed by the default polars runtime. That runtime will call into the plugin function.
-The plugin functions are compiled separately.
-
-We can therefore keep polars more lean and maybe add support for a `polars-distance`, `polars-geo`, `polars-ml`, etc.
-Those can then have specialized expressions and don't have to worry as much for code bloat as they can be optionally installed.
-
-The idea is that you define an expression in another Rust crate with a proc_macro `polars_expr`.
-
-The macro may have one of the following attributes:
-
-- `output_type` -> to define the output type of that expression
-- `output_type_func` -> to define a function that computes the output type based on input types.
-- `output_type_func_with_kwargs` -> to define a function that computes the output type based on input types and keyword args.
-
-Here is an example of a `String` conversion expression that converts any string to [pig latin](https://en.wikipedia.org/wiki/Pig_Latin):
-
-```rust
-fn pig_latin_str(value: &str, capitalize: bool, output: &mut String) {
-    if let Some(first_char) = value.chars().next() {
-        if capitalize {
-            for c in value.chars().skip(1).map(|char| char.to_uppercase()) {
-                write!(output, "{c}").unwrap()
-            }
-            write!(output, "AY").unwrap()
-        } else {
-            let offset = first_char.len_utf8();
-            write!(output, "{}{}ay", &value[offset..], first_char).unwrap()
-        }
-    }
-}
-
-#[derive(Deserialize)]
-struct PigLatinKwargs {
-    capitalize: bool,
-}
-
-#[polars_expr(output_type=String)]
-fn pig_latinnify(inputs: &[Series], kwargs: PigLatinKwargs) -> PolarsResult<Series> {
-    let ca = inputs[0].str()?;
-    let out: StringChunked =
-        ca.apply_into_string_amortized(|value, output| pig_latin_str(value, kwargs.capitalize, output));
-    Ok(out.into_series())
-}
-```
-
-This can then be exposed on the Python side:
-
-```python
-from __future__ import annotations
-
-from typing import TYPE_CHECKING
-
-import polars as pl
-from polars.plugins import register_plugin_function
-
-from expression_lib._utils import LIB
-
-if TYPE_CHECKING:
-    from expression_lib._typing import IntoExprColumn
-
-
-def pig_latinnify(expr: IntoExprColumn, capitalize: bool = False) -> pl.Expr:
-    return register_plugin_function(
-        plugin_path=LIB,
-        args=[expr],
-        function_name="pig_latinnify",
-        is_elementwise=True,
-        kwargs={"capitalize": capitalize},
-    )
-```
-
-Compile/ship and then it is ready to use:
-
-```python
-import polars as pl
-from expression_lib import language
-
-df = pl.DataFrame({
-    "names": ["Richard", "Alice", "Bob"],
-})
-
-
-out = df.with_columns(
-    pig_latin = language.pig_latinnify("names")
-)
-```
-
-Alternatively, you can [register a custom namespace](https://docs.pola.rs/py-polars/html/reference/api/polars.api.register_expr_namespace.html#polars.api.register_expr_namespace), which enables you to write:
-
-```python
-out = df.with_columns(
-    pig_latin = pl.col("names").language.pig_latinnify()
-)
-```
-
-See the full example in [example/derive_expression]: https://github.com/pola-rs/pyo3-polars/tree/main/example/derive_expression
-
-## 2. Pyo3 extensions for Polars
-
-See the `example` directory for a concrete example. Here we send a polars `DataFrame` to rust and then compute a
-`jaccard similarity` in parallel using `rayon` and rust hash sets.
-
-## Run example
-
-`$ cd example && make install`
-`$ venv/bin/python run.py`
-
-This will output:
-
-```
-shape: (2, 2)
-┌───────────┬───────────────┐
-│ list_a    ┆ list_b        │
-│ ---       ┆ ---           │
-│ list[i64] ┆ list[i64]     │
-╞═══════════╪═══════════════╡
-│ [1, 2, 3] ┆ [1, 2, ... 8] │
-│ [5, 5]    ┆ [5, 1, 1]     │
-└───────────┴───────────────┘
-shape: (2, 1)
-┌─────────┐
-│ jaccard │
-│ ---     │
-│ f64     │
-╞═════════╡
-│ 0.75    │
-│ 0.5     │
-└─────────┘
-```
-
-## Compile for release
-
-`$ make install-release`
-
-# What to expect
-
-This crate offers a `PySeries` and a `PyDataFrame` which are simple wrapper around `Series` and `DataFrame`. The
-advantage of these wrappers is that they can be converted to and from python as they implement `FromPyObject` and `IntoPy`.
+This project has been vendored in the main Polars repo: [https://github.com/pola-rs/polars/tree/main/pyo3-polars](https://github.com/pola-rs/polars/tree/main/pyo3-polars)
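The removed README's `pig_latin_str` helper can be exercised on its own, without Polars. The sketch below is a hypothetical std-only variant: it assumes no `polars` or `pyo3-polars` dependency, so the `#[polars_expr]` wrapper, `PigLatinKwargs` and `apply_into_string_amortized` are left out. One deliberate difference: in the removed version the `capitalize` branch never writes `first_char`, so the moved letter is lost; this sketch appends it in both branches.

```rust
use std::fmt::Write;

/// Appends the pig-latin form of `value` to `output`.
/// Hypothetical standalone variant of the removed helper: unlike the original,
/// the capitalized branch also writes the moved first character.
fn pig_latin_str(value: &str, capitalize: bool, output: &mut String) {
    if let Some(first_char) = value.chars().next() {
        // Byte length of the first char, so slicing stays valid for multi-byte UTF-8.
        let offset = first_char.len_utf8();
        let rest = &value[offset..];
        if capitalize {
            // `char::to_uppercase` yields an iterator because some chars expand.
            output.extend(rest.chars().flat_map(char::to_uppercase));
            output.extend(first_char.to_uppercase());
            output.push_str("AY");
        } else {
            write!(output, "{rest}{first_char}ay").unwrap();
        }
    }
}

fn main() {
    // One buffer is cleared and reused per value instead of allocating a new String.
    let mut buf = String::new();
    for name in ["Richard", "Alice", "Bob"] {
        buf.clear();
        pig_latin_str(name, false, &mut buf);
        println!("{name} -> {buf}"); // e.g. "Bob -> obBay"
    }
}
```

Writing into a caller-supplied `&mut String`, as the removed helper also does, lets the caller reuse one allocation across many values, which appears to be the point of passing `output` into the closure given to `apply_into_string_amortized`.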
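The removed section 2 describes computing a Jaccard similarity with Rust hash sets. A minimal std-only sketch for a single pair of lists follows, assuming `i64` elements and treating each list as a set, so duplicates are ignored. The `rayon` parallelism and the Python conversion from the example are omitted, and scoring two empty lists as 0.0 is an assumption the original does not specify. For the fully printed second row, `[5, 5]` and `[5, 1, 1]`, it gives 1 / 2 = 0.5, which agrees with the `jaccard` column shown.

```rust
use std::collections::HashSet;

/// Jaccard similarity |A ∩ B| / |A ∪ B| of two lists treated as sets.
/// Assumption: two empty lists score 0.0 (not specified by the original example).
fn jaccard(a: &[i64], b: &[i64]) -> f64 {
    // Collecting into HashSets drops duplicate elements.
    let set_a: HashSet<i64> = a.iter().copied().collect();
    let set_b: HashSet<i64> = b.iter().copied().collect();
    let union = set_a.union(&set_b).count();
    if union == 0 {
        return 0.0;
    }
    let intersection = set_a.intersection(&set_b).count();
    intersection as f64 / union as f64
}

fn main() {
    // {5} vs {5, 1}: one shared element out of two distinct ones.
    println!("{}", jaccard(&[5, 5], &[5, 1, 1])); // prints 0.5
}
```

Each row is independent of the others, which is what makes a per-row parallel map (as with `rayon` in the removed example) straightforward.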