Releases: pola-rs/polars
Python Polars 1.34.0-beta.3
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading Parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for array columns (#24406)
🐞 Bug fixes
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming `show_graph` (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV `pl.len()` was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for `cum_agg` in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in aggregation examples (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
- Use cargo-run to call dsl-schema script (#24607)
🛠️ Other improvements
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on agg states (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsxwriter` to 3.2.5 or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @moizescbf, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.34.0-beta.1
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading Parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for array columns (#24406)
🐞 Bug fixes
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for `cum_agg` in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Rename `avg_birthday` -> `avg_age` in aggregation examples (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
🛠️ Other improvements
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on agg states (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsxwriter` to 3.2.5 or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Rust Polars 0.51.0
💥 Breaking changes
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
🚀 Performance improvements
- Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
- Allocate only for read items when reading Parquet with predicate (#24401)
- Don't aggregate groups for strict cast if original len (#24381)
- Allocate only for read items when reading Parquet with predicate (#24324)
- Native streaming `int_range` with `len` or `count` (#24280)
- Lower `arg_unique` natively to the streaming engine (#24279)
- Move unordering optimization to end (#24286)
- Do ordering simplification step after common sub-plan elimination (#24269)
- Always simplify order requirements in IR (#24192)
- Basic de-duplication of filter expressions (#24220)
- Cache the IR in `pipe_with_schema` (#24213)
- Lower `arg_where` natively to streaming engine (#24088)
- Lower Expr.shift to streaming engine (#24106)
- Lower order-preserving groupby to streaming engine (#24053)
- Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
- Lower top-k to streaming engine (#23979)
- Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)
✨ Enhancements
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for array columns (#24406)
- Support S3 virtual-hosted-style URI (#24405)
- Remove explicit file create for local async writes (#24358)
- Support Partitioning sinks in cloud (#24399)
- User-friendly error message on empty path expansion (#24337)
- Add Polars security policy (#24314)
- Allow pl.Expr.log to take in an expression (#24226)
- Implement diff() in streaming engine (#24189)
- Enable Expr.diff(n) for negative n (#24200)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
- Add a deprecation warning for pl.Series.shift(Null) (#24114)
- Improve Debug formatting of DataType (#24056)
- Add `cum_*` as native streaming nodes (#23977)
- Add peak_{min,max} support for booleans (#24068)
- Add `DataFrame.map_columns` for eager evaluation (#23821)
- Add native streaming for `peaks_{min,max}` (#24039)
- IR graph arrows, monospace font, box nodes (#24021)
- Add `DataTypeExpr.default_value` (#23973)
- Lower `rle` to a native streaming engine node (#23929)
- Add support for `Int128` to pyo3-polars (#23959)
- Lower `rle_id` to a native streaming node (#23894)
- Pass `endpoint_url` loaded from `CredentialProviderAWS` to `scan/write_delta` (#23812)
- Dispatch `scan_iceberg` to native by default (#23912)
- Lower `unique_counts` and `value_counts` to streaming engine (#23890)
- Implement `dt.days_in_month` function (#23119)
- Fix errors on native `scan_iceberg` (#23811)
- Reinterpret binary data to fixed size numerical array (#22840)
- Make `rolling_map` serializable (#23848)
🐞 Bug fixes
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Replace unsafe with collect (#24494)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Emit proper tuple for Log in expression nodes (#24426)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
- Correct `sink_ipc` overload for compression (#24398)
- Enable all integer dtypes for `by` parameter in `join_asof` (#24384)
- Fix Group-By + filter aggregation performing subsequent operations on all data instead of only filtered data (#24373)
- Fix incorrect output ordering for row-separable exprs (#24354)
- Fix `Series.__arrow_c_stream__` for Decimal and other logical types (#24120)
- Match output type to engine for `Struct` arithmetic (#23805)
- Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
- Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
- Incorrect logic in negative streaming slice (#24326)
- Do not error on non-list `Sequence` for `columns` parameter in `read_excel` (#23967)
- Invalid conversion from non-bit numpy bools (#24312)
- Make `dt.epoch('s')` serializable (#24302)
- Make `Expr.rechunk` serializable (#24303)
- Schema mismatch for 'log' operation (#24300)
- Incorrect first/last aggregate in streaming engine (#24289)
- Fix group offsets in sliced groups (#24274)
- Panic in inexact date(time) conversion (#24268)
- The `index_of` feature should not depend on the `object` feature (#24256)
- Keep DSL cache after serialization and deserialization (#24265)
- Sanitize and warn about eval usage (#24262)
- Unique with keep="none" in new optimization pass (#24261)
- Correct size limits for Decimal cast (#24252)
- Unordered unions in check order observing pass (#24253)
- Fix dtype for `slice` on `Literal` in agg context (#24137)
- Fix incorrect `filter(lit(True))` when scanning hive (#24237)
- In-memory group_by on 128-bit integers (#24242)
- Fix panic in `gather` inside groupby with invalid indices (#24182)
- Release the GIL in map_groups (#24225)
- Remove extra explode in `LazyGroupBy.{head,tail}` (#24221)
- Fix panic in polars cloud CSV scan (#24197)
- Fix panic when loading categorical columns from IO plugin (#24205)
- Fix engine type for `concat_list` on AggScalar `implode` (#24160)
- Rolling_mean handle centered weights with len(values) < window_size (#24158)
- Reading `is_in` predicate for Parquet plain strings (#24184)
- Make PyCategories pickleable (#24170)
- Remove unused unsound function `to_mutable_slice` (#24173)
- PyO3 extension types giving compat_level errors (#24166)
- Allow non-elementwise by in top_k (#24164)
- Fix `sort_by` for `group_by_dynamic` context (#24152)
- Input-independent length aggregations in streaming (#24153)
- Release GIL when iterating df in to_arrow (#24151)
- Respect non-elementwise join_where conditions (#24135)
- Resolve schema mismatch for div on Boolean (#24111)
- Keep name when doing empty group-aware aggregation (#24098)
- Implode instead of `reshape_list` (#24078)
- Rolling mean with weights incorrect when min_samples < window_size (#23485)
- Allow `merge_sorted` for all types (#24077)
- Include datatypes in `row_encode` expression (#24074)
- Include UDF materialized type in serialization (#24073)
- Correct `.rolling()` output type for non-aggregations (#24072)
- Correct planner output schema for `join_asof` (#24071)
- Allow %B to work without specifying day (#24009)
- Correct output for `fold` and `reduce` (#24069)
- Expr.meta.output_name for struct fields (#24064)
- Ensure upcast operations on `pl.Date` default to microsecond precision (#23981)
- Add peak_{min,max} support for booleans (#24068)
- Planner output type for `mean` with strange input type (#24052)
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
- Scan of multiple sources with `null` datatype (#24065)
- Categorical in nested data in row encoding (#24051)
- Missing length update in builder for pl.Array repetition (#24055)
- Race condition in global categories init (#24045)
- Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
- Error when using named functions (#24041)
- Don't encode entire CategoricalMapping when going to Arrow (#24036)
- Fix cast on arithmetic with `lit` (#23941)
- Incorrect slice-slice pushdown (#24032)
- Dedup common cache subplan in IR graph (#24028)
- Allow join on Decimal in in-memory engine (#24026)
- Fix datatypes for `eval.list` in aggregation context (#23911)
- Allocator capsule fallback panic (#24022)
- Accept another zlib "magic header" file signature (#24013)
- Fix `truediv` dtypes so `cast` in `list.eval` is not dropped (#23936)
- Don't reuse cached `return_dtype` for expanded map expressions (#24010)
- Cache id is not a valid dot node id (#24005)
- Align `map_elements` with and without `return_dtype` (#24007)
- Fix column dtype lifetime for `csv_write` segfault on `Categorical` (#23986)
- Allow serializing `LazyGroupBy.map_groups` (#23964)
- Correct allocator name in `PyCapsule` (#23968)
- Mismatched types for `write` function for windows (#23915)
- Fix `unpivot` panic when `index=` column not found (#23958)
- Fix `assert_frame_equal` with `check_dtypes=False` for all-null series with different types (#23943)
- Return correct python package version (#23951)
- Categorical namespace functions fail on `Enum` columns (#23925)
- Properly set sumwise complete on filter for missing columns (#23877)
- Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
- Group By with filters (#23917)
- Fix `read_csv` ignoring Decimal schema for header-only data (#23886)
- Ensure `collect()` native Iceberg always scans latest when no `snapshot_id` is given (#23907)
- Writing List(Array) columns to JSON without panic (#23875)
- Fill Iceberg missing fields with partition values if present in metadata (#23900)
- Create file for streaming sink even if unspawned (#23672)
- Update cloud testing environment (#23908)
- Parquet filtering on multiple RGs with literal predicate (#23903)
- Incorrect datatype passed to libc::write (#23904)
- Properly feature gate TZ_AWARE_RE usage (#23888)
- Improve identification of "non group-key" aggregates in SQL `GROUP BY` queries (#23191)
- Spawning tokio task outside reactor (#23884)
- Correctly raise DuplicateError on asof_join with suffix="" (#23864)
- Fix errors on native `scan_iceberg` (#23811)
- Fix index ...
Python Polars 1.33.1
🚀 Performance improvements
- Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
- Allocate only for read items when reading Parquet with predicate (#24401)
- Don't aggregate groups for strict cast if original len (#24381)
- Allocate only for read items when reading Parquet with predicate (#24324)
✨ Enhancements
- Support S3 virtual-hosted–style URI (#24405)
- Remove explicit file create for local async writes (#24358)
- Add PyCapsule `__arrow_c_schema__` interface to `pl.Schema` (#24365)
- Support Partitioning sinks in cloud (#24399)
- User-friendly error message on empty path expansion (#24337)
- Add unstable `pre_execution_query` parameter to `read_database_uri` (#23634)
- Add Polars security policy (#24314)
🐞 Bug fixes
- Correct `sink_ipc` overload for compression (#24398)
- Enable all integer dtypes for `by` parameter in `join_asof` (#24384)
- Fix Group-By + filter aggregation performing subsequent operations on all data instead of only filtered data (#24373)
- Wrap deprecated top-level imports in TYPE_CHECKING (#24340)
- Fix incorrect output ordering for row-separable exprs (#24354)
- Fix `Series.__arrow_c_stream__` for Decimal and other logical types (#24120)
- Match output type to engine for `Struct` arithmetic (#23805)
- Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
- Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
- Don't throw away type information for NumPy numeric values when using lit() (#24229)
- Incorrect logic in negative streaming slice (#24326)
- Ensure `read_database_uri` with ADBC works as expected with DuckDB URIs (#24097)
- Do not error on non-list `Sequence` for `columns` parameter in `read_excel` (#23967)
📖 Documentation
- Document newly added `is_pure` parameter for `register_io_source` (#24311)
- Create a module docstring for the public `polars` module (#24332)
- Update to Polars Cloud user guide (#24187)
- Update distributed page (#24323)
- Add a note and example about exporting unformatted Excel sheet data (#24145)
- Add detail about server-side cursor behaviour for SQLAlchemy in the "iter_batches" parameter of `read_database` (#24094)
- Add Polars security policy (#24314)
🛠️ Other improvements
- Bump c-api (#24412)
- Add a regression test for #7631 (#24363)
- Update cloud test `InteractiveQuery` to `DirectQuery` (#24287)
- Mark some tests as slow (#24327)
- Mark more tests as ready for cloud (#24315)
- Add hint to update `PYPOLARS_VERSION` on version assert test (#24313)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @VictorAtIfInsurance, @alexander-beedie, @coastalwhite, @dsprenkels, @itamarst, @kdn36, @kuril, @mcrumiller, @nameexhaustion, @nesb1, @orlp, @r-brink and @ritchie46
Python Polars 1.33.0
💥 Breaking changes
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
🚀 Performance improvements
- Native streaming `int_range` with `len` or `count` (#24280)
- Lower `arg_unique` natively to the streaming engine (#24279)
- Move unordering optimization to end (#24286)
- Do ordering simplification step after common sub-plan elimination (#24269)
- Always simplify order requirements in IR (#24192)
- Basic de-duplication of filter expressions (#24220)
- Cache the IR in `pipe_with_schema` (#24213)
- Lower `arg_where` natively to streaming engine (#24088)
- Lower Expr.shift to streaming engine (#24106)
- Lower order-preserving groupby to streaming engine (#24053)
✨ Enhancements
- Add CSE for custom io sources using pointer for hashing (#24297)
- Allow pl.Expr.log to take in an expression (#24226)
- Add caching to user credential providers (#23789)
- Expose `mkdir` parameter on `write_parquet` (#24239)
- Implement diff() in streaming engine (#24189)
- Enable Expr.diff(n) for negative n (#24200)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
- Drop PyArrow requirement for `write_database` with the ADBC engine (#24136)
- Add a deprecation warning for pl.Series.shift(Null) (#24114)
- Improve Debug formatting of DataType (#24056)
- Add `LazyFrame.pipe_with_schema` (#24075)
- Catch additional temporal attributes in `BytecodeParser` function analysis (#24076)
- Add `cum_*` as native streaming nodes (#23977)
- Add peak_{min,max} support for booleans (#24068)
- Add `DataFrame.map_columns` for eager evaluation (#23821)
🐞 Bug fixes
- Invalid conversion from non-bit numpy bools (#24312)
- Make `dt.epoch('s')` serializable (#24302)
- Make `Expr.rechunk` serializable (#24303)
- Schema mismatch for 'log' operation (#24300)
- Incorrect first/last aggregate in streaming engine (#24289)
- Fix group offsets in sliced groups (#24274)
- Panic in inexact date(time) conversion (#24268)
- Keep DSL cache after serialization and deserialization (#24265)
- Sanitize and warn about eval usage (#24262)
- Correct incorrect default in `from_pandas` overload for `include_index` (#24258)
- Unique with keep="none" in new optimization pass (#24261)
- Correct size limits for Decimal cast (#24252)
- Unordered unions in check order observing pass (#24253)
- Fix dtype for `slice` on `Literal` in agg context (#24137)
- Fix incorrect `filter(lit(True))` when scanning hive (#24237)
- In-memory group_by on 128-bit integers (#24242)
- Fix panic in `gather` inside groupby with invalid indices (#24182)
- Release the GIL in map_groups (#24225)
- Remove extra explode in `LazyGroupBy.{head,tail}` (#24221)
- Fix panic in polars cloud CSV scan (#24197)
- Fix panic when loading categorical columns from IO plugin (#24205)
- Fix credential provider did not auto-init on partition sinks (#24188)
- Fix engine type for `concat_list` on AggScalar `implode` (#24160)
- Rolling_mean handle centered weights with len(values) < window_size (#24158)
- Reading `is_in` predicate for Parquet plain strings (#24184)
- Support native DuckDB connection in read_database (#24177)
- Make PyCategories pickleable (#24170)
- Remove unused unsound function `to_mutable_slice` (#24173)
- PyO3 extension types giving compat_level errors (#24166)
- Allow non-elementwise by in top_k (#24164)
- Fix `sort_by` for `group_by_dynamic` context (#24152)
- Input-independent length aggregations in streaming (#24153)
- Release GIL when iterating df in to_arrow (#24151)
- Respect non-elementwise join_where conditions (#24135)
- Fix mismatched pytest test collection error (#24133)
- Resolve schema mismatch for div on Boolean (#24111)
- Fix from_repr parsing of negative durations (#24115)
- Make `group_by`/`partition_by` iterator keys `tuple[Any, ...]` to enable tuple-unpacking (#24113)
- Keep name when doing empty group-aware aggregation (#24098)
- Implode instead of `reshape_list` (#24078)
- Rolling mean with weights incorrect when min_samples < window_size (#23485)
- Allow `merge_sorted` for all types (#24077)
- Include datatypes in `row_encode` expression (#24074)
- Include UDF materialized type in serialization (#24073)
- Correct `.rolling()` output type for non-aggregations (#24072)
- Correct planner output schema for `join_asof` (#24071)
- Correct output for `fold` and `reduce` (#24069)
- Expr.meta.output_name for struct fields (#24064)
- Ensure upcast operations on `pl.Date` default to microsecond precision (#23981)
- Add peak_{min,max} support for booleans (#24068)
- Planner output type for `mean` with strange input type (#24052)
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
📖 Documentation
- Fix few typos (#24305)
- Add missing reference to `LazyFrame.pipe_with_schema()` on the website (#24285)
- Automatically register `doctest.ELLIPSIS` so we don't have to add the inline directive each time (#24146)
- Update categorical comparison documentation in user guide (#24249)
- Add missing references for `Series.rolling_*_by` methods (#24254)
- Fix formatting of Series.value_counts examples (#24245)
- Add hint to use `DataFrame`/`Series` constructors in `from_arrow` docstring (#22942)
- Update GPU un/supported features (#24195)
- Add `DataFrame.map_columns` to API (#24128)
- Update multiple pages in the Polars Cloud user guide (#23661)
- Fix `str.find_many()` docstring example (#24092)
🛠️ Other improvements
- Remove PDS-H code (#24301)
- Get ready for even more cloud tests (#24292)
- Add tests for slices with caches (#24288)
- Readd ordering tests (#24284)
- Fix Makefile venv path (#24251)
- Remove unnecessary parentheses (#24244)
- Make non-nested shift{,_and_fill} ops generic (#24224)
- Remove unused `Wrap` (#24214)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Automatically label a few more types of PR (#24147)
- Update toolchain (#24156)
- Add `order_sensitive` property for `AExpr` (#24116)
- Mark more tests as not possible on cloud (#24103)
- Turn `AggExpr::Count` from tuple to struct (#24096)
- Mark tests that may fail in cloud (#24067)
- Extend read database tests to capture more ADBC functionality (#24002)
- Make CI perf failures more lenient (#24066)
- Fix hive partition string encoding in CI by upgrading `deltalake` (#24018)
- Make tests with sinks run on cloud again (#24048)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @NeejWeej, @agossard, @alexander-beedie, @aparna2198, @borchero, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @etiennebacher, @gab23r, @henryharbeck, @jjurm, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @r-brink, @ritchie46, @stijnherfst, @vdrn and @wence-
Python Polars 1.33.0-beta.1
💥 Breaking changes
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
🚀 Performance improvements
- Always simplify order requirements in IR (#24192)
- Basic de-duplication of filter expressions (#24220)
- Cache the IR in `pipe_with_schema` (#24213)
- Lower `arg_where` natively to streaming engine (#24088)
- Lower Expr.shift to streaming engine (#24106)
- Lower order-preserving groupby to streaming engine (#24053)
✨ Enhancements
- Allow pl.Expr.log to take in an expression (#24226)
- Add caching to user credential providers (#23789)
- Expose `mkdir` parameter on `write_parquet` (#24239)
- Implement diff() in streaming engine (#24189)
- Enable Expr.diff(n) for negative n (#24200)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
- Drop PyArrow requirement for `write_database` with the ADBC engine (#24136)
- Add a deprecation warning for pl.Series.shift(Null) (#24114)
- Improve Debug formatting of DataType (#24056)
- Add `LazyFrame.pipe_with_schema` (#24075)
- Catch additional temporal attributes in `BytecodeParser` function analysis (#24076)
- Add `cum_*` as native streaming nodes (#23977)
- Add peak_{min,max} support for booleans (#24068)
- Add `DataFrame.map_columns` for eager evaluation (#23821)
🐞 Bug fixes
- Correct size limits for Decimal cast (#24252)
- Unordered unions in check order observing pass (#24253)
- Fix dtype for `slice` on `Literal` in agg context (#24137)
- Fix incorrect `filter(lit(True))` when scanning hive (#24237)
- In-memory group_by on 128-bit integers (#24242)
- Fix panic in `gather` inside groupby with invalid indices (#24182)
- Release the GIL in map_groups (#24225)
- Remove extra explode in `LazyGroupBy.{head,tail}` (#24221)
- Fix panic in polars cloud CSV scan (#24197)
- Fix panic when loading categorical columns from IO plugin (#24205)
- Fix credential provider did not auto-init on partition sinks (#24188)
- Fix engine type for `concat_list` on AggScalar `implode` (#24160)
- Rolling_mean handle centered weights with len(values) < window_size (#24158)
- Reading `is_in` predicate for Parquet plain strings (#24184)
- Support native DuckDB connection in read_database (#24177)
- Make PyCategories pickleable (#24170)
- Remove unused unsound function `to_mutable_slice` (#24173)
- PyO3 extension types giving compat_level errors (#24166)
- Allow non-elementwise by in top_k (#24164)
- Fix `sort_by` for `group_by_dynamic` context (#24152)
- Input-independent length aggregations in streaming (#24153)
- Release GIL when iterating df in to_arrow (#24151)
- Respect non-elementwise join_where conditions (#24135)
- Fix mismatched pytest test collection error (#24133)
- Resolve schema mismatch for div on Boolean (#24111)
- Fix from_repr parsing of negative durations (#24115)
- Make `group_by`/`partition_by` iterator keys `tuple[Any, ...]` to enable tuple-unpacking (#24113)
- Keep name when doing empty group-aware aggregation (#24098)
- Implode instead of `reshape_list` (#24078)
- Rolling mean with weights incorrect when min_samples < window_size (#23485)
- Allow `merge_sorted` for all types (#24077)
- Include datatypes in `row_encode` expression (#24074)
- Include UDF materialized type in serialization (#24073)
- Correct `.rolling()` output type for non-aggregations (#24072)
- Correct planner output schema for `join_asof` (#24071)
- Correct output for `fold` and `reduce` (#24069)
- Expr.meta.output_name for struct fields (#24064)
- Ensure upcast operations on `pl.Date` default to microsecond precision (#23981)
- Add peak_{min,max} support for booleans (#24068)
- Planner output type for `mean` with strange input type (#24052)
- Remove, deprecate or change eager `Expr`s to be lazy compatible (#24027)
📖 Documentation
- Fix formatting of Series.value_counts examples (#24245)
- Add hint to use `DataFrame`/`Series` constructors in `from_arrow` docstring (#22942)
- Update GPU un/supported features (#24195)
- Add `DataFrame.map_columns` to API (#24128)
- Update multiple pages in the Polars Cloud user guide (#23661)
- Fix `str.find_many()` docstring example (#24092)
📦 Build system
- Drop binary support for macos_x86-64 (#24257)
🛠️ Other improvements
- Remove unnecessary parentheses (#24244)
- Make non-nested shift{,_and_fill} ops generic (#24224)
- Remove unused `Wrap` (#24214)
- Allow upcasting null-typed columns to nested column types in scans (#24185)
- Automatically label a few more types of PR (#24147)
- Update toolchain (#24156)
- Add `order_sensitive` property for `AExpr` (#24116)
- Mark more tests as not possible on cloud (#24103)
- Turn `AggExpr::Count` from tuple to struct (#24096)
- Mark tests that may fail in cloud (#24067)
- Extend read database tests to capture more ADBC functionality (#24002)
- Make CI perf failures more lenient (#24066)
- Fix hive partition string encoding in CI by upgrading `deltalake` (#24018)
- Make tests with sinks run on cloud again (#24048)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @agossard, @alexander-beedie, @aparna2198, @borchero, @coastalwhite, @deanm0000, @dsprenkels, @henryharbeck, @jjurm, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @r-brink, @ritchie46, @stijnherfst, @vdrn and @wence-
Python Polars 1.32.3
🚀 Performance improvements
- Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
- Lower top-k to streaming engine (#23979)
- Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)
✨ Enhancements
- Add native streaming for `peaks_{min,max}` (#24039)
- IR graph arrows, monospace font, box nodes (#24021)
- Add `DataTypeExpr.default_value` (#23973)
- Lower `rle` to a native streaming engine node (#23929)
- Add support for `Int128` to pyo3-polars (#23959)
🐞 Bug fixes
- Scan of multiple sources with `null` datatype (#24065)
- Categorical in nested data in row encoding (#24051)
- Missing length update in builder for pl.Array repetition (#24055)
- Race condition in global categories init (#24045)
- Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
- Error when using named functions (#24041)
- Don't encode entire CategoricalMapping when going to Arrow (#24036)
- Fix cast on arithmetic with `lit` (#23941)
- Incorrect slice-slice pushdown (#24032)
- Dedup common cache subplan in IR graph (#24028)
- Allow join on Decimal in in-memory engine (#24026)
- Fix datatypes for `eval.list` in aggregation context (#23911)
- Allocator capsule fallback panic (#24022)
- Accept another zlib "magic header" file signature (#24013)
- Fix `truediv` dtypes so `cast` in `list.eval` is not dropped (#23936)
- Don't reuse cached `return_dtype` for expanded map expressions (#24010)
- Cache id is not a valid dot node id (#24005)
- Align `map_elements` with and without `return_dtype` (#24007)
- Fix column dtype lifetime for `csv_write` segfault on `Categorical` (#23986)
- Allow serializing `LazyGroupBy.map_groups` (#23964)
- Correct allocator name in `PyCapsule` (#23968)
- Mismatched types for `write` function for windows (#23915)
- Fix `unpivot` panic when `index` column not found (#23958)
📖 Documentation
- Fix a typo in "lazy/execution" user-guide page (#23983)
🛠️ Other improvements
- Update pyo3-polars versions (#24031)
- Remove insert_error_function (#24023)
- Remove cache hits, clean up in-mem prefill (#24019)
- Use .venv instead of venv in pyo3-polars examples (#24024)
- Fix test failing mypy (#24017)
- Remove outdated comment (#23998)
- Add a `_plr.pyi` to remove `mypy` issues (#23970)
- Don't define CountStar as dyn OptimizationRule (#23976)
- Rename `atol` and `rtol` to `abs_tol` and `rel_tol` (#23961)
- Introduce `Row{Encode,Decode}` as FunctionExpr (#23933)
- Dispatch through `pl.map_batches` and `AnonymousColumnsUdf` (#23867)
Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @borchero, @cmdlineluser, @coastalwhite, @iishutov, @jarondl, @kdn36, @orlp, @rawhuul, @ritchie46 and @stijnherfst
Python Polars 1.32.2
🐞 Bug fixes
- Return correct python package version (#23951)
📖 Documentation
- Add `arr.len()` on the website (#23944)
Thank you to all our contributors for making this release possible!
@coastalwhite and @etiennebacher
Python Polars 1.32.1
🚀 Performance improvements
- Optimise `BytecodeParser` usage from `warn_on_inefficient_map` (#23809)
- Lower extend_constant to the streaming engine (#23824)
- Lower pl.repeat to streaming engine (#23804)
- Remove redundant clone (#23771)
✨ Enhancements
- Lower `rle_id` to a native streaming node (#23894)
- Pass `endpoint_url` loaded from `CredentialProviderAWS` to `scan/write_delta` (#23812)
- Dispatch `scan_iceberg` to native by default (#23912)
- Lower `unique_counts` and `value_counts` to streaming engine (#23890)
- Support initializing from `__arrow_c_schema__` protocol in `pl.Schema` (#23879)
- Better handle broken local package environment in `show_versions` (#23885)
- Implement `dt.days_in_month` function (#23119)
- Make `Expr.rolling_*_by` methods available to `pl.Series` (#23742)
- Fix errors on native `scan_iceberg` (#23811)
- Reinterpret binary data to fixed size numerical array (#22840)
- Make `rolling_map` serializable (#23848)
- Ensure `CachingCredentialProvider` returns copied credentials dict (#23817)
- Change typing for `.remote()` from `LazyFrameExt` to `LazyFrameRemote` (#23825)
- Implement `repeat_by` for `Array` and `Null` (#23794)
- Add DeprecationWarning on passing physical ordering to Categorical (#23779)
- Pre-filtered decode and row group skipping with Iceberg / Delta / scans with cast options (#23792)
- Update `BytecodeParser` opcode awareness for upcoming Python 3.14 (#23782)
🐞 Bug fixes
- Categorical namespace functions fail on `Enum` columns (#23925)
- Properly set sumwise complete on filter for missing columns (#23877)
- Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
- Group By with filters (#23917)
- Fix `read_csv` ignoring Decimal schema for header-only data (#23886)
- Ensure `collect()` native Iceberg always scans latest when no `snapshot_id` is given (#23907)
- Writing List(Array) columns to JSON without panic (#23875)
- Fill Iceberg missing fields with partition values if present in metadata (#23900)
- Create file for streaming sink even if unspawned (#23672)
- Update cloud testing environment (#23908)
- Parquet filtering on multiple RGs with literal predicate (#23903)
- Incorrect datatype passed to libc::write (#23904)
- Properly feature gate TZ_AWARE_RE usage (#23888)
- Improve identification of "non group-key" aggregates in SQL `GROUP BY` queries (#23191)
- Spawning tokio task outside reactor (#23884)
- Correctly raise DuplicateError on asof_join with suffix="" (#23864)
- Fix errors on native `scan_iceberg` (#23811)
- Fix index out of bounds panic filtering parquet (#23850)
- Fix error on empty range requests (#23844)
- Fix handling of hive partitioning `hive_start_idx` parameter (#23843)
- Allow encoding of `pl.Enum` with smaller physicals (#23829)
- Filter sorted flag from physical in CategoricalChunked (#23827)
- Remove accidental todo! in repeat node (#23822)
- Make `meta.pop` operate on `Expr` only (#23808)
- Stack overflow in `DslPlan` serde (#23801)
- Clear credentials cached in Python when rebuilding object store (#23756)
- Datetime selectors with mixed timezone info (#23774)
- Support i128 in asof join (#23770)
- Remove sleep for credential refresh (#23768)
📖 Documentation
- Improve StackOverflow links in contributing guide (#23895)
- Fix `pyo3` documentation page link (#23839)
- Document the pureness requirements of UDFs (#23787)
- Correct the `name.*` methods on their removal of aliases (#23773)
📦 Build system
- Workaround for pyiceberg `make requirements` on Python 3.13 (#23810)
- Add pyiceberg to dev dependencies (#23791)
🛠️ Other improvements
- Ensure `clippy` and `rustfmt` run in CI when changing `pyo3-polars` (#23930)
- Fix pyo3-polars proc-macro re-exports (#23918)
- Rewrite `evaluate_on_groups` for `.gather`/`.get` (#23700)
- Move Python C API to `python-polars` (#23876)
- Improve/fix internal `LRUCache` implementation and move into "_utils" module (#23813)
- Relax constraint on maximum Python version for `numba` (#23838)
- Automatically tag PRs mentioning "SQL" with the appropriate label (#23816)
- Update `typos` package (#23818)
- Fix typos path (#23803)
- Remove `deserialize_with_unknown_fields` (#23802)
- Add pyiceberg to dev dependencies (#23791)
- Remove old schema file (#23798)
- Mark more tests as ready for cloud (#23743)
- Reduce required deps for pyo3-polars (#23761)
Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @Liyixin95, @alexander-beedie, @cgevans, @cmdlineluser, @coastalwhite, @eitsupi, @gfvioli, @itamarst, @jimmmmmmmmmmmy, @kdn36, @math-hiyoko, @mcrumiller, @mpasa, @mrkn, @nameexhaustion, @orlp, @pka, @pomo-mondreganto, @ritchie46 and @stijnherfst
Rust Polars 0.50.0
🏆 Highlights
- Make `Selector` a concrete part of the DSL (#23351)
- Rework Categorical/Enum to use (Frozen)Categories (#23016)
🚀 Performance improvements
- Lower Expr.slice to streaming engine (#23683)
- Elide bound check (#23653)
- Preserve `Column` repr in `ColumnTransform` operations (#23648)
- Lower any() and all() to streaming engine (#23640)
- Lower row-separable functions in streaming engine (#23633)
- Lower int_range(len()) to with_row_index (#23576)
- Avoid double field resolution in with_columns (#23530)
- Rolling quantile lower time complexity (#23443)
- Use single-key optimization with Categorical (#23436)
- Improve null-preserving identification for boolean functions (#23317)
- Improve boolean bitwise aggregate performance (#23325)
- Enable Parquet expressions and dedup `is_in` values in Parquet predicates (#23293)
- Re-write join types during filter pushdown (#23275)
- Generate PQ ZSTD decompression context once (#23200)
- Trigger cache/cse optimizations when multiplexing (#23274)
- Cache FileInfo upon DSL -> IR conversion (#23263)
- Push more filters past joins (#23240)
✨ Enhancements
- Expand on `DataTypeExpr` (#23249)
- Lower row-separable functions in streaming engine (#23633)
- Add scalar checks to range expressions (#23632)
- Expose `POLARS_DOT_SVG_VIEWER` to automatically dispatch to SVG viewer (#23592)
- Implement mean function in `arr` namespace (#23486)
- Implement `vec_hash` for `List` and `Array` (#23578)
- Add unstable `pl.row_index()` expression (#23556)
- Add Categories on the Python side (#23543)
- Implement partitioned sinks for the in-memory engine (#23522)
- Expose `IRFunctionExpr::Rank` in the python visitor (#23512)
- Raise and warn on UDFs without `return_dtype` set (#23353)
- IR pruning (#23499)
- Expose `IRFunctionExpr::FillNullWithStrategy` in the python visitor (#23479)
- Support min/max reducer for null dtype in streaming engine (#23465)
- Implement streaming Categorical/Enum min/max (#23440)
- Allow cast to Categorical inside list.eval (#23432)
- Support `pathlib.Path` as source for `read/scan_delta()` (#23411)
- Enable default set of `ScanCastOptions` for native `scan_iceberg()` (#23416)
- Pass payload in `ExprRegistry` (#23412)
- Support reading nanosecond/Int96 timestamps and schema evolved datasets in `scan_delta()` (#23398)
- Support row group skipping with filters when `cast_options` is given (#23356)
- Execute bitwise reductions in streaming engine (#23321)
- Use `scan_parquet().collect_schema()` for `read_parquet_schema` (#23359)
- Add dtype to str.to_integer() (#22239)
- Add `arr.slice`, `arr.head` and `arr.tail` methods to `arr` namespace (#23150)
- Add `is_close` method (#23273)
- Drop superfluous casts from optimized plan (#23269)
- Added `drop_nulls` option to `to_dummies` (#23215)
- Support comma as decimal separator for CSV write (#23238)
- Don't format keys if they're empty in dot (#23247)
- Improve arity simplification (#23242)
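The new `is_close` method (#23273) adopts the `abs_tol`/`rel_tol` parameter names later aligned in #23961, which mirror Python's `math.isclose`; a sketch of that tolerance rule in plain Python (the exact Polars semantics are an assumption here and may differ):

```python
import math

# Hypothetical tolerances; the rule is:
# |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol)
abs_tol, rel_tol = 1e-8, 1e-5

a, b = 1.00001, 1.000015
close = math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
```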
🐞 Bug fixes
- Fix credential refresh logic (#23730)
- Fix `to_datetime()` fallible identification (#23735)
- Correct output datatype for `dt.with_time_unit` (#23734)
- Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
- Allow DataType expressions with selectors (#23720)
- Match output type to engine for `interpolate` on `Decimal` (#23706)
- Remaining bugs in `with_exprs_and_input` and pruning (#23710)
- Match output dtype to engine for `cum_sum_horizontal` (#23686)
- Field names for `pl.struct` in group-by (#23703)
- Fix output for `str.extract_groups` with empty string pattern (#23698)
- Match output type to engine for `rolling_map` (#23702)
- Fix incorrect join on single Int128 column for in-memory engine (#23694)
- Match output field name to lhs for `BusinessDay` count (#23679)
- Correct the planner output datatype for `strptime` (#23676)
- Sort and Scan `with_exprs_and_input` (#23675)
- Revert to old behavior with `name.keep` (#23670)
- Fix panic loading from arrow `Map` containing timestamps (#23662)
- Selectors in `self` part of `list.eval` (#23668)
- Fix output field dtype for `ToInteger` (#23664)
- Allow `decimal_comma` with `,` separator in `read_csv` (#23657)
- Fix handling of UTF-8 in `write_csv` to `IO[str]` (#23647)
- Selectors in `{Lazy,Data}Frame.filter` (#23631)
- Stop splitfields iterator at eol in simd branch (#23652)
- Correct output datatype of dt.year and dt.mil (#23646)
- Fix `broadcast_rhs` logic in binary functions to correct `list.set_intersection` for `list[str]` columns (#23584)
- Order-preserving equi-join didn't always flush final matches (#23639)
- Fix ColumnNotFound error when joining on `col().cast()` (#23622)
- Fix agg groups on `when/then` in `group_by` context (#23628)
- Output type for sign (#23572)
- Apply `agg_fn` on `null` values in `pivot` (#23586)
- Remove nonsensical duration variance (#23621)
- Don't panic when sinking nested categorical to Parquet (#23610)
- Correctly set value count output field name (#23611)
- Casting unused columns in to_torch (#23606)
- Allow inferring of hours-only timezone offset (#23605)
- Bug in Categorical <-> str compare with nulls (#23609)
- Honor `n=0` in all cases of `str.replace` (#23598)
- Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
- Relabel duplicate sequence IDs in distributor (#23593)
- Round-trip Enum and Categorical metadata in plugins (#23588)
- Fix incorrect `join_asof` with `by` followed by `head`/`slice` (#23585)
- Allow writing nested Int128 data to Parquet (#23580)
- Enum serialization assert (#23574)
- Output type for peak_min / peak_max (#23573)
- Make Scalar Categorical, Enum and Struct values serializable (#23565)
- Preserve row order within partition when sinking parquet (#23462)
- Panic in `create_multiple_physical_plans` when branching from a single cache node (#23561)
- Prevent in-mem partition sink deadlock (#23562)
- Update AWS cloud documentation (#23563)
- Correctly handle null values when comparing structs (#23560)
- Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
- Make `Expr.append` serializable (#23515)
- Float by float division dtype (#23529)
- Division on empty DataFrame generating null row (#23516)
- Partition sink `copy_exprs` and `with_exprs_and_input` (#23511)
- Unreachable with `pl.self_dtype` (#23507)
- Rolling median incorrect min_samples with nulls (#23481)
- Make `Int128` roundtrippable via Parquet (#23494)
- Fix panic when common subplans contain IEJoins (#23487)
- Properly handle non-finite floats in rolling_sum/mean (#23482)
- Make `read_csv_batched` respect `skip_rows` and `skip_lines` (#23484)
- Always use `cloudpickle` for the python objects in cloud plans (#23474)
- Support string literals in index_of() on categoricals (#23458)
- Don't panic for `finish_callback` with nested datatypes (#23464)
- Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
- Fix var/moment dtypes (#23453)
- Fix agg_groups dtype (#23450)
- Clear cached_schema when apply changes dtype (#23439)
- Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
- Null handling in full-null group_by_dynamic mean/sum (#23435)
- Enable default set of `ScanCastOptions` for native `scan_iceberg()` (#23416)
- Fix index calculation for `nearest` interpolation (#23418)
- Fix compilation failure with `--no-default-features` and `--features lazy,strings` (#23384)
- Parse parquet footer length into unsigned integer (#23357)
- Fix incorrect results with `group_by` aggregation on empty groups (#23358)
- Fix boolean `min()` in `group_by` aggregation (streaming) (#23344)
- Respect data-model in `map_elements` (#23340)
- Properly join URI paths in `PlPath` (#23350)
- Ignore null values in `bitwise` aggregation on bools (#23324)
- Fix panic filtering after left join (#23310)
- Out-of-bounds index in hot hash table (#23311)
- Fix scanning '?' from cloud with `glob=False` (#23304)
- Fix filters on inserted columns did not remove rows (#23303)
- Don't ignore return_dtype (#23309)
- Use safe parsing for `get_normal_components` (#23284)
- Fix output column names/order of streaming coalesced right-join (#23278)
- Restore `concat_arr` inputs expansion (#23271)
📖 Documentation
- Point the R Polars version on R-multiverse (#23660)
- Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
- Add page about billing to Polars Cloud user guide (#23564)
- Small user-guide improvement and fixes (#23549)
- Correct note in `from_pandas` about data being cloned (#23552)
- Fix a few typos in the "Streaming" section (#23536)
- Update streaming page (#23535)
- Update structure of Polars Cloud documentation (#23496)
- Update when_then in user guide (#23245)
📦 Build system
🛠️ Other improvements
- Remove incorrect `DeletionFilesList::slice` (#23796)
- Remove old schema file (#23798)
- Remove Default for StreamingExecutionState (#23729)
- Explicit match to smaller dtypes before cast to Int32 in asof join (#23776)
- Expose `PlPathRef` via polars::prelude (#23754)
- Add hashes json (#23758)
- Add `AExpr::is_expr_equal_to` (#23740)
- Fix rank test to respect maintain order (#23723)
- IR inputs and exprs iterators (#23722)
- Store more granular schema hashes to reduce merge conflicts (#23709)
- Add assertions for unique ID (#23711)
- Use RelaxedCell in multiscan (#23712)
- Debug assert `ColumnTransform` cast is non-strict (#23717)
- Use UUID for UniqueID (#23704)
- Remove scan id (#23697)
- Propagate Iceberg physical ID schema to IR (#23671)
- Remove unused and confusing match arm (#23691)
- Remove unused `ALLOW_GROUP_AWARE` flag (#23690)
- Remove unused `evaluate_inline` (#23687)
- Remove unused field from `AggregationContext` (#23685)
- Remove `nod...