Releases · pola-rs/polars

16 May 19:06

github-actions

py-1.30.0-beta.1

103f194

Python Polars 1.30.0-beta.1 Pre-release

🚀 Performance improvements

Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
Add elementwise execution mode for list.eval (#22715)
Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
Add streaming cross-join node (#22581)
Switch off maintain_order in group-by followed by sort (#22492)

✨ Enhancements

Support binaryoffset in search sorted (#22786)
Add nulls_equal flag to list/arr.contains (#22773)
Implement LazyFrame.match_to_schema (#22726)
Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
Allow for .over to be called without partition_by (#22712)
Support AnyValue translation from PyMapping values (#22722)
Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
Support inference of Int128 dtype from databases that support it (#22682)
Add options to write Parquet field metadata (#22652)
Add cast_options parameter to control type casting in scan_parquet (#22617)
Allow casting List<UInt8> to Binary (#22611)
Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)
Support use of literal values as "other" when evaluating Series.zip_with (#22632)
Allow reading and writing custom file-level Parquet metadata (#21806)
Support PEP702 @deprecated decorator behaviour (#22594)
Support grouping by pl.Array (#22575)
Preserve exception type and traceback for errors raised from Python (#22561)
Use fixed-width font in streaming phys plan graph (#22540)

🐞 Bug fixes

Respect BinaryOffset metadata (#22785)
Correct the output order of PartitionByKey and PartitionParted (#22778)
Fallback to non-strict casting for deprecated casts (#22760)
Clippy on new stable version (#22771)
Handle sliced out remainder for bitmaps (#22759)
Don't merge Enum categories on append (#22765)
Fix unnest() not working on empty struct columns (#22391)
Fix the default value type in Schema init (#22589)
Correct name in unnest error message (#22740)
Provide "schema" to DataFrame, even if empty JSON (#22739)
Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
Incorrect result from SQL count(*) with partition by (#22728)
Fix deadlock joining scanned tables with low thread count (#22672)
Don't allow deserializing incompatible DSL (#22644)
Incorrect null dtype from binary ops in empty group_by (#22721)
Don't mark str.replace_many with Mapping as deprecated (#22697)
Gzip has maximum compression of 9, not 10 (#22685)
Fix predicate pushdown of fallible expressions (#22669)
Fix index out of bounds panic when scanning hugging face (#22661)
Panic on group_by with literal and empty rows (#22621)
Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
Bump argminmax to 0.6.3 (#22649)
DSL version deserialization endianness (#22642)
Allow Expr.round() to be called on integer dtypes (#22622)
Fix panic when filtering based on row index column in parquet (#22616)
WASM and Pyodide compile (#22613)
Resolve get() SchemaMismatch panic (#22350)
Panic in group_by_dynamic on single-row df with group_by (#22597)
Add new_streaming feature to polars crate (#22601)
Consistently use Unix epoch as origin for dt.truncate (except weekly buckets which start on Mondays) (#22592)
Fix interpolate on dtype Decimal (#22541)
CSV count rows skipped last line if file did not end with newline (#22577)
Make nested strict casting actually strict (#22497)
Make replace and replace_strict mapping use list literals (#22566)
Allow pivot on Time column (#22550)
Fix error when providing CSV schema with extra columns (#22544)
Panic on bitwise op between Series and Expr (#22527)
Multi-selector regex expansion (#22542)
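
The dt.truncate entry above (#22592) describes bucket boundaries anchored at the Unix epoch, with weekly buckets starting on Mondays. A minimal standard-library sketch of that idea follows. It is not Polars code: the function name, the `weekly` flag, and the choice of 1969-12-29 as the Monday origin are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed origins for this sketch (not taken from the Polars source):
# the Unix epoch for most bucket sizes, and the Monday just before it
# (1969-12-29) for weekly buckets.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MONDAY_ORIGIN = datetime(1969, 12, 29, tzinfo=timezone.utc)


def truncate(ts: datetime, every: timedelta, weekly: bool = False) -> datetime:
    """Round ts down to the start of its bucket of width `every`."""
    origin = MONDAY_ORIGIN if weekly else UNIX_EPOCH
    # timedelta % timedelta uses floor semantics, so this also works
    # for timestamps that fall before the origin.
    return ts - (ts - origin) % every


ts = datetime(2025, 5, 16, 19, 6, tzinfo=timezone.utc)
six_hourly = truncate(ts, timedelta(hours=6))           # bucket start at 18:00 UTC
weekly = truncate(ts, timedelta(weeks=1), weekly=True)  # bucket start on a Monday
```

Because the origin is fixed, two timestamps land in the same bucket regardless of which other rows are present in the data.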

📖 Documentation

Fix broken link to service account page in Polars Cloud docs (#22762)
Add match_to_schema to API reference (#22777)
Provide additional explanation and examples for the value_counts "normalize" parameter (#22756)
Rework documentation for drop/fill for nulls/nans (#22657)
Add documentation to new RoundMode parameter in round (#22555)
Add missing repeat_by to API reference, fixup list.get (#22698)
Fix non-rendering bullet points in scan_iceberg (#22694)
Improve insert_column docstring (description and examples) (#22551)
Improve join documentation (#22556)

🛠️ Other improvements

Update cloud docs (#22624)
Fix unstable list.eval performance test (#22729)
Add proptest implementations for all Array types (#22711)
Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
Move all optimization flags to QueryOptFlags (#22680)
Add test for str.replace_many (#22615)
Stabilize sink_* (#22643)
Add proptest for row-encode (#22626)
Update rust version in nix flake (#22627)
Add a nix flake with a devShell and package (#22246)
Use a wrapper struct to store time zone (#22523)
Add proptest testing for parquet decoding kernels (#22608)
Include equiprobable as valid quantile method (#22571)
Remove confusing error context calling .collect(_eager=True) (#22602)
Fix test_truncate_path test case (#22598)
Unify function flags into 1 bitset (#22573)
Display the operation behind in-memory-map (#22552)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Julian-J-S, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-

05 May 13:13

github-actions

rs-0.47.0

ba3be4e

Rust Polars 0.47.1

🏆 Highlights

Enable common subplan elimination across plans in collect_all (#21747)
Add lazy sinks (#21733)
Add PartitionByKey for new streaming sinks (#21689)
Enable new streaming memory sinks by default (#21589)

💥 Breaking changes

Make bottom interval closed in hist (#22090)

🚀 Performance improvements

Avoid alloc_zeroed in decompression (#22460)
Lower Expr.(n_)unique to group_by on streaming engine (#22420)
Chunk huge munmap calls (#22414)
Add single-key variants of streaming group_by (#22409)
Improve accumulate_dataframes_vertical performance (#22399)
Optimize rolling_quantile with varying window sizes (#22353)
Dedicated rolling_skew kernel (#22333)
Call large munmaps in background thread (#22329)
New streaming group_by implementation (#22285)
Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
Turn on parallel=prefiltered by default for new streaming (#22190)
Add CSE to streaming groupby (#22196)
Speed-up new streaming predicate filtering (#22179)
Speedup new-streaming file row count (#22169)
Fix quadratic behavior when casting Enums (#22008)
Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
Fast path for empty inner join (#21965)
Add native semi/anti join in new streaming engine (#21937)
Cache regex compilation globally (#21929)
Use views for binary hash tables and add single-key binary variant (#21872)
Avoid rechunking in gather (#21876)
Switch ahash for foldhash (#21852)
Put THP behind feature flag (#21853)
Enable THP by default (#21829)
Improve join performance for expanding joins (#21821)
Use binary_search instead of contains in business-day functions (#21775)
Implement linear-time rolling_min/max (#21770)
Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
Enable common subplan elimination across plans in collect_all (#21747)
Allow elementwise functions in recursive lowering (#21653)
Add primitive single-key hashtable to new-streaming join (#21712)
Remove unnecessary black_boxes in Kahan summation (#21679)
Box large enum variants (#21657)
Improve join performance for new-streaming engine (#21620)
Pre-fill caches (#21646)
Optimize only a single cache input (#21644)
Collect parquet statistics in one contiguous buffer (#21632)
Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
Don't maintain order when maintain_order=False in new streaming sinks (#21586)
Pre-sort groups in group-by-dynamic (#21569)
Provide a fallback skip batch predicate for constant batches (#21477)
Parallelize the passing in new streaming multiscan (#21430)
Toggle projection pushdown for eager rolling (#21405)
Fix pathologic rolling + group-by performance and memory explosion (#21403)
Add sampling to new-streaming equi join to decide between build/probe side (#21197)
Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
Implement native Expr.count() on new-streaming (#21126)
Speed up list operations that use amortized_iter() (#20964)
Use Cow as output for rechunk and add rechunk_mut (#21116)
Reduce arrow slice mmap overhead (#21113)
Reduce conversion cost in chunked string gather (#21112)
Enable prefiltered by default for new streaming (#21109)
Enable parquet column expressions for streaming (#21101)
Deduplicate buffers again in stringview concat kernel (#21098)
Add dedicated concatenate kernels (#21080)
Rechunk only once during join probe gather (#21072)
Speed up from_pandas when converting frame with multi-index columns (#21063)
Change default memory prefetch to MADV_WILLNEED (#21056)
Remove cast to boolean after comparison in optimizer (#21022)
Split last rowgroup among all threads in new-streaming parquet reader (#21027)
Recombine into larger morsels in new-streaming join (#21008)
Improve list.min and list.max performance for logical types (#20972)
Ensure count query select minimal columns (#20923)
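
One entry above mentions Kahan summation (#21679). For readers unfamiliar with the technique, here is a minimal Python sketch of compensated summation next to a naive loop. It illustrates the algorithm only and makes no claim about how the Polars kernels are written; both function names are hypothetical.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-apply the error from the last step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total


# Many tiny values added to a large one: each addition rounds away
# entirely in the naive loop, while the compensated loop keeps them.
values = [1.0] + [1e-16] * 10_000
naive = naive_sum(values)
compensated = kahan_sum(values)
```

The compensation term depends on the exact order of floating-point operations, which is why implementations sometimes guard it against compiler reordering.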

✨ Enhancements

Support grouping by pl.Array (#22575)
Preserve exception type and traceback for errors raised from Python (#22561)
Use fixed-width font in streaming phys plan graph (#22540)
Highlight nodes in streaming phys plan graph (#22535)
Support BinaryOffset serde (#22528)
Show physical stage graph (#22491)
Add structure for dispatching iceberg to native scans (#22405)
Add SQL support for checking array values with IN and NOT IN expressions (#22487)
Add more IRBuilder utils (#22482)
Support DataFrame and Series init from torch Tensor objects (#22177)
Add RoundMode for Decimal and Float (#22248)
Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
Make streaming dispatch public (#22347)
Add rolling_kurtosis (#22335)
Support Cast in IO plugin predicates (#22317)
Add .sort(nulls_last=True) to booleans, categoricals and enums (#22300)
Add rolling min/max for temporals (#22271)
Support literal:list agg (#22249)
Support implode + agg (#22230)
Dispatch scans to new-streaming by default (#22153)
Improved expression autocomplete for IPython, Jupyter, and Marimo (#22221)
Expose FunctionIR::FastCount in the python visitor (#22195)
Add SPLIT_PART string function to the SQL interface (#22158)
Allow scalar expr in Expr.diff (#22142)
Support additional unsigned int aliases in the SQL interface (#22127)
Add STRING_TO_ARRAY function to the SQL interface (#22129)
Add dt.is_business_day (#21776)
Add support for Int128 parsing/recognition to the SQL interface (#22104)
Allow sinking to abstract python io and fs classes (#21987)
Add add_alp_optimize_exprs to IRBuilder (#22061)
Add cat.slice (#21971)
Support growing schema if line length increases during csv schema inference (#21979)
Replace thread unsafe GilOnceCell with Mutex (#21927)
Support modified dsl in file cache (#21907)
Add support for io-plugins in new-streaming (#21870)
Add PartitionParted (#21788)
Add DoubleEndedIterator for CatIter (#21816)
Minor improvements to EXPLAIN plan output (#21822)
Add polars_testing folder with relevant files and add_series_equal!() functionality (#21722)
Allow using repeat_by with (nested) lists and structs (#21206)
Add support for rolling_(sum/min/max) for booleans through casting (#21748)
Support multi-column sort for all nested types and nested search-sorted (#21743)
Add lazy sinks (#21733)
Add PartitionByKey for new streaming sinks (#21689)
Fix replace flags (#21731)
Add mkdir flag to sinks (#21717)
Enable joins on list/array dtypes (#21687)
Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
Support all elementwise functions in IO plugin predicates (#21705)
Stabilize Enum datatype (#21686)
Support Polars int128 in from arrow (#21688)
Use FFI to read dataframe instead of transmute (#21673)
Enable new streaming memory sinks by default (#21589)
Cloud support for new-streaming scans and sinks (#21621)
Add len method to arr (#21618)
Closeable files on unix (#21588)
Add new PartitionMaxSize sink (#21573)
Implement unpack_dtypes() functionality with unit tests (#21574)
Support engine callback for LazyFrame.profile (#21534)
Dispatch new-streaming CSV negative slice to separate node (#21579)
Add NDJSON source to new streaming engine (#21562)
Add lossy decoding to read_csv for non-utf8 encodings (#21433)
Add 'nulls_equal' parameter to is_in (#21426)
Improve numeric stability rolling_{std, var, cov, corr} (#21528)
IR Serde cross-filter (#21488)
Support writing Time type in json (#21454)
Activate all optimizations in sinks (#21462)
Add AssertionError variant to PolarsError in polars-error (#21460)
Pass filter to inner readers in multiscan new streaming (#21436)
Implement i128 -> str cast (#21411)
Version DSL (#21383)
Make user facing binary formats mostly self describing (#21380)
Filter hive files using predicates in new streaming (#21372)
Add negative slicing to new streaming multiscan (#21219)
Publicize Expr DSL Function enums (#20421)
Implement sorted flags for struct series (#21290)
Support reading arrow Map type from Delta (#21330)
Add a dedicated remove method for DataFrame and LazyFrame (#21259)
Expose include_file_paths to python visitor (#21279)
Implement merge_sorted for struct (#21205)
Add positive slice for new streaming MultiScan (#21191)
Don't take in rewriting visitor (#21212)
Add SQL support for the DELETE statement (#21190)
Add row index to new streaming multiscan (#21169)
Improve DataFrame fmt in explain (#21158)
Add projection pushdown to new streaming multiscan (#21139)
Implement join on struct dtype (#21093)
Use unique temporary directory path per user and restrict permissions (#21125)
Enable new streaming multiscan for CSV (#21124)
Environment POLARS_MAX_CONCURRENT_SCANS in multiscan for new streaming (#21127)
Multi/Hive scans in new streaming engine (#21011)
Add linear_spaces (#20941)
Implement merge_sorted for binary (#21045)
Hold string cache in new streaming engine and fix row-encoding (#21039)
Support max/min method for Time dtype (#19815)
Implement a streaming merge sorted node (#20960)
Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
Add negative slice support to new-streaming engine (#21001)
Allow for more RG skipping by rewriting expr in planner (#20828)
Rename catalog schema to namespace (#20993)
Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
Improved support for KeyboardInterrupts (#20961...

Contributors

orlp, GaelVaroquaux, and 72 other contributors

30 Apr 20:57

github-actions

py-1.29.0

a0e3e38

Python Polars 1.29.0

🚀 Performance improvements

Avoid alloc_zeroed in decompression (#22460)

✨ Enhancements

Highlight nodes in streaming phys plan graph (#22535)
Show physical stage graph (#22491)
Add structure for dispatching iceberg to native scans (#22405)
Add SQL support for checking array values with IN and NOT IN expressions (#22487)
Support DataFrame and Series init from torch Tensor objects (#22177)
Add RoundMode for Decimal and Float (#22248)
Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)

🐞 Bug fixes

Streaming outer join coalesce bug (#22530)
Remove redundant print statement in assert_frame_schema_equal() (#22529)
Bug in .unique() followed by .slice() (#22471)
Fix error reading parquet with datetimes written by pandas (#22524)
Fix schema_overrides not taking effect in NDJSON (#22521)
Fold flags and verify scalar correctness in apply (#22519)
Invalid values were triggering panics instead of returning null in dt.to_date / dt.to_datetime (#22500)
Ensure numpy isinstance check is lazy (avoid forcing the dependency) (#22486)
Incorrectly dropped sort after unique for some queries (#22489)
Fix incorrect ternary agg state with mixed columns and scalars (#22496)
Make replace and replace_strict properly elementwise (#22465)
Fix index out of bounds panic on parquet prefiltering (#22458)
Integer underflow when checking parquet UTF-8 (#22472)
Add implementation for array.get with idx overflow (#22449)
Deprecate str. collection functions with flat strings and mark as elementwise (#22461)
Deprecate flat list.gather and mark as elementwise (#22456)
Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)

📖 Documentation

Fix typo in structs page (#22504)

🛠️ Other improvements

Don't store name/dtype in grouper (#22525)
Add structure for dispatching iceberg to native scans (#22405)
Remove unused reduction code (#22462)
Pin to explicit macOS version in code coverage (#22432)

Thank you to all our contributors for making this release possible!
@AH-Merii, @JakubValtar, @Julian-J-S, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @alexander-beedie, @brianmakesthings, @coastalwhite, @nameexhaustion, @orlp and @ritchie46

27 Apr 15:33

github-actions

py-1.28.1

506319e

Python Polars 1.28.1

🐞 Bug fixes

Reading of reencoded categorical in Parquet (#22436)
Last thread in parquet predicate filter oob (#22429)

📖 Documentation

Fix a few typos in the new "multiplexing" page (#22434)
Add multiplexing page (#22426)

📦 Build system

Update pyo3 and numpy crates to version 0.24 (#22015)

🛠️ Other improvements

Add test for implode + over (#22437)
Fix CI by removing use_legacy_dataset (#22438)
Only use pytorch index-url for pytorch package (#22355)

Thank you to all our contributors for making this release possible!
@bschoenmaeckers, @coastalwhite, @etiennebacher, @mcrumiller and @ritchie46

26 Apr 09:02

github-actions

py-1.28.0

8d30e79

Python Polars 1.28.0

🚀 Performance improvements

Lower Expr.(n_)unique to group_by on streaming engine (#22420)
Chunk huge munmap calls (#22414)
Add single-key variants of streaming group_by (#22409)
Improve accumulate_dataframes_vertical performance (#22399)
Optimize rolling_quantile with varying window sizes (#22353)
Dedicated rolling_skew kernel (#22333)
Call large munmaps in background thread (#22329)
New streaming group_by implementation (#22285)
Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
Turn on parallel=prefiltered by default for new streaming (#22190)

✨ Enhancements

When reporting unexpected types in errors, module-qualify the typename (#22390)
Add Series backward_fill / forward_fill (#22360)
Add GPU support to sink_* APIs (#20940)
Changed mapping type from dict to Mapping (#19400) (#19436)
Make streaming dispatch public (#22347)
Add rolling_kurtosis (#22335)
Support Cast in IO plugin predicates (#22317)
Add .sort(nulls_last=True) to booleans, categoricals and enums (#22300)
Add rolling min/max for temporals (#22271)
Support literal:list agg (#22249)
Support running Polars SQL queries against any objects implementing the PyCapsule interface (#22235)
Support implode + agg (#22230)
Dispatch scans to new-streaming by default (#22153)

🐞 Bug fixes

Ensure write_excel correctly preserves null values in nested dtype data on export (#22379)
Panic when visualizing streaming physical plan with joins (#22404)
Fix incorrect filter after LazyFrame.rename().select() (#22380)
Fix select(len()) performance regression (#22363)
Handle pytz named timezone in lit (#21785)
Don't leak state during prefill CSE cache (#22341)
Maintain float32 type in partitioned group-by (#22340)
Resolve streaming panic on multiple merge_sorted (#22205)
Fix ndjson nested types (#22325)
Fix nested datetypes in ndjson (#22321)
Check matching lengths for pl.corr (#22305)
Move type coercion for pl.duration to planner (#22304)
Check dtype to avoid panic with mixed types in min/max_horizontal (#21857)
Coalesce correct column for new streaming full join (#22301)
Don't collect NaN from Parquet Statistics (#22294)
Set revmap for empty AnyValue to Series (#22293)
Add an __all__ entry to internal type definition module (#22254)
Datetime parser was incorrectly parsing 8-digit fractional seconds when format specified to expect 9 (#22180)
More robust str → date conversion when reading from spreadsheet (#22276)
Deprecate using is_in with 2 equal types and mark as elementwise (#22178)
Duplicate key column name in streaming group_by due to CSE (#22280)
Raise ColumnNotFoundError for missing columns in join_where (#22268)
Parquet filters for logical types and operations (#22253)
Ensure floating-point accuracy in hist (#22245)
Check matching key datatypes for new streaming joins (#22247)
Incorrect length BinaryArray/ListBuilder (#22227)

📖 Documentation

Update docs for schema arg in scan_csv to match read_csv (#22357)
Update pl.when documentation (#22345)
Add missing is_business_day to documentation reference (#22338)
Improve interpolation documentation to clarify behavior of null values (#22274)

🛠️ Other improvements

Install pytorch for 3.13 on Windows (#22356)
Make interpolate fix more robust (#22421)
Fix interpolate test (#22417)
Reduce hot table size in debug mode (#22400)
Replace intrinsic with non-intrinsic (#22401)
Make streaming dispatch public (#22347)
Update rustc to 'nightly-2025-04-19' (#22342)
Update mozilla-actions/sccache-action (#22319)
Purge old parquet and scan code (#22226)
Add an __all__ entry to internal type definition module (#22254)
Add online skew/kurtosis algorithm for future use in rolling kernels (#22261)
Add Polars Cloud 0.0.7 release notes (#22223)
Change format name from list to implode (#22240)
Make other parallel parquet modes filter afterwards (#22228)
Close async reader issues (#22224)
Add BinaryArrayBuilder (#22225)

Thank you to all our contributors for making this release possible!
@DavideCanton, @JakubValtar, @Jesse-Bakker, @MarcoGorelli, @NeejWeej, @Shoeboxam, @adamreeve, @alexander-beedie, @axellpadilla, @cmdlineluser, @coastalwhite, @d-reynol, @dongchao-1, @florian-klein, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @yiteng-guo

11 Apr 10:26

github-actions

py-1.27.1

319a9a8

Python Polars 1.27.1

✨ Enhancements

Improved expression autocomplete for IPython, Jupyter, and Marimo (#22221)

🐞 Bug fixes

Incorrect condition on empty inner join fast path (#22208)
Fallback predicate filter for min=max with is_in (#22213)
Don't panic for LruCachedFunc for size=0 (#22215)
Writing masked out list values to json (#22210)
Deadlock in streaming distributor (#22207)

Thank you to all our contributors for making this release possible!
@Matt711, @alexander-beedie, @coastalwhite, @dependabot[bot], @orlp and @ritchie46

09 Apr 17:27

github-actions

py-1.27.0

075fe61

Python Polars 1.27.0

💥 Breaking changes

Make bottom interval closed in hist (#22090)
Change Partition API to base_path and file_path (#21888)

🚀 Performance improvements

Add CSE to streaming groupby (#22196)
Speed-up new streaming predicate filtering (#22179)
Speedup new-streaming file row count (#22169)
Fix quadratic behavior when casting Enums (#22008)
Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
Fast path for empty inner join (#21965)
Add native semi/anti join in new streaming engine (#21937)
Cache regex compilation globally (#21929)
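
The last entry above concerns caching compiled regexes globally (#21929). The general pattern can be sketched in Python with a process-wide memoised compile function. This is an illustration only: the cache size and the function name are assumptions, and the actual change lives in the Rust engine, not in code like this.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=256)  # assumed bound; evicts least-recently-used patterns
def compiled(pattern: str) -> re.Pattern:
    """Compile a pattern once and reuse the result on later calls."""
    return re.compile(pattern)


def count_matches(pattern: str, rows: list[str]) -> int:
    """Count rows containing a match, reusing the cached compiled regex."""
    rx = compiled(pattern)
    return sum(1 for row in rows if rx.search(row))


first = count_matches(r"\d+", ["a1", "b", "c22"])
second = count_matches(r"\d+", ["x", "7"])  # same pattern: served from the cache
```

A bounded cache trades a small amount of memory for skipping recompilation when the same pattern recurs across many expressions or batches.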

✨ Enhancements

Add SPLIT_PART string function to the SQL interface (#22158)
Allow scalar expr in Expr.diff (#22142)
Support additional unsigned int aliases in the SQL interface (#22127)
Add STRING_TO_ARRAY function to the SQL interface (#22129)
Add dt.is_business_day (#21776)
Add an eager parameter to pl.cov (#22098)
Add support for Int128 parsing/recognition to the SQL interface (#22104)
Add an eager parameter to pl.coalesce (#22092)
Add an eager parameter to pl.corr (#22097)
Allow sinking to abstract python io and fs classes (#21987)
Add add_alp_optimize_exprs to IRBuilder (#22061)
Add cat.slice (#21971)
Support growing schema if line length increases during csv schema inference (#21979)
Replace thread unsafe GilOnceCell with Mutex (#21927)
Support modified dsl in file cache (#21907)

🐞 Bug fixes

Implode in agg (#22197)
Reduce GIL hold time for IO plugins in new-streaming (#22186)
Enhance predicate validation and cast safety in join_where (#22112)
Handle Parquet with compressed empty DataPage v2 (#22172)
Schema error during lowering (#22175)
Rewrite unroll of overlapping groups to mitigate out of range index panic (#22072)
Incorrect rounding for very large/small numbers (#22173)
Allow set input to list.set_* operations (#22163)
Deadlock in join due to rayon nested task-stealing (#22159)
Mark Expr.repeat_by as elementwise (#22068)
Fix csv serializer panic by supporting ScalarColumn in as_single_chunk (#22146)
Raise an error if a number doesn't have associated unit in duration strings (#22035)
Add i128 as supertype to boolean (#22138)
Fix panic when constructing DF from pyarrow due to duplicate field names (#22114)
Add broadcasts and error messages for many elementwise operations (#22130)
Throw error for n=0 on list.gather_every (#22122)
Throw error for unsupported rolling operations (#22121)
Error on unequal length str.to_integer arguments (#22100)
Make bottom interval closed in hist (#22090)
Relative path resolution for plugin libraries (#21911)
Avoid panic with strptime for out-of-bounds dates (#21208)
Join revmaps for categoricals in merge_sorted (#21976)
Fix glob expansion matching extra files (#21991)
Ensure SQL dot-notation for nested column fields resolves correctly (#22109)
Parquet filter performance regression from multiscan dispatch (#22116)
Panic for unequal length ewm_mean_by args (#22093)
Add scalarity checks to pl.repeat (#22088)
Type check n parameter of pl.repeat (#22071)
Mark bitwise_{count,leading,trailing}_{ones,zeros} as elementwise (#22044)
Mark pl.*_ranges functions correctly as element-wise (#22059)
Correctly type check pl.arctan2 (#22060)
Mark pl.business_day_count as elementwise (#22055)
Check input python type for str.extract_groups (#22032)
Check types for fill_char in str.pad_{start,end} (#22036)
Mark str.to_decimal properly as non-elementwise (#22040)
Documented return type for bin.encode and bin.decode (#22022)
Revert #22017 and improve block(_in_place)_on doc comment (#22031)
Remove outdated depth warning (#22030)
Expression pl.concat was incorrectly marked as elementwise (#22019)
Use block_in_place_on to start streaming (#22017)
Panic on empty aggregation in streaming (#22016)
Error instead of panicking for invalid durations in dt.offset_by() and dt.round() (#21982)
Raise error instead of silently appending NULL in NDJSON parsing (#21953)
Ensure AV is static before pushing to row buffer (#21967)
Deadlock in new-streaming multiplexer (#21963)
Release GIL in collect_with_callback (#21941)
Panic in new RegexCache (#21935)
Type hint of cs.exclude() is SelectorType instead of Expr (#21892)
Add correct deprecation warning for .str.concat (#21666)
Use absolute paths by defaults for plugins (#21904)

📖 Documentation

Add user guide section on working with Sheets in Colab (#22161)
Update distributed engine docs (#22128)
Add Polars Cloud release notes (#22021)
Remove trailing space in settings POLARS_CLOUD_CLIENT_ID (#21995)
Fix typo (#21954)
Fix 'pickleable' typo in docs (#21938)
Change ctx to compute=ctx for all remote query examples (#21930)

🛠️ Other improvements

Remove old MultiScanExec for in-memory (#22184)
Separate FunctionOptions from DSL calls (#22133)
Undeprecate backward_fill and forward_fill (#22156)
Handle conversion of Duration specially in pyir (#22101)
Deprecate duplicate backward_fill and forward_fill interface (#22083)
Solve clippy lints for 1.86 (#22102)
Remove rust exclusive MaxBound and MinBound fill strategies (#22063)
Change Partition API to base_path and file_path (#21888)
Fix pydantic model_fields deprecation (#21958)

Thank you to all our contributors for making this release possible!
@DeflateAwning, @EnricoMi, @Jacob640, @JakubValtar, @MarcoGorelli, @MaxJackson, @alexander-beedie, @amotzop, @anath2, @bschoenmaeckers, @cnpryer, @coastalwhite, @dependabot[bot], @eitsupi, @etiennebacher, @hemanth94, @kdn36, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @rgertenbach, @ritchie46, @sebasv, @silannisik, @stijnherfst, @wence- and @zachlefevre

23 Mar 11:54

github-actions

py-1.26.0

ac9d598

Python Polars 1.26.0

🚀 Performance improvements

Use views for binary hash tables and add single-key binary variant (#21872)
Avoid rechunking in gather (#21876)
Switch ahash for foldhash (#21852)
Put THP behind feature flag (#21853)
Enable THP by default (#21829)
Improve join performance for expanding joins (#21821)
Use binary_search instead of contains in business-day functions (#21775)
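
The last entry above swaps a linear `contains` check for a binary search in the business-day functions (#21775). A rough Python sketch of the same trade-off follows, assuming the lookup is over a sorted list of holiday dates. That reading, the helper names, and the Monday-to-Friday week mask are assumptions for illustration.

```python
from bisect import bisect_left
from datetime import date


def is_holiday(day: date, sorted_holidays: list[date]) -> bool:
    """O(log n) membership test; requires sorted_holidays to be sorted."""
    i = bisect_left(sorted_holidays, day)
    return i < len(sorted_holidays) and sorted_holidays[i] == day


def is_business_day(day: date, sorted_holidays: list[date]) -> bool:
    """Weekday (Mon-Fri, an assumed week mask) that is not a holiday."""
    return day.weekday() < 5 and not is_holiday(day, sorted_holidays)


holidays = sorted([date(2025, 12, 25), date(2025, 1, 1)])
new_year = is_business_day(date(2025, 1, 1), holidays)   # holiday
thursday = is_business_day(date(2025, 1, 2), holidays)   # ordinary weekday
saturday = is_business_day(date(2025, 1, 4), holidays)   # weekend
```

With a linear scan, each lookup costs time proportional to the number of holidays; the sorted search makes the per-row cost logarithmic.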

✨ Enhancements

Add support for io-plugins in new-streaming (#21870)
Add PartitionParted (#21788)
Minor improvements to EXPLAIN plan output (#21822)
Add explain_all (#21797)
Allow using repeat_by with (nested) lists and structs (#21206)

🐞 Bug fixes

Fix DataFrame.nan_to_null work for tuple (#21861)
Allow pivot on empty frame for all integer index dtypes (#21890)
Null panic on decimal aggregate (#21873)
Join with categoricals on new-streaming engine (#21825)
Fix div 0 partitioned group-by (#21842)
Incorrect quote check in CSV parser (#21826)
Add option to use relative paths for plugin libraries (#21675)
Respect header separator in sink_csv (#21814)
Deprecation of streaming=False (#21813)
Fix collect_all type-coercion (#21810)
Memory leaks in SharedStorage (#21798)
Make None refer to uncompressed in sink_ipc (#21786)

📖 Documentation

Add sources and sinks to user-guide (#21780)

🛠️ Other improvements

Change dynamic literals to be separate category (#21849)
Add POLARS_TIMEOUT_MS for timing out slow Polars tests (#21887)
Disable --dist loadgroup in pytest (#21885)
Fix refcount assert being messed up by pytest assertion magic (#21884)
Add env vars to configure new-streaming buffer sizes (#21818)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @alexander-beedie, @anath2, @borchero, @coastalwhite, @dongchao-cn, @kgv, @mcrumiller, @nameexhaustion, @orlp and @ritchie46

15 Mar 16:55

github-actions

py-1.25.2

5d7f8a7

Python Polars 1.25.2

🏆 Highlights

Enable common subplan elimination across plans in collect_all (#21747)
Add lazy sinks (#21733)
Add PartitionByKey for new streaming sinks (#21689)
Enable new streaming memory sinks by default (#21589)

🚀 Performance improvements

Implement linear-time rolling_min/max (#21770)
Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
Enable common subplan elimination across plans in collect_all (#21747)
Allow elementwise functions in recursive lowering (#21653)
Add primitive single-key hashtable to new-streaming join (#21712)
Remove unnecessary black_boxes in Kahan summation (#21679)
Box large enum variants (#21657)
Improve join performance for new-streaming engine (#21620)
Pre-fill caches (#21646)
Optimize only a single cache input (#21644)
Collect parquet statistics in one contiguous buffer (#21632)
Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
Don't maintain order when maintain_order=False in new streaming sinks (#21586)
Pre-sort groups in group-by-dynamic (#21569)
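
For readers curious how a rolling min/max can run in linear time (#21770), one common technique is a monotonic deque. The sketch below is illustrative Python only and is not taken from the Polars source; the function name rolling_max and its choice to return full windows only are assumptions made for this example.

```python
from collections import deque


def rolling_max(values, window):
    """Return the maximum of every full window of `window` consecutive values.

    Hypothetical helper for illustration; not the Polars implementation.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    candidates = deque()  # indices whose values are strictly decreasing
    for i, v in enumerate(values):
        # Older candidates that are <= v can never be a window maximum again.
        while candidates and values[candidates[-1]] <= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        # Emit a result only once the first full window is available.
        if i >= window - 1:
            out.append(values[candidates[0]])
    return out


result = rolling_max([1, 3, 2, 5, 4, 1], 3)
```

Each index is appended and removed at most once, so the total work is proportional to the input length rather than to length times window size.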

✨ Enhancements

Add support for rolling_(sum/min/max) for booleans through casting (#21748)
Support multi-column sort for all nested types and nested search-sorted (#21743)
Add lazy sinks (#21733)
Add PartitionByKey for new streaming sinks (#21689)
Fix replace flags (#21731)
Add mkdir flag to sinks (#21717)
Enable joins on list/array dtypes (#21687)
Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
Support all elementwise functions in IO plugin predicates (#21705)
Stabilize Enum datatype (#21686)
Support Polars int128 in from arrow (#21688)
Use FFI to read dataframe instead of transmute (#21673)
Enable new streaming memory sinks by default (#21589)
Cloud support for new-streaming scans and sinks (#21621)
Add len method to arr (#21618)
Closeable files on unix (#21588)
Add new PartitionMaxSize sink (#21573)
Support engine callback for LazyFrame.profile (#21534)
Dispatch new-streaming CSV negative slice to separate node (#21579)
Add NDJSON source to new streaming engine (#21562)
Support passing token in storage_options for GCP cloud (#21560)

🐞 Bug fixes

Expose and document partitions (#21765)
Fix lazy schema for truediv ops involving List/Array dtypes (#21764)
Fix error due to race condition in file cache (#21753)
Clear NaNs due to zero-weight division in rolling var/std (#21761)
Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata (#21492)
Disallow cast from boolean to categorical/enum (#21714)
Don't check sortedness in join_asof when 'by' groups supplied, but issue warning (#21724)
Incorrect multithread path taken for aggregations (#21727)
Disallow cast to empty Enum (#21715)
Fix list.mean and list.median returning Float64 for temporal types (#21144)
Incorrect (FixedSize)ListArrayBuilder gather implementation (#21716)
Always fallback in SkipBatchPredicate (#21711)
New streaming multiscan deadlock (#21694)
Ensure new-streaming join BuildState is correct even if never fed morsels (#21708)
IO plugin; support empty iterator (#21704)
Support nulls in multi-column sort (#21702)
Window function check length of groups state (#21697)
Support 128 sum reduction on new streaming (#21691)
IPC round-trip of list of empty view with non-empty bufferset (#21671)
Variance can never be negative (#21678)
Incorrect loop length in new-streaming group by (#21670)
Right join on multiple columns not coalescing left_on columns (#21669)
Casting Struct to String panics if n_chunks > 1 (#21656)
Fix "Future attached to different loop" error on read_database_uri (#21641)
Fix deadlock in cache + hconcat (#21640)
Properly handle phase transitions in row-wise sinks (#21600)
Enable new streaming memory sinks by default (#21589)
Always use global registry for object (#21622)
Check enum categories when reading csv (#21619)
Unspecialized prefiltering on nullable arrays (#21611)
Release the gil on explain (#21607)
Take into account scalar/partitioned columns in DataFrame::split_chunks (#21606)
Bad null handling in unordered row encoding (#21603)
Fix deadlock in new streaming CSV / NDJSON sinks (#21598)
Bad view index in BinaryViewBuilder (#21590)
Fix CSV count with comment prefix skipped empty lines (#21577)
New streaming IPC enum scan (#21570)
Several aspects related to ParquetColumnExpr (#21563)
Don't hit parquet::pre-filtered in case of pre-slice (#21565)

📖 Documentation

Add skrub to ecosystem.md (#21760)
Add example for percentile rank (#21746)
Make python/rust getting-started consistent and clarify performance risk of infer_schema_length=None (#21734)
Add expression composability to PySpark comparison (#21473)
Document read_().lazy() antipattern (#21623)
Update Polars Cloud interactive workflow examples (#21609)
Add a Plotnine example to the visualization docs (#21597)
Add cloud api reference to Ref guide (#21566)

🛠️ Other improvements

Remove variance numerical stability hack (#21749)
Only use chrono_tz timezones in hypothesis testing (#21721)
Remove order check from flaky test (#21730)
Add sinks into the DSL before optimization (#21713)
Add missing test case for #21701 (#21709)
Remove old-streaming from engine argument (#21667)
Add as_phys_any to PrivateSeries for downcasting (#21696)
Use FFI to read dataframe instead of transmute (#21673)
Work around typos ignore bug (#21672)
Added Test For datetime_range Nanosecond Overflow (#21354)
Update to edition 2024 (#21662)
Update rustc (#21647)
Support object from chunks (#21636)
Push versioned docs on workflow dispatch (#21630)
Fail docs early (#21629)
Check major/minor in docs (#21626)
Add docs workflow (#21624)
Add test for 21581 (#21617)
Remove even more parquet multiscan handling (#21601)
Remove multiscan handling from new streaming parquet source (#21584)
Prepare skeleton for partitioning sinks (#21536)

Thank you to all our contributors for making this release possible!
@GaelVaroquaux, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NathanHu725, @alexander-beedie, @coastalwhite, @dependabot[bot], @jrycw, @kdn36, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @wence-

02 Mar 20:33

github-actions

py-1.24.0

6e6c97e

Python Polars 1.24.0

🚀 Performance improvements

Provide a fallback skip batch predicate for constant batches (#21477)
Parallelize the passing in new streaming multiscan (#21430)

✨ Enhancements

Add lossy decoding to read_csv for non-utf8 encodings (#21433)
Add DataFrame.write_iceberg (#15018)
Add 'nulls_equal' parameter to is_in (#21426)
Improve numeric stability rolling_{std, var, cov, corr} (#21528)
IR Serde cross-filter (#21488)
Give priority to pycapsule interface in from_dataframe (#21377)
Support writing Time type in json (#21454)
Activate all optimizations in sinks (#21462)
Add AssertionError variant to PolarsError in polars-error (#21460)
Pass filter to inner readers in multiscan new streaming (#21436)
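
The numeric-stability entry for rolling_{std, var, cov, corr} (#21528) touches a well-known hazard: computing variance as "mean of squares minus square of mean" loses precision when values are large relative to their spread. One widely used remedy is Welford's online update, sketched below in plain Python. These notes do not say which scheme Polars adopted, so treat this as a generic illustration; the function name welford_variance is hypothetical.

```python
def welford_variance(values):
    """Sample variance (ddof=1) via Welford's online update.

    Hypothetical helper for illustration; not the Polars implementation.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        # Uses the updated mean, which keeps every term non-negative.
        m2 += delta * (x - mean)
    if count < 2:
        raise ValueError("need at least two values for a sample variance")
    return m2 / (count - 1)


# Large offset, small spread: a case that is hard for the naive formula.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
variance = welford_variance(data)
```

Because the update only ever accumulates products of deviations, the running sum cannot go negative through cancellation, which is the failure mode the naive formula is prone to.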

🐞 Bug fixes

Categorical min/max panicking when string cache is enabled (#21552)
Don't encode IPC record batch twice (#21525)
Respect rewriting flag in Node rewriter (#21516)
Correct skip batch predicate for partial statistics (#21502)
Make the Parquet Sink properly phase aware (#21499)
Don't divide by zero in partitioned group-by (#21498)
Create new linearizer between rowwise new streaming sink phases (#21490)
Don't drop rows in sinks between new streaming phases (#21489)
Incorrect lazy schema for Expr.list.diff (#21484)
Give priority to pycapsule interface in from_dataframe (#21377)
Duration Series arithmetic operations (#21425)
Fix unwrap None panic when filtering delta with missing columns (#21453)
Use stable sort for rolling-groupby (#21444)
Throw exception if dataframe is too large to be compatible with Excel (#20900)
Address regression with read_excel not handling URL paths correctly (#21428)
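
The stable-sort entry for rolling group-by (#21444) concerns a general property worth spelling out: a stable sort keeps rows with equal keys in their original relative order. A minimal Python illustration with made-up rows follows; it uses the built-in sorted, which is guaranteed stable, and is not Polars code.

```python
# Each row is (date, label); the data is invented for this example.
rows = [
    ("2024-01-02", "b"),
    ("2024-01-01", "a"),
    ("2024-01-02", "c"),
    ("2024-01-01", "d"),
]

# Sort by date only. Rows sharing a date keep their input order
# ("a" before "d", "b" before "c") because the sort is stable.
by_date = sorted(rows, key=lambda row: row[0])
```

With an unstable sort, the order of "a"/"d" and "b"/"c" would not be guaranteed, so any order-dependent result computed per window could change from run to run.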

📖 Documentation

Fix typo (#21554)
Correct typos and grammar in Python docstrings (#21524)
Move llm page under misc (#21550)
Polars Cloud docs (#21548)
Add LazyFrame.remote docs entry (#21529)
Specify that the key column must be sorted in ascending order in merge_sorted (#21501)
Add Polars & LLMs page to the user guide (#21218)
Mention that statistics=True doesn't enable all statistics in sink_parquet() (#21434)

🛠️ Other improvements

Don't take ownership of IRplan in new streaming engine (#21551)
Refactor code for re-use by streaming NDJSON source (#21520)
Simplify the phase handling of new streaming sinks (#21530)
Improve IPC sink node parallelism (#21505)
Use tikv-jemallocator (#21486)
Rename 'join_nulls' parameter to 'nulls_equal' in join functions (#21507)
Move rolling to polars-compute (#21503)
Remove Growable in favor of ArrayBuilder (#21500)
Introduce a Sink Node trait in the new streaming engine (#21458)
Add test for rolling stability sort (#21456)
Add test for empty .is_in predicate filter (#21455)
Test for unique length on multiple columns (#21418)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @banflam, @braaannigan, @coastalwhite, @dependabot[bot], @etiennebacher, @ghuls, @kevinjqliu, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @thomasjpfan
