Releases: pola-rs/polars
Python Polars 1.30.0-beta.1
🚀 Performance improvements
- Increase default cross-file parallelism limit for new-streaming multiscan (#22700)
- Add elementwise execution mode for list.eval (#22715)
- Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
- Add streaming cross-join node (#22581)
- Switch off maintain_order in group-by followed by sort (#22492)
✨ Enhancements
- Support binaryoffset in search sorted (#22786)
- Add nulls_equal flag to list/arr.contains (#22773)
- Implement LazyFrame.match_to_schema (#22726)
- Improved time-string parsing and inference (generally, and via the SQL interface) (#22606)
- Allow for .over to be called without partition_by (#22712)
- Support AnyValue translation from PyMapping values (#22722)
- Support optimised init from non-dict Mapping objects in from_records and frame/series constructors (#22638)
- Support inference of Int128 dtype from databases that support it (#22682)
- Add options to write Parquet field metadata (#22652)
- Add cast_options parameter to control type casting in scan_parquet (#22617)
- Allow casting List<UInt8> to Binary (#22611)
- Allow setting of regex size limit using POLARS_REGEX_SIZE_LIMIT (#22651)
- Support use of literal values as "other" when evaluating Series.zip_with (#22632)
- Allow to read and write custom file-level parquet metadata (#21806)
- Support PEP702 @deprecated decorator behaviour (#22594)
- Support grouping by pl.Array (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
🐞 Bug fixes
- Respect BinaryOffset metadata (#22785)
- Correct the output order of PartitionByKey and PartitionParted (#22778)
- Fallback to non-strict casting for deprecated casts (#22760)
- Clippy on new stable version (#22771)
- Handle sliced out remainder for bitmaps (#22759)
- Don't merge Enum categories on append (#22765)
- Fix unnest() not working on empty struct columns (#22391)
- Fix the default value type in Schema init (#22589)
- Correct name in unnest error message (#22740)
- Provide "schema" to DataFrame, even if empty JSON (#22739)
- Properly account for nulls in the is_not_nan check made in drop_nans (#22707)
- Incorrect result from SQL count(*) with partition by (#22728)
- Fix deadlock joining scanned tables with low thread count (#22672)
- Don't allow deserializing incompatible DSL (#22644)
- Incorrect null dtype from binary ops in empty group_by (#22721)
- Don't mark str.replace_many with Mapping as deprecated (#22697)
- Gzip has maximum compression of 9, not 10 (#22685)
- Fix predicate pushdown of fallible expressions (#22669)
- Fix index out of bounds panic when scanning hugging face (#22661)
- Panic on group_by with literal and empty rows (#22621)
- Return input instead of panicking if empty subset in drop_nulls() and drop_nans() (#22469)
- Bump argminmax to 0.6.3 (#22649)
- DSL version deserialization endianness (#22642)
- Allow Expr.round() to be called on integer dtypes (#22622)
- Fix panic when filtering based on row index column in parquet (#22616)
- WASM and PyOdide compile (#22613)
- Resolve get() SchemaMismatch panic (#22350)
- Panic in group_by_dynamic on single-row df with group_by (#22597)
- Add new_streaming feature to polars crate (#22601)
- Consistently use Unix epoch as origin for dt.truncate (except weekly buckets which start on Mondays) (#22592)
- Fix interpolate on dtype Decimal (#22541)
- CSV count rows skipped last line if file did not end with newline (#22577)
- Make nested strict casting actually strict (#22497)
- Make replace and replace_strict mapping use list literals (#22566)
- Allow pivot on Time column (#22550)
- Fix error when providing CSV schema with extra columns (#22544)
- Panic on bitwise op between Series and Expr (#22527)
- Multi-selector regex expansion (#22542)
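
The dt.truncate fix listed above (#22592) anchors buckets at the Unix epoch, except that weekly buckets start on Mondays. The sketch below illustrates that bucketing idea with the Python standard library only. It is not Polars' implementation: the helper name truncate_to_bucket and the choice of 1969-12-29 as the weekly origin are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, 1970-01-01, fell on a Thursday.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption for illustration: anchor weekly buckets on the Monday on or
# before the epoch, so every weekly bucket boundary is a Monday.
WEEK_ORIGIN = EPOCH - timedelta(days=EPOCH.weekday())


def truncate_to_bucket(ts: datetime, every: timedelta, weekly: bool = False) -> datetime:
    """Hypothetical helper: floor `ts` to a whole multiple of `every` from the origin."""
    origin = WEEK_ORIGIN if weekly else EPOCH
    # timedelta // timedelta yields an int (floored), also for pre-epoch values.
    return origin + ((ts - origin) // every) * every


ts = datetime(2025, 5, 14, 13, 45, tzinfo=timezone.utc)
three_hourly = truncate_to_bucket(ts, timedelta(hours=3))
weekly = truncate_to_bucket(ts, timedelta(weeks=1), weekly=True)
```

Anchoring directly at the epoch would make weekly buckets start on Thursdays, which is why the weekly case needs its own origin.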
📖 Documentation
- Fix broken link to service account page in Polars Cloud docs (#22762)
- Add match_to_schema to API reference (#22777)
- Provide additional explanation and examples for the value_counts "normalize" parameter (#22756)
- Rework documentation for drop/fill for nulls/nans (#22657)
- Add documentation to new RoundMode parameter in round (#22555)
- Add missing repeat_by to API reference, fixup list.get (#22698)
- Fix non-rendering bullet points in scan_iceberg (#22694)
- Improve insert_column docstring (description and examples) (#22551)
- Improve join documentation (#22556)
🛠️ Other improvements
- Update cloud docs (#22624)
- Fix unstable list.eval performance test (#22729)
- Add proptest implementations for all Array types (#22711)
- Dispatch .write_* to .lazy().sink_*(engine='in-memory') (#22582)
- Move all optimization flags to QueryOptFlags (#22680)
- Add test for str.replace_many (#22615)
- Stabilize sink_* (#22643)
- Add proptest for row-encode (#22626)
- Update rust version in nix flake (#22627)
- Add a nix flake with a devShell and package (#22246)
- Use a wrapper struct to store time zone (#22523)
- Add proptest testing for parquet decoding kernels (#22608)
- Include equiprobable as valid quantile method (#22571)
- Remove confusing error context calling .collect(_eager=True) (#22602)
- Fix test_truncate_path test case (#22598)
- Unify function flags into 1 bitset (#22573)
- Display the operation behind in-memory-map (#22552)
Thank you to all our contributors for making this release possible!
@JakubValtar, @Julian-J-S, @MarcoGorelli, @WH-2099, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @etiennebacher, @florian-klein, @itamarst, @kdn36, @mcrumiller, @nameexhaustion, @nikaltipar, @orlp, @pavelzw, @r-brink, @ritchie46, @stijnherfst, @teotwaki, @timkpaine and @wence-
Rust Polars 0.47.1
🏆 Highlights
- Enable common subplan elimination across plans in collect_all (#21747)
- Add lazy sinks (#21733)
- Add PartitionByKey for new streaming sinks (#21689)
- Enable new streaming memory sinks by default (#21589)
💥 Breaking changes
- Make bottom interval closed in hist (#22090)
🚀 Performance improvements
- Avoid alloc_zeroed in decompression (#22460)
- Lower Expr.(n_)unique to group_by on streaming engine (#22420)
- Chunk huge munmap calls (#22414)
- Add single-key variants of streaming group_by (#22409)
- Improve accumulate_dataframes_vertical performance (#22399)
- Use optimize rolling_quantile with varying window sizes (#22353)
- Dedicated rolling_skew kernel (#22333)
- Call large munmap's in background thread (#22329)
- New streaming group_by implementation (#22285)
- Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
- Turn on parallel=prefiltered by default for new streaming (#22190)
- Add CSE to streaming groupby (#22196)
- Speed-up new streaming predicate filtering (#22179)
- Speedup new-streaming file row count (#22169)
- Fix quadratic behavior when casting Enums (#22008)
- Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
- Fast path for empty inner join (#21965)
- Add native semi/anti join in new streaming engine (#21937)
- Cache regex compilation globally (#21929)
- Use views for binary hash tables and add single-key binary variant (#21872)
- Avoid rechunking in gather (#21876)
- Switch ahash for foldhash (#21852)
- Put THP behind feature flag (#21853)
- Enable THP by default (#21829)
- Improve join performance for expanding joins (#21821)
- Use binary_search instead of contains in business-day functions (#21775)
- Implement linear-time rolling_min/max (#21770)
- Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
- Enable common subplan elimination across plans in collect_all (#21747)
- Allow elementwise functions in recursive lowering (#21653)
- Add primitive single-key hashtable to new-streaming join (#21712)
- Remove unnecessary black_boxes in Kahan summation (#21679)
- Box large enum variants (#21657)
- Improve join performance for new-streaming engine (#21620)
- Pre-fill caches (#21646)
- Optimize only a single cache input (#21644)
- Collect parquet statistics in one contiguous buffer (#21632)
- Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
- Don't maintain order when maintain_order=False in new streaming sinks (#21586)
- Pre-sort groups in group-by-dynamic (#21569)
- Provide a fallback skip batch predicate for constant batches (#21477)
- Parallelize the passing in new streaming multiscan (#21430)
- Toggle projection pushdown for eager rolling (#21405)
- Fix pathologic rolling + group-by performance and memory explosion (#21403)
- Add sampling to new-streaming equi join to decide between build/probe side (#21197)
- Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
- Implement native Expr.count() on new-streaming (#21126)
- Speed up list operations that use amortized_iter() (#20964)
- Use Cow as output for rechunk and add rechunk_mut (#21116)
- Reduce arrow slice mmap overhead (#21113)
- Reduce conversion cost in chunked string gather (#21112)
- Enable prefiltered by default for new streaming (#21109)
- Enable parquet column expressions for streaming (#21101)
- Deduplicate buffers again in stringview concat kernel (#21098)
- Add dedicated concatenate kernels (#21080)
- Rechunk only once during join probe gather (#21072)
- Speed up from_pandas when converting frame with multi-index columns (#21063)
- Change default memory prefetch to MADV_WILLNEED (#21056)
- Remove cast to boolean after comparison in optimizer (#21022)
- Split last rowgroup among all threads in new-streaming parquet reader (#21027)
- Recombine into larger morsels in new-streaming join (#21008)
- Improve list.min and list.max performance for logical types (#20972)
- Ensure count query select minimal columns (#20923)
✨ Enhancements
- Support grouping by pl.Array (#22575)
- Preserve exception type and traceback for errors raised from Python (#22561)
- Use fixed-width font in streaming phys plan graph (#22540)
- Highlight nodes in streaming phys plan graph (#22535)
- Support BinaryOffset serde (#22528)
- Show physical stage graph (#22491)
- Add structure for dispatching iceberg to native scans (#22405)
- Add SQL support for checking array values with IN and NOT IN expressions (#22487)
- Add more IRBuilder utils (#22482)
- Support DataFrame and Series init from torch Tensor objects (#22177)
- Add RoundMode for Decimal and Float (#22248)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
- Make streaming dispatch public (#22347)
- Add rolling_kurtosis (#22335)
- Support Cast in IO plugin predicates (#22317)
- Add .sort(nulls_last=True) to booleans, categoricals and enums (#22300)
- Add rolling min/max for temporals (#22271)
- Support literal:list agg (#22249)
- Support implode + agg (#22230)
- Dispatch scans to new-streaming by default (#22153)
- Improved expression autocomplete for IPython, Jupyter, and Marimo (#22221)
- Expose FunctionIR::FastCount in the python visitor (#22195)
- Add SPLIT_PART string function to the SQL interface (#22158)
- Allow scalar expr in Expr.diff (#22142)
- Support additional unsigned int aliases in the SQL interface (#22127)
- Add STRING_TO_ARRAY function to the SQL interface (#22129)
- Add dt.is_business_day (#21776)
- Add support for Int128 parsing/recognition to the SQL interface (#22104)
- Allow sinking to abstract python io and fs classes (#21987)
- Add add_alp_optimize_exprs to IRBuilder (#22061)
- Add cat.slice (#21971)
- Support growing schema if line length increases during csv schema inference (#21979)
- Replace thread unsafe GilOnceCell with Mutex (#21927)
- Support modified dsl in file cache (#21907)
- Add support for io-plugins in new-streaming (#21870)
- Add PartitionParted (#21788)
- Add DoubleEndedIterator for CatIter (#21816)
- Minor improvements to EXPLAIN plan output (#21822)
- Add polars_testing folder with relevant files and add_series_equal!() functionality (#21722)
- Allow to use repeat_by with (nested) lists and structs (#21206)
- Add support for rolling_(sum/min/max) for booleans through casting (#21748)
- Support multi-column sort for all nested types and nested search-sorted (#21743)
- Add lazy sinks (#21733)
- Add PartitionByKey for new streaming sinks (#21689)
- Fix replace flags (#21731)
- Add mkdir flag to sinks (#21717)
- Enable joins on list/array dtypes (#21687)
- Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
- Support all elementwise functions in IO plugin predicates (#21705)
- Stabilize Enum datatype (#21686)
- Support Polars int128 in from arrow (#21688)
- Use FFI to read dataframe instead of transmute (#21673)
- Enable new streaming memory sinks by default (#21589)
- Cloud support for new-streaming scans and sinks (#21621)
- Add len method to arr (#21618)
- Closeable files on unix (#21588)
- Add new PartitionMaxSize sink (#21573)
- Implement unpack_dtypes() functionality with unit tests (#21574)
- Support engine callback for LazyFrame.profile (#21534)
- Dispatch new-streaming CSV negative slice to separate node (#21579)
- Add NDJSON source to new streaming engine (#21562)
- Add lossy decoding to read_csv for non-utf8 encodings (#21433)
- Add 'nulls_equal' parameter to is_in (#21426)
- Improve numeric stability rolling_{std, var, cov, corr} (#21528)
- IR Serde cross-filter (#21488)
- Support writing Time type in json (#21454)
- Activate all optimizations in sinks (#21462)
- Add AssertionError variant to PolarsError in polars-error (#21460)
- Pass filter to inner readers in multiscan new streaming (#21436)
- Implement i128 -> str cast (#21411)
- Version DSL (#21383)
- Make user facing binary formats mostly self describing (#21380)
- Filter hive files using predicates in new streaming (#21372)
- Add negative slicing to new streaming multiscan (#21219)
- Pub-licize Expr DSL Function enums (#20421)
- Implement sorted flags for struct series (#21290)
- Support reading arrow Map type from Delta (#21330)
- Add a dedicated remove method for DataFrame and LazyFrame (#21259)
- Expose include_file_paths to python visitor (#21279)
- Implement merge_sorted for struct (#21205)
- Add positive slice for new streaming MultiScan (#21191)
- Don't take in rewriting visitor (#21212)
- Add SQL support for the DELETE statement (#21190)
- Add row index to new streaming multiscan (#21169)
- Improve DataFrame fmt in explain (#21158)
- Add projection pushdown to new streaming multiscan (#21139)
- Implement join on struct dtype (#21093)
- Use unique temporary directory path per user and restrict permissions (#21125)
- Enable new streaming multiscan for CSV (#21124)
- Environment POLARS_MAX_CONCURRENT_SCANS in multiscan for new streaming (#21127)
- Multi/Hive scans in new streaming engine (#21011)
- Add linear_spaces (#20941)
- Implement merge_sorted for binary (#21045)
- Hold string cache in new streaming engine and fix row-encoding (#21039)
- Support max/min method for Time dtype (#19815)
- Implement a streaming merge sorted node (#20960)
- Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
- Add negative slice support to new-streaming engine (#21001)
- Allow for more RG skipping by rewriting expr in planner (#20828)
- Rename catalog schema to namespace (#20993)
- Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
- Improved support for KeyboardInterrupts (#20961...
Python Polars 1.29.0
🚀 Performance improvements
- Avoid alloc_zeroed in decompression (#22460)
✨ Enhancements
- Highlight nodes in streaming phys plan graph (#22535)
- Show physical stage graph (#22491)
- Add structure for dispatching iceberg to native scans (#22405)
- Add SQL support for checking array values with IN and NOT IN expressions (#22487)
- Support DataFrame and Series init from torch Tensor objects (#22177)
- Add RoundMode for Decimal and Float (#22248)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
🐞 Bug fixes
- Streaming outer join coalesce bug (#22530)
- Remove redundant print statement in assert_frame_schema_equal() (#22529)
- Bug in .unique() followed by .slice() (#22471)
- Fix error reading parquet with datetimes written by pandas (#22524)
- Fix schema_overrides not taking effect in NDJSON (#22521)
- Fold flags and verify scalar correctness in apply (#22519)
- Invalid values were triggering panics instead of returning null in dt.to_date/dt.to_datetime (#22500)
- Ensure numpy isinstance check is lazy (avoid forcing the dependency) (#22486)
- Incorrectly dropped sort after unique for some queries (#22489)
- Fix incorrect ternary agg state with mixed columns and scalars (#22496)
- Make replace and replace_strict properly elementwise (#22465)
- Fix index out of bounds panic on parquet prefiltering (#22458)
- Integer underflow when checking parquet UTF-8 (#22472)
- Add implementation for array.get with idx overflow (#22449)
- Deprecate str.collection functions with flat strings and mark as elementwise (#22461)
- Deprecate flat list.gather and mark as elementwise (#22456)
- Inform users that IO error path file name can be expanded with POLARS_VERBOSE=1 (#22427)
📖 Documentation
- Fix typo in structs page (#22504)
🛠️ Other improvements
- Don't store name/dtype in grouper (#22525)
- Add structure for dispatching iceberg to native scans (#22405)
- Remove unused reduction code (#22462)
- Pin to explicit macOS version in code coverage (#22432)
Thank you to all our contributors for making this release possible!
@AH-Merii, @JakubValtar, @Julian-J-S, @Kevin-Patyk, @Liyixin95, @MarcoGorelli, @Matt711, @alexander-beedie, @brianmakesthings, @coastalwhite, @nameexhaustion, @orlp and @ritchie46
Python Polars 1.28.1
🐞 Bug fixes
- Reading of reencoded categorical in Parquet (#22436)
- Last thread in parquet predicate filter oob (#22429)
📖 Documentation
📦 Build system
- Update pyo3 and numpy crates to version 0.24 (#22015)
🛠️ Other improvements
- Add test for implode+over (#22437)
- Fix CI by removing use_legacy_dataset (#22438)
- Only use pytorch index-url for pytorch package (#22355)
Thank you to all our contributors for making this release possible!
@bschoenmaeckers, @coastalwhite, @etiennebacher, @mcrumiller and @ritchie46
Python Polars 1.28.0
🚀 Performance improvements
- Lower Expr.(n_)unique to group_by on streaming engine (#22420)
- Chunk huge munmap calls (#22414)
- Add single-key variants of streaming group_by (#22409)
- Improve accumulate_dataframes_vertical performance (#22399)
- Use optimize rolling_quantile with varying window sizes (#22353)
- Dedicated rolling_skew kernel (#22333)
- Call large munmap's in background thread (#22329)
- New streaming group_by implementation (#22285)
- Patch jemalloc to not purge huge allocs eagerly if we have background threads (#22318)
- Turn on parallel=prefiltered by default for new streaming (#22190)
✨ Enhancements
- When reporting unexpected types in errors, module-qualify the typename (#22390)
- Add Series backward_fill/forward_fill (#22360)
- Add GPU support to sink_* APIs (#20940)
- Changed mapping type from dict to Mapping (#19400) (#19436)
- Make streaming dispatch public (#22347)
- Add rolling_kurtosis (#22335)
- Support Cast in IO plugin predicates (#22317)
- Add .sort(nulls_last=True) to booleans, categoricals and enums (#22300)
- Add rolling min/max for temporals (#22271)
- Support literal:list agg (#22249)
- Support running Polars SQL queries against any objects implementing the PyCapsule interface (#22235)
- Support implode + agg (#22230)
- Dispatch scans to new-streaming by default (#22153)
🐞 Bug fixes
- Ensure write_excel correctly preserves null values in nested dtype data on export (#22379)
- Panic when visualizing streaming physical plan with joins (#22404)
- Fix incorrect filter after LazyFrame.rename().select() (#22380)
- Fix select(len()) performance regression (#22363)
- Handle pytz named timezone in lit (#21785)
- Don't leak state during prefill CSE cache (#22341)
- Maintain float32 type in partitioned group-by (#22340)
- Resolve streaming panic on multiple merge_sorted (#22205)
- Fix ndjson nested types (#22325)
- Fix nested datetypes in ndjson (#22321)
- Check matching lengths for pl.corr (#22305)
- Move type coercion for pl.duration to planner (#22304)
- Check dtype to avoid panic with mixed types in min/max_horizontal (#21857)
- Coalesce correct column for new streaming full join (#22301)
- Don't collect NaN from Parquet Statistics (#22294)
- Set revmap for empty AnyValue to Series (#22293)
- Add an __all__ entry to internal type definition module (#22254)
- Datetime parser was incorrectly parsing 8-digit fractional seconds when format specified to expect 9 (#22180)
- More robust str → date conversion when reading from spreadsheet (#22276)
- Deprecate using is_in with 2 equal types and mark as elementwise (#22178)
- Duplicate key column name in streaming group_by due to CSE (#22280)
- Raise ColumnNotFoundError for missing columns in join_where (#22268)
- Parquet filters for logical types and operations (#22253)
- Ensure floating-point accuracy in hist (#22245)
- Check matching key datatypes for new streaming joins (#22247)
- Incorrect length BinaryArray/ListBuilder (#22227)
📖 Documentation
- Update docs for schema arg in scan_csv to match read_csv (#22357)
- Update pl.when documentation (#22345)
- Add missing is_business_day to documentation reference (#22338)
- Improve interpolation documentation to clarify behavior of null values (#22274)
🛠️ Other improvements
- Install pytorch for 3.13 on Windows (#22356)
- Make interpolate fix more robust (#22421)
- Fix interpolate test (#22417)
- Reduce hot table size in debug mode (#22400)
- Replace intrinsic with non-intrinsic (#22401)
- Make streaming dispatch public (#22347)
- Update rustc to 'nightly-2025-04-19' (#22342)
- Update mozilla-actions/sccache-action (#22319)
- Purge old parquet and scan code (#22226)
- Add an __all__ entry to internal type definition module (#22254)
- Add online skew/kurtosis algorithm for future use in rolling kernels (#22261)
- Add Polars Cloud 0.0.7 release notes (#22223)
- Change format name from list to implode (#22240)
- Make other parallel parquet modes filter afterwards (#22228)
- Close async reader issues (#22224)
- Add BinaryArrayBuilder (#22225)
Thank you to all our contributors for making this release possible!
@DavideCanton, @JakubValtar, @Jesse-Bakker, @MarcoGorelli, @NeejWeej, @Shoeboxam, @adamreeve, @alexander-beedie, @axellpadilla, @cmdlineluser, @coastalwhite, @d-reynol, @dongchao-1, @florian-klein, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @yiteng-guo
Python Polars 1.27.1
✨ Enhancements
- Improved expression autocomplete for IPython, Jupyter, and Marimo (#22221)
🐞 Bug fixes
- Incorrect condition on empty inner join fast path (#22208)
- Fallback predicate filter for min=max with is_in (#22213)
- Don't panic for LruCachedFunc for size=0 (#22215)
- Writing masked out list values to json (#22210)
- Deadlock in streaming distributor (#22207)
Thank you to all our contributors for making this release possible!
@Matt711, @alexander-beedie, @coastalwhite, @dependabot[bot], @orlp and @ritchie46
Python Polars 1.27.0
💥 Breaking changes
- Make bottom interval closed in hist (#22090)
- Change Partition API to base_path and file_path (#21888)
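
To make the hist change above (#22090) concrete, here is a small standard-library sketch of bin assignment where the lowest interval is closed on both ends. It assumes the remaining bins are right-closed, which the release note does not state, and the helper bin_index is hypothetical rather than part of Polars.

```python
from bisect import bisect_left


def bin_index(value, edges):
    """Hypothetical helper: index of the bin holding `value`, or None if outside.

    Assumed layout for sorted `edges` [e0, e1, ..., en]:
    bin 0 is [e0, e1] (closed bottom), bin i > 0 is (e_i, e_{i+1}].
    """
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[0]:
        # With a closed bottom interval the minimum edge belongs to bin 0.
        return 0
    # bisect_left finds the first edge >= value, i.e. the bin's right edge.
    return bisect_left(edges, value) - 1


edges = [0.0, 1.0, 2.0, 3.0]
assignments = [bin_index(v, edges) for v in (0.0, 1.0, 1.5, 3.0, 3.1)]
```

Under this layout a value equal to the lowest edge is counted in the first bin instead of falling outside every bin.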
🚀 Performance improvements
- Add CSE to streaming groupby (#22196)
- Speed-up new streaming predicate filtering (#22179)
- Speedup new-streaming file row count (#22169)
- Fix quadratic behavior when casting Enums (#22008)
- Lower is_in to bitmap-output semi-join in new streaming engine (#21948)
- Fast path for empty inner join (#21965)
- Add native semi/anti join in new streaming engine (#21937)
- Cache regex compilation globally (#21929)
✨ Enhancements
- Add SPLIT_PART string function to the SQL interface (#22158)
- Allow scalar expr in Expr.diff (#22142)
- Support additional unsigned int aliases in the SQL interface (#22127)
- Add STRING_TO_ARRAY function to the SQL interface (#22129)
- Add dt.is_business_day (#21776)
- Add an eager parameter to pl.cov (#22098)
- Add support for Int128 parsing/recognition to the SQL interface (#22104)
- Add an eager parameter to pl.coalesce (#22092)
- Add an eager parameter to pl.corr (#22097)
- Allow sinking to abstract python io and fs classes (#21987)
- Add add_alp_optimize_exprs to IRBuilder (#22061)
- Add cat.slice (#21971)
- Support growing schema if line length increases during csv schema inference (#21979)
- Replace thread unsafe GilOnceCell with Mutex (#21927)
- Support modified dsl in file cache (#21907)
🐞 Bug fixes
- Implode in agg (#22197)
- Reduce GIL hold time for IO plugins in new-streaming (#22186)
- Enhance predicate validation and cast safety in join_where (#22112)
- Handle Parquet with compressed empty DataPage v2 (#22172)
- Schema error during lowering (#22175)
- Rewrite unroll of overlapping groups to mitigate out of range index panic (#22072)
- Incorrect rounding for very large/small numbers (#22173)
- Allow set input to list.set_* operations (#22163)
- Deadlock in join due to rayon nested task-stealing (#22159)
- Mark Expr.repeat_by as elementwise (#22068)
- Fix csv serializer panic by supporting ScalarColumn in as_single_chunk (#22146)
- Raise an error if a number doesn't have associated unit in duration strings (#22035)
- Add i128 as supertype to boolean (#22138)
- Fix panic when constructing DF from pyarrow due to duplicate field names (#22114)
- Add broadcasts and error messages for many elementwise operations (#22130)
- Throw error for n=0 on list.gather_every (#22122)
- Throw error for unsupported rolling operations (#22121)
- Error on unequal length str.to_integer arguments (#22100)
- Make bottom interval closed in hist (#22090)
- Relative path resolution for plugin libraries (#21911)
- Avoiding panic with striptime for out-of-bounds dates (#21208)
- Join revmaps for categoricals in merge_sorted (#21976)
- Fix glob expansion matching extra files (#21991)
- Ensure SQL dot-notation for nested column fields resolves correctly (#22109)
- Parquet filter performance regression from multiscan dispatch (#22116)
- Panic for unequal length ewm_mean_by args (#22093)
- Add scalarity checks to pl.repeat (#22088)
- Type check n parameter of pl.repeat (#22071)
- Mark bitwise_{count,leading,trailing}_{ones,zeros} as elementwise (#22044)
- Mark pl.*_ranges functions correctly as element-wise (#22059)
- Correctly type check pl.arctan2 (#22060)
- Mark pl.business_day_count as elementwise (#22055)
- Check input python type for str.extract_groups (#22032)
- Check types for fill_char in str.pad_{start,end} (#22036)
- Mark str.to_decimal properly as non-elementwise (#22040)
- Documented return type for bin.encode and bin.decode (#22022)
- Revert #22017 and improve block(_in_place)_on doc comment (#22031)
- Remove outdated depth warning (#22030)
- Expression pl.concat was incorrectly marked as elementwise (#22019)
- Use block_in_place_on to start streaming (#22017)
- Panic on empty aggregation in streaming (#22016)
- Error instead of panic for invalid durations in dt.offset_by() and dt.round() (#21982)
- Raise error instead of silently appending NULL in NDJSON parsing (#21953)
- Ensure AV is static before pushing to row buffer (#21967)
- Deadlock in new-streaming multiplexer (#21963)
- Release GIL in collect_with_callback (#21941)
- Panic in new RegexCache (#21935)
- Type hint of cs.exclude() is SelectorType instead of Expr (#21892)
- Add correct deprecation warning for .str.concat (#21666)
- Use absolute paths by defaults for plugins (#21904)
📖 Documentation
- Add user guide section on working with Sheets in Colab (#22161)
- Update distributed engine docs (#22128)
- Add Polars Cloud release notes (#22021)
- Remove trailing space in settings POLARS_CLOUD_CLIENT_ID (#21995)
- Fix typo (#21954)
- Fix 'pickleable' typo in docs (#21938)
- Change ctx to compute=ctx for all remote query examples (#21930)
🛠️ Other improvements
- Remove old MultiScanExec for in-memory (#22184)
- Separate FunctionOptions from DSL calls (#22133)
- Undeprecate backward_fill and forward_fill (#22156)
- Handle conversion of Duration specially in pyir (#22101)
- Deprecate duplicate backward_fill and forward_fill interface (#22083)
- Solve clippy lints for 1.86 (#22102)
- Remove rust exclusive MaxBound and MinBound fill strategies (#22063)
- Change Partition API to base_path and file_path (#21888)
- Fix pydantic model_fields deprecation (#21958)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @EnricoMi, @Jacob640, @JakubValtar, @MarcoGorelli, @MaxJackson, @alexander-beedie, @amotzop, @anath2, @bschoenmaeckers, @cnpryer, @coastalwhite, @dependabot[bot], @eitsupi, @etiennebacher, @hemanth94, @kdn36, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @rgertenbach, @ritchie46, @sebasv, @silannisik, @stijnherfst, @wence- and @zachlefevre
Python Polars 1.26.0
🚀 Performance improvements
- Use views for binary hash tables and add single-key binary variant (#21872)
- Avoid rechunking in gather (#21876)
- Switch ahash for foldhash (#21852)
- Put THP behind feature flag (#21853)
- Enable THP by default (#21829)
- Improve join performance for expanding joins (#21821)
- Use binary_search instead of contains in business-day functions (#21775)
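
The last entry above (#21775) replaces a linear contains scan with binary search in the business-day functions. The general idea is sketched below in Python with the standard bisect module. The holiday list (days since the Unix epoch) and both helper names are hypothetical, and Polars' own Rust kernels are not shown in these notes.

```python
from bisect import bisect_left


def contains_linear(sorted_days, day):
    """O(n) membership test: scans the list element by element."""
    return day in sorted_days


def contains_sorted(sorted_days, day):
    """O(log n) membership test; only valid when `sorted_days` is sorted."""
    i = bisect_left(sorted_days, day)
    return i < len(sorted_days) and sorted_days[i] == day


# Hypothetical holidays, stored as sorted day offsets from 1970-01-01.
holidays = [20089, 20209, 20232, 20447]
hits = [contains_sorted(holidays, d) for d in (20089, 20100, 20447, 30000)]
```

Both helpers return the same answers on sorted input; the binary search version simply avoids touching every element for each lookup.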
✨ Enhancements
- Add support for io-plugins in new-streaming (#21870)
- Add PartitionParted (#21788)
- Minor improvements to EXPLAIN plan output (#21822)
- Add explain_all (#21797)
- Allow to use repeat_by with (nested) lists and structs (#21206)
🐞 Bug fixes
- Fix DataFrame.nan_to_null work for tuple (#21861)
- Allow pivot on empty frame for all integer index dtypes (#21890)
- Null panic on decimal aggregate (#21873)
- Join with categoricals on new-streaming engine (#21825)
- Fix div 0 partitioned group-by (#21842)
- Incorrect quote check in CSV parser (#21826)
- Add option to use relative paths for plugin libraries (#21675)
- Respect header separator in sink_csv (#21814)
- Deprecation of streaming=False (#21813)
- Fix collect_all type-coercion (#21810)
- Memory leaks in SharedStorage (#21798)
- Make None refer to uncompressed in sink_ipc (#21786)
📖 Documentation
- Add sources and sinks to user-guide (#21780)
🛠️ Other improvements
- Change dynamic literals to be separate category (#21849)
- Add POLARS_TIMEOUT_MS for timing out slow Polars tests (#21887)
- Disable --dist loadgroup in pytest (#21885)
- Fix refcount assert being messed up by pytest assertion magic (#21884)
- Add env vars to configure new-streaming buffer sizes (#21818)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @alexander-beedie, @anath2, @borchero, @coastalwhite, @dongchao-cn, @kgv, @mcrumiller, @nameexhaustion, @orlp and @ritchie46
Python Polars 1.25.2
🏆 Highlights
- Enable common subplan elimination across plans in collect_all (#21747)
- Add lazy sinks (#21733)
- Add PartitionByKey for new streaming sinks (#21689)
- Enable new streaming memory sinks by default (#21589)
🚀 Performance improvements
- Implement linear-time rolling_min/max (#21770)
- Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
- Enable common subplan elimination across plans in collect_all (#21747)
- Allow elementwise functions in recursive lowering (#21653)
- Add primitive single-key hashtable to new-streaming join (#21712)
- Remove unnecessary black_boxes in Kahan summation (#21679)
- Box large enum variants (#21657)
- Improve join performance for new-streaming engine (#21620)
- Pre-fill caches (#21646)
- Optimize only a single cache input (#21644)
- Collect parquet statistics in one contiguous buffer (#21632)
- Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
- Don't maintain order when maintain_order=False in new streaming sinks (#21586)
- Pre-sort groups in group-by-dynamic (#21569)
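
The list above mentions a linear-time rolling_min/max (#21770) without saying which algorithm is used. One well-known way to get linear time is a monotonic deque, sketched here in plain Python as an illustration of the technique rather than of Polars' kernel. The function name and its fixed-window, no-nulls behaviour are assumptions.

```python
from collections import deque


def rolling_min(values, window):
    """Minimum of each full window of size `window`, in O(n) total time."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    # Indices of candidate minima; their values increase from front to back.
    candidates = deque()
    for i, v in enumerate(values):
        # Drop candidates that can never be the minimum again.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        if i >= window - 1:
            out.append(values[candidates[0]])
    return out


mins = rolling_min([4, 2, 5, 1, 3], window=2)
```

Each index is pushed and popped at most once, so the total work is linear in the input length regardless of the window size.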
✨ Enhancements
- Add support for rolling_(sum/min/max) for booleans through casting (#21748)
- Support multi-column sort for all nested types and nested search-sorted (#21743)
- Add lazy sinks (#21733)
- Add PartitionByKey for new streaming sinks (#21689)
- Fix replace flags (#21731)
- Add mkdir flag to sinks (#21717)
- Enable joins on list/array dtypes (#21687)
- Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
- Support all elementwise functions in IO plugin predicates (#21705)
- Stabilize Enum datatype (#21686)
- Support Polars int128 in from arrow (#21688)
- Use FFI to read dataframe instead of transmute (#21673)
- Enable new streaming memory sinks by default (#21589)
- Cloud support for new-streaming scans and sinks (#21621)
- Add len method to arr (#21618)
- Closeable files on unix (#21588)
- Add new `PartitionMaxSize` sink (#21573)
- Support engine callback for `LazyFrame.profile` (#21534)
- Dispatch new-streaming CSV negative slice to separate node (#21579)
- Add NDJSON source to new streaming engine (#21562)
- Support passing `token` in `storage_options` for GCP cloud (#21560)
🐞 Bug fixes
- Expose and document partitions (#21765)
- Fix lazy schema for truediv ops involving List/Array dtypes (#21764)
- Fix error due to race condition in file cache (#21753)
- Clear NaNs due to zero-weight division in rolling var/std (#21761)
- Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata (#21492)
- Disallow cast from boolean to categorical/enum (#21714)
- Don't check sortedness in `join_asof` when 'by' groups supplied, but issue warning (#21724)
- Incorrect multithread path taken for aggregations (#21727)
- Disallow cast to empty Enum (#21715)
- Fix `list.mean` and `list.median` returning Float64 for temporal types (#21144)
- Incorrect (FixedSize)ListArrayBuilder gather implementation (#21716)
- Always fallback in SkipBatchPredicate (#21711)
- New streaming multiscan deadlock (#21694)
- Ensure new-streaming join BuildState is correct even if never fed morsels (#21708)
- IO plugin; support empty iterator (#21704)
- Support nulls in multi-column sort (#21702)
- Window function check length of groups state (#21697)
- Support 128 sum reduction on new streaming (#21691)
- IPC round-trip of list of empty view with non-empty bufferset (#21671)
- Variance can never be negative (#21678)
- Incorrect loop length in new-streaming group by (#21670)
- Right join on multiple columns not coalescing left_on columns (#21669)
- Casting Struct to String panics if n_chunks > 1 (#21656)
- Fix `Future attached to different loop` error on `read_database_uri` (#21641)
- Fix deadlock in cache + hconcat (#21640)
- Properly handle phase transitions in row-wise sinks (#21600)
- Enable new streaming memory sinks by default (#21589)
- Always use global registry for object (#21622)
- Check enum categories when reading csv (#21619)
- Unspecialized prefiltering on nullable arrays (#21611)
- Release the gil on explain (#21607)
- Take into account scalar/partitioned columns in DataFrame::split_chunks (#21606)
- Bad null handling in unordered row encoding (#21603)
- Fix deadlock in new streaming CSV / NDJSON sinks (#21598)
- Bad view index in BinaryViewBuilder (#21590)
- Fix CSV count with comment prefix skipped empty lines (#21577)
- New streaming IPC enum scan (#21570)
- Several aspects related to ParquetColumnExpr (#21563)
- Don't hit parquet::pre-filtered in case of pre-slice (#21565)
📖 Documentation
- Add skrub to ecosystem.md (#21760)
- Add example for percentile rank (#21746)
- Make python/rust getting-started consistent and clarify performance risk of infer_schema_length=None (#21734)
- Add expression composability to PySpark comparison (#21473)
- Document `read_().lazy()` antipattern (#21623)
- Update Polars Cloud interactive workflow examples (#21609)
- Add a `Plotnine` example to the visualization docs (#21597)
- Add cloud api reference to Ref guide (#21566)
🛠️ Other improvements
- Remove variance numerical stability hack (#21749)
- Only use chrono_tz timezones in hypothesis testing (#21721)
- Remove order check from flaky test (#21730)
- Add sinks into the DSL before optimization (#21713)
- Add missing test case for #21701 (#21709)
- Remove old-streaming from engine argument (#21667)
- Add as_phys_any to PrivateSeries for downcasting (#21696)
- Use FFI to read dataframe instead of transmute (#21673)
- Work around typos ignore bug (#21672)
- Added Test For `datetime_range` Nanosecond Overflow (#21354)
- Update to edition 2024 (#21662)
- Update rustc (#21647)
- Support object from chunks (#21636)
- Push versioned docs on workflow dispatch (#21630)
- Fail docs early (#21629)
- Check major/minor in docs (#21626)
- Add docs workflow (#21624)
- Add test for 21581 (#21617)
- Remove even more parquet multiscan handling (#21601)
- Remove multiscan handling from new streaming parquet source (#21584)
- Prepare skeleton for partitioning sinks (#21536)
Thank you to all our contributors for making this release possible!
@GaelVaroquaux, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NathanHu725, @alexander-beedie, @coastalwhite, @dependabot[bot], @jrycw, @kdn36, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @ritchie46, @wence- and dependabot[bot]
Python Polars 1.24.0
🚀 Performance improvements
- Provide a fallback skip batch predicate for constant batches (#21477)
- Parallelize the passing in new streaming multiscan (#21430)
✨ Enhancements
- Add lossy decoding to `read_csv` for non-utf8 encodings (#21433)
- Add `DataFrame.write_iceberg` (#15018)
- Add 'nulls_equal' parameter to `is_in` (#21426)
- Improve numeric stability `rolling_{std, var, cov, corr}` (#21528)
- IR Serde cross-filter (#21488)
- Give priority to pycapsule interface in from_dataframe (#21377)
- Support writing `Time` type in json (#21454)
- Activate all optimizations in sinks (#21462)
- Add `AssertionError` variant to `PolarsError` in `polars-error` (#21460)
- Pass filter to inner readers in multiscan new streaming (#21436)
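
As background for the numeric-stability item above (#21528), the sketch below contrasts a naive one-pass variance with Welford's update, which avoids subtracting two large, nearly equal numbers. It is a generic illustration and makes no claim about the algorithm Polars actually uses; the function names are hypothetical.

```python
def naive_variance(xs):
    """Population variance via E[x^2] - E[x]^2 (prone to cancellation)."""
    if not xs:
        raise ValueError("need at least one value")
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    # For large, tightly clustered values this difference can lose most of
    # its significant digits, and may even come out negative.
    return mean_sq - mean * mean


def welford_variance(xs):
    """Population variance via Welford's running mean / M2 update."""
    if not xs:
        raise ValueError("need at least one value")
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / len(xs)


# Values with a large common offset; the exact population variance is 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
stable = welford_variance(data)
unstable = naive_variance(data)  # may differ noticeably from 22.5
```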
🐞 Bug fixes
- Categorical min/max panicking when string cache is enabled (#21552)
- Don't encode IPC record batch twice (#21525)
- Respect rewriting flag in Node rewriter (#21516)
- Correct skip batch predicate for partial statistics (#21502)
- Make the Parquet Sink properly phase aware (#21499)
- Don't divide by zero in partitioned group-by (#21498)
- Create new linearizer between rowwise new streaming sink phases (#21490)
- Don't drop rows in sinks between new streaming phases (#21489)
- Incorrect lazy schema for `Expr.list.diff` (#21484)
- Give priority to pycapsule interface in from_dataframe (#21377)
- Duration Series arithmetic operations (#21425)
- Fix unwrap None panic when filtering delta with missing columns (#21453)
- Use stable sort for rolling-groupby (#21444)
- Throw exception if dataframe is too large to be compatible with Excel (#20900)
- Address regression with `read_excel` not handling URL paths correctly (#21428)
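
The "stable sort for rolling-groupby" fix (#21444) relies on the general property that a stable sort keeps rows with equal keys in their input order. A minimal pure-Python illustration of that property follows; the sample rows and the helper name are made up for this example and are unrelated to Polars internals.

```python
from operator import itemgetter


def sort_rows_by_key(rows):
    """Sort (key, payload) rows by key; Python's sorted() is stable."""
    return sorted(rows, key=itemgetter(0))


# Two rows share each timestamp; their relative order must be preserved.
rows = [
    ("2024-01-02", "b"),
    ("2024-01-01", "a"),
    ("2024-01-02", "c"),
    ("2024-01-01", "d"),
]
ordered = sort_rows_by_key(rows)
```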
📖 Documentation
- Fix typo (#21554)
- Correct typos and grammar in Python docstrings (#21524)
- Move llm page under misc (#21550)
- Polars Cloud docs (#21548)
- Add LazyFrame.remote docs entry (#21529)
- Specify that the key column must be sorted in ascending order in `merge_sorted` (#21501)
- Add Polars & LLMs page to the user guide (#21218)
- Mention that `statistics=True` doesn't enable all statistics in `sink_parquet()` (#21434)
🛠️ Other improvements
- Don't take ownership of IRplan in new streaming engine (#21551)
- Refactor code for re-use by streaming NDJSON source (#21520)
- Simplify the phase handling of new streaming sinks (#21530)
- Improve IPC sink node parallelism (#21505)
- Use tikv-jemallocator (#21486)
- Rename 'join_nulls' parameter to 'nulls_equal' in join functions (#21507)
- Move rolling to polars-compute (#21503)
- Remove Growable in favor of ArrayBuilder (#21500)
- Introduce a Sink Node trait in the new streaming engine (#21458)
- Add test for rolling stability sort (#21456)
- Add test for empty `.is_in` predicate filter (#21455)
- Test for unique length on multiple columns (#21418)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @banflam, @braaannigan, @coastalwhite, @dependabot[bot], @etiennebacher, @ghuls, @kevinjqliu, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stijnherfst, @thomasjpfan and dependabot[bot]