Releases: pola-rs/polars
Python Polars 1.36.0-beta.2
🏆 Highlights
- Add Extension types (#25322)
✨ Enhancements
- Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
- Add SQL support for named WINDOW references (#25400)
- Add BIT_NOT support to the SQL interface (#25094)
- Add LazyFrame.pivot (#25016)
- Add allow_empty flag to item (#25048)
- Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
- Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
- Add having to group_by context (#23550)
- Add ignore_nulls to first/last (#25105)
- Add maintain_order to Expr.mode (#25377)
- Add quantile for missing temporals (#25464)
- Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
- Add strict parameter to pl.concat(how='horizontal') (#25452)
- Add support for Float16 dtype (#25185)
- Add unstable Schema.to_arrow (#25149)
- Allow Expr.rolling in aggregation contexts (#25258)
- Allow Expr.unique on List/Array with non-numeric types (#25285)
- Allow glimpse to return a DataFrame (#24803)
- Allow hash for all List dtypes (#25372)
- Allow implode and aggregation in aggregation context (#25357)
- Allow slice on scalar in aggregation context (#25358)
- Allow arbitrary Expressions in "subset" parameter of unique frame method (#25099)
- Allow arbitrary expressions as the Expr.rolling index_column (#25117)
- Allow bare .row on a single-row DataFrame, equivalent to .item on a single-element DataFrame (#25229)
- Allow elementwise Expr.over in aggregation context (#25402)
- Allow pl.Object in pivot value (#25533)
- Automatically Parquet dictionary encode floats (#25387)
- Display function of streaming physical plan map node (#25368)
- Documentation on Polars Cloud manifests (#25295)
- Expose and document pl.Categories (#25443)
- Expose fields for generating physical plan visualization data (#25562)
- Extend SQL UNNEST support to handle multiple array expressions (#25418)
- Improve SQL UNNEST behaviour (#22546)
- Improve error message on unsupported SQL subquery comparisons (#25135)
- Make DSL-hash skippable (#25140)
- Minor improvement for as_struct repr (#25529)
- Move GraphMetrics into StreamingQuery (#25310)
- Raise suitable error on non-integer "n" value for clear (#25266)
- Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
- Set polars/<version> user-agent (#25112)
- Streaming {Expr,LazyFrame}.rolling (#25058)
- Support BYTE_ARRAY backed Decimals in Parquet (#25076)
- Support ewm_var/std in streaming engine (#25109)
- Support unique_counts for all datatypes (#25379)
- Support additional forms of SQL "CREATE TABLE" statements (#25191)
- Support arbitrary expressions in SQL JOIN constraints (#25132)
- Support column-positional SQL "UNION" operations (#25183)
- Support decimals in search_sorted (#25450)
- Temporal quantile in rolling context (#25479)
- Use reference to Graph pipes when flushing metrics (#25442)
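For readers unfamiliar with the three ranking functions named in #25409, the plain-Python sketch below illustrates their standard SQL semantics over a single ordered column. It does not call Polars and says nothing about how Polars implements them. The function name `window_ranks` and the sample data are assumptions made for illustration, and ties in ROW_NUMBER are broken here by input order, which SQL engines need not guarantee.

```python
def window_ranks(values):
    """Return (row_number, rank, dense_rank) lists aligned with `values`.

    Illustrative only: mimics ROW_NUMBER(), RANK() and DENSE_RANK()
    over an ascending ORDER BY on a single column.
    """
    # Stable sort of the row indices by value (ties keep input order).
    order = sorted(range(len(values)), key=lambda i: values[i])

    row_number = [0] * len(values)
    rank = [0] * len(values)
    dense_rank = [0] * len(values)

    current_rank = 0
    current_dense = 0
    previous = None
    for position, i in enumerate(order, start=1):
        # ROW_NUMBER: always increases by one.
        row_number[i] = position
        if position == 1 or values[i] != previous:
            # A new distinct value: RANK jumps to the row position,
            # DENSE_RANK increases by exactly one.
            current_rank = position
            current_dense += 1
            previous = values[i]
        rank[i] = current_rank
        dense_rank[i] = current_dense
    return row_number, rank, dense_rank


row_number, rank, dense_rank = window_ranks([10, 20, 20, 30])
print(row_number)  # [1, 2, 3, 4]
print(rank)        # [1, 2, 2, 4]
print(dense_rank)  # [1, 2, 2, 3]
```

The only difference between RANK and DENSE_RANK is what happens after a tie: RANK leaves a gap (2, 2, 4) while DENSE_RANK does not (2, 2, 3).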
🚀 Performance improvements
- Add parquet prefiltering for string regexes (#25381)
- Add streaming native LazyFrame.group_by_dynamic (#25342)
- Add streaming sorted Group-By (#25013)
- Allow detecting plan sortedness in more cases (#25408)
- Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
- Enable predicate expressions on unsigned integers (#25416)
- Fast find start window in group_by_dynamic with large offset (#25376)
- Faster kernels for rle_lengths (#25448)
- Fuse positive slice into streaming LazyFrame.rolling (#25338)
- Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
- Mark Expr.reshape((-1,)) as row separable (#25326)
- Mark output of more non-order-maintaining ops as unordered (#25419)
- Optimize ipc stream read performance (#24671)
- Reduce HuggingFace API calls (#25521)
- Return references from aexpr_to_leaf_names_iter (#25319)
- Skip filtering scan IR if no paths were filtered (#25037)
- Use bitmap instead of Vec in first/last w. skip_nulls (#25318)
- Use fast path for agg_min/agg_max when nulls present (#25374)
- Use strong hash instead of traversal for CSPE equality (#25537)
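The rolling-window entry above (#25078) concerns avoiding a full recomputation when a NaN or null slides out of the window. The changelog does not describe the implementation, so the following is only a generic plain-Python sketch of one way to get that property for a rolling sum: keep a running total of the finite values plus a count of NaNs currently in the window. The function name `rolling_sum_nan_aware` is hypothetical, and a production kernel would also need to manage the floating-point drift that repeated add/subtract updates can accumulate.

```python
import math
from collections import deque


def rolling_sum_nan_aware(values, window):
    """Rolling sum that returns NaN while any NaN is inside the window.

    Illustrative sketch: when a NaN leaves the window, only a counter is
    decremented; the window is never rescanned.
    """
    out = []
    total = 0.0      # sum of the non-NaN values currently in the window
    nan_count = 0    # number of NaNs currently in the window
    buf = deque()    # the values currently in the window

    for v in values:
        buf.append(v)
        if math.isnan(v):
            nan_count += 1
        else:
            total += v

        if len(buf) > window:
            old = buf.popleft()
            if math.isnan(old):
                nan_count -= 1   # cheap update, no recomputation
            else:
                total -= old

        if len(buf) < window:
            out.append(None)     # window not yet full
        elif nan_count:
            out.append(math.nan)
        else:
            out.append(total)
    return out


print(rolling_sum_nan_aware([1.0, 2.0, math.nan, 4.0, 5.0, 6.0], 2))
# [None, 3.0, nan, nan, 9.0, 11.0]
```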
🐞 Bug fixes
- Add .rolling_rank support for temporal types and pl.Boolean (#25509)
- Address issues with SQL OVER clause behaviour for window functions (#25249)
- Aggregation with drop_nulls on literal (#25356)
- Allow Null dtype values in scatter (#25245)
- Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
- Allow empty list in sort_by in list.eval context (#25481)
- Allow for negative time in group_by_dynamic iterator (#25041)
- Always respect return_dtype in map_elements and map_rows (#25504)
- AnyValue::to_physical for categoricals (#25341)
- Apply CSV dict overrides by name only (#25436)
- Block predicate pushdown when group_by key values are changed (#25032)
- Bugs in pl.from_repr with signed exponential floats and line wrapping (#25331)
- Correct drop_items for scalar input (#25351)
- Correct eq_missing for struct with nulls (#25363)
- Correct {first,last}_non_null if there are empty chunks (#25279)
- Correctly handle requested stops in streaming shift (#25239)
- Correctly prune projected columns in hints (#25250)
- DSL_SCHEMA_HASH should not be changed by line endings (#25123)
- Don't push down predicates past inserted cache nodes (#25042)
- Don't quietly allow unsupported SQL SELECT clauses (#25282)
- Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
- Enhanced column resolution/tracking through multi-way SQL joins (#25181)
- Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
- Ensure out-of-range integers and other edge case values don't give wrong results for index_of (#24369)
- Fix CSV select(len()) off by 1 with comment prefix (#25069)
- Fix arr.{eval,agg} in aggregation context (#25390)
- Fix format_str in case of multiple chunks (#25162)
- Fix groups update on slices with different offsets (#25097)
- Fix assertion panic on group_by (#25179)
- Fix building polars-expr without timezones feature (#25254)
- Fix building polars-mem-engine with the async feature (#25300)
- Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
- Fix dictionary replacement error in write_ipc (#25497)
- Fix expr slice pushdown causing shape error on literals (#25485)
- Fix field metadata for nested categorical PyCapsule export (#25052)
- Fix group lengths check in sort_by with AggregatedScalar (#25503)
- Fix handling Null dtype in ApplyExpr on group_by (#25077)
- Fix incorrect .list.eval after slicing operations (#25540)
- Fix incorrect reshape on sliced lists (#25139)
- Fix length preserving check for eval expressions in streaming engine (#25294)
- Fix occurrence of exact matches of .join_asof(strategy="nearest", allow_exact_matches=False, ...) (#25506)
- Fix off-by-one bug in ColumnPredicates generation for inequalities operating on integer columns (#25412)
- Fix panic if scan predicate produces 0 length mask (#25089)
- Fix panic in dt.truncate for invalid duration strings (#25124)
- Fix panic in is_between support in streaming Parquet predicate push down (#25476)
- Fix panic when using struct field as join key (#25059)
- Fix serialization of lazyframes containing huge tables (#25190)
- Fix single-column CSV header duplication with leading empty lines (#25186)
- Fix small bug with PyExpr to PyObject conversion (#25265)
- Group-By aggregation problems caused by AmortSeries (#25043)
- Handle some unusual pl.col.<colname> edge-cases (#25153)
- Incorrect result in aggregated first/last with ignore_nulls (#25414)
- Incorrect results for aggregated {n_,}unique on bools (#25275)
- Invert drop_nans filtering in group-by context (#25146)
- Make str.json_decode output deterministic with lists (#25240)
- Mark {forward,backward}_fill as length_preserving (#25352)
- Minor improvement to internal is_pycapsule utility function (#25073)
- Nested dtypes in streaming first_non_null/last_non_null (#25375)
- Nested dtypes in streaming first/last (#25298)
- Panic exception when calling Expr.rolling in .over (#25283)
- Panic in group_by_dynamic with group_by and multiple chunks (#25075)
- Parquet is_in for mixed validity pages (#25313)
- Prevent panic when joining sorted LazyFrame with itself (#25453)
- Raise error for all/any on list instead of panic (#25018)
- Raise error on out-of-range dates in temporal operations (#25471)
- Remove Expr casts in pl.lit invocations (#25373)
- Resolve edge-case with SQL aggregates that have the same name as one of the "GROUP BY" keys (#25362)
- Return the correct string-case Expr reprs (#25101)
- Reverse on chunked struct (#25281)
- Revert pl.format behavior with nulls (#25370)
- Rolling mean/median for temporals (#25512)
- Run async DB queries with regular asyncio if not inside a running loop (#25268)
- SQL "NATURAL" joins should coalesce the key columns (#25353)
- Schema mismatch with list.agg, unique and scalar (#25348)
- Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
- Strict conversion AnyValue to Struct (#25536)
- Support "index" as column name in group_by iterator (#25138)
- Support AggregatedList in list.{eval,agg} context (#25385)
- The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
- Unique key names in streaming sort/top_k (#25082)
- Unique on literal in aggregation context (#25359)
- Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
- Use Cargo.template.toml to prevent git dependencies from using template (#25392)
- Validate list.slice parameters are not list...
Python Polars 1.35.2
- Fix incorrect drop_nans() result when used in group_by() / over() (#25146)
- Fix handling Null dtype in ApplyExpr on group_by (#25077)
- Fix assertion panic on group_by (#25179)
- Fix Wide-table join performance regression (#25222)
Thank you to all our contributors for making this release possible!
@coastalwhite, @kdn36, @nameexhaustion and @ritchie46
Rust Polars 0.52.0
🏆 Highlights
- Add LazyFrame.{sink,collect}_batches (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
- Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
- Skip filtering scan IR if no paths were filtered (#25037)
- Optimize ipc stream read performance (#24671)
- Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
- Lower unique to native group-by and speed up n_unique in group-by context (#24976)
- Better parallelize take{_slice,}_unchecked (#24980)
- Implement native skew and kurtosis in group-by context (#24961)
- Use native group-by aggregations for bitwise_* operations (#24935)
- Address group_by_dynamic slowness in sparse data (#24916)
- Native filter/drop_nulls/drop_nans in group-by context (#24897)
- Implement cumulative_eval using the group-by engine (#24889)
- Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
- Implement native null_count, any and all group-by aggregations (#24859)
- Speed up reverse in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for BitMapIter::nth (#24766)
- Pushdown slices on plans within unions (#24735)
- Optimize gather_every(n=1) to slice (#24704)
- Lower null count to streaming engine (#24703)
- Native streaming gather_every (#24700)
- Pushdown filter with strptime if input is literal (#24694)
- Avoid copying expanded paths (#24669)
- Relax filter expr ordering (#24662)
- Remove unnecessary groups call in aggregated (#24651)
- Skip files in scan_iceberg with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
✨ Enhancements
- Improve error message on unsupported SQL subquery comparisons (#25135)
- Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
- Support ewm_var/std in streaming engine (#25109)
- Make DSL-hash skippable (#25140)
- Streaming {Expr,LazyFrame}.rolling (#25058)
- Set polars/<version> user-agent (#25112)
- Add BIT_NOT support to the SQL interface (#25094)
- Support BYTE_ARRAY backed Decimals in Parquet (#25076)
- Add allow_empty flag to item (#25048)
- Support ewm_mean() in streaming engine (#25003)
- Improve row-count estimates (#24996)
- Remove filtered scan paths in IR when possible (#24974)
- Introduce remote Polars MCP server (#24977)
- Allow local scans on polars cloud (configurable) (#24962)
- Add Expr.item to strictly extract a single value from an expression (#24888)
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Add glob parameter to scan_ipc (#24898)
- Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
- Add list.agg and arr.agg (#24790)
- Implement {Expr,Series}.rolling_rank() (#24776)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add arr.eval (#24472)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add nth_set_bit_u64() with unit test (#24035)
- Add separator to {Data,Lazy}Frame.unnest (#24716)
- Add union() function for unordered concatenation (#24298)
- Add name.replace to the set of column rename options (#17942)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
- Implement maintain_order for cross join (#24665)
- Add support to output dt.total_{}() duration values as fractionals (#24598)
- Support scanning from file:/path URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add LazyFrame.{sink,collect}_batches (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
🐞 Bug fixes
- Fix CSV select(len()) off by 1 with comment prefix (#25069)
- Fix incorrect reshape on sliced lists (#25139)
- Support "index" as column name in group_by iterator (#25138)
- DSL_SCHEMA_HASH should not be changed by line endings (#25123)
- Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
- Fix panic in dt.truncate for invalid duration strings (#25124)
- Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
- Return the correct string-case Expr reprs (#25101)
- Fix groups update on slices with different offsets (#25097)
- Fix handling Null dtype in ApplyExpr on group_by (#25077)
- Raise error for all/any on list instead of panic (#25018)
- Unique key names in streaming sort/top_k (#25082)
- The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
- Fix panic if scan predicate produces 0 length mask (#25089)
- Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
- Panic in group_by_dynamic with group_by and multiple chunks (#25075)
- Fix panic when using struct field as join key (#25059)
- Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
- Fix field metadata for nested categorical PyCapsule export (#25052)
- Block predicate pushdown when group_by key values are changed (#25032)
- Group-By aggregation problems caused by AmortSeries (#25043)
- Don't push down predicates past inserted cache nodes (#25042)
- Allow for negative time in group_by_dynamic iterator (#25041)
- Re-enable CPU feature check before import (#25010)
- Correctness any(ignore_nulls) and OOB in all (#25005)
- Streaming any/all with ignore_nulls=False (#25008)
- Fix incorrect join_asof on a casted expression (#25006)
- Optimize memory on rolling groups in ApplyExpr (#24709)
- Fallback Pyarrow scan to in-memory engine (#24991)
- Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
- Capitalize letters after numbers in to_titlecase (#24993)
- Preserve null values in pct_change (#24952)
- Raise length mismatch on over with sliced groups (#24887)
- Check duplicate name in transpose (#24956)
- Follow Kleene logic in any/all for group-by (#24940)
- Do not optimize cross join to iejoin if order maintaining (#24950)
- Broadcast partition_by columns in over expression (#24874)
- Clear index cache on stacked df.filter expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated with_row_index() after scan() silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor BinaryExpr in group_by dispatch logic (#24548)
- Fix aggstate for gather (#24857)
- Keep scalars for length preserving functions in group_by (#24819)
- Have range feature depend on dtype-array feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in EvalExpr (#24650)
- Allow aggregations on AggState::LiteralScalar (#24820)
- Dispatch to group_aware for fallible expressions with masked out elements (#24815)
- Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
- Fix XOR did not follow kleene when one side is unit-length (#24810)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use overlapping instead of rolling (#24787)
- Fix iterable on dynamic_group_by and rolling object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in bitmask::nth_set_bit_u64 (#24775)
- Add Expr.sign for Decimal datatype (#24717)
- Correct str.replace with missing pattern (#24768)
- Support decimal_comma on Decimal type in write_csv (#24718)
- Parse Decimal with comma as decimal separator in CSV (#24685)
- Make Categories pickleable (#24691)
- Shift on array within list (#24678)
- Fix handling of AggregatedScalar in ApplyExpr single input (#24634)
- Support reading of mixed compressed/uncompressed IPC buffers (#24674)
- Overflow in slice-slice optimization (#24658)
- Package discovery for setuptools (#24656)
- Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction (#24590)
- Remove inclusion of polars dir in runtime sdist/wheel (#24654)
- Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
- Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from polars-stream/diff to `polars-pla...
Python Polars 1.35.1
🚀 Performance improvements
- Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
- Skip filtering scan IR if no paths were filtered (#25037)
- Optimize ipc stream read performance (#24671)
✨ Enhancements
- Support BYTE_ARRAY backed Decimals in Parquet (#25076)
- Allow glimpse to return a DataFrame (#24803)
- Add allow_empty flag to item (#25048)
🐞 Bug fixes
- The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
- Fix panic if scan predicate produces 0 length mask (#25089)
- Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
- Panic in group_by_dynamic with group_by and multiple chunks (#25075)
- Minor improvement to internal is_pycapsule utility function (#25073)
- Fix panic when using struct field as join key (#25059)
- Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
- Fix field metadata for nested categorical PyCapsule export (#25052)
- Block predicate pushdown when group_by key values are changed (#25032)
- Group-By aggregation problems caused by AmortSeries (#25043)
- Don't push down predicates past inserted cache nodes (#25042)
- Allow for negative time in group_by_dynamic iterator (#25041)
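The first fix in this list (#25091) distinguishes logical from bitwise behaviour for unary NOT. The changelog does not say which inputs were affected, so the snippet below only illustrates the general difference between the two operations in plain Python. The helper names `bitwise_not` and `logical_not` are hypothetical, and propagating None (standing in for SQL NULL) is an assumption based on standard three-valued SQL logic.

```python
def bitwise_not(value: int) -> int:
    """Flip every bit of a two's-complement integer."""
    return ~value


def logical_not(value):
    """Negate truthiness; None (standing in for NULL) stays None."""
    if value is None:
        return None
    return not bool(value)


print(bitwise_not(1))     # -2: still "truthy", which is not a negation
print(logical_not(1))     # False
print(logical_not(0))     # True
print(logical_not(None))  # None
```

Applied to a nonzero integer, a bitwise NOT generally yields another nonzero (truthy) value, which is why it is not a substitute for logical negation.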
📖 Documentation
- Fix typo in public dataset URL (#25044)
🛠️ Other improvements
- Disable recursive CSPE for now (#25085)
- Change group length mismatch error to ShapeError (#25004)
- Update toolchain (#25007)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @Liyixin95, @alexander-beedie, @coastalwhite, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.35.0
🏆 Highlights
- Stabilize decimal (#25020)
🚀 Performance improvements
- Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
- Lower unique to native group-by and speed up n_unique in group-by context (#24976)
- Better parallelize take{_slice,}_unchecked (#24980)
- Implement native skew and kurtosis in group-by context (#24961)
- Use native group-by aggregations for bitwise_* operations (#24935)
- Address group_by_dynamic slowness in sparse data (#24916)
- Push filters to PyIceberg (#24910)
- Native filter/drop_nulls/drop_nans in group-by context (#24897)
- Implement cumulative_eval using the group-by engine (#24889)
- Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
- Implement native null_count, any and all group-by aggregations (#24859)
- Speed up reverse in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for BitMapIter::nth (#24766)
- Pushdown slices on plans within unions (#24735)
✨ Enhancements
- Stabilize decimal (#25020)
- Support ewm_mean() in streaming engine (#25003)
- Improve row-count estimates (#24996)
- Remove filtered scan paths in IR when possible (#24974)
- Introduce remote Polars MCP server (#24977)
- Allow local scans on polars cloud (configurable) (#24962)
- Add Expr.item to strictly extract a single value from an expression (#24888)
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Fast-count for scan_iceberg().select(len()) (#24602)
- Add glob parameter to scan_ipc (#24898)
- Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
- Add list.agg and arr.agg (#24790)
- Implement {Expr,Series}.rolling_rank() (#24776)
- Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
- Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add arr.eval (#24472)
- Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add separator to {Data,Lazy}Frame.unnest (#24716)
- Add union() function for unordered concatenation (#24298)
- Add name.replace to the set of column rename options (#17942)
- Support np.ndarray -> AnyValue conversion (#24748)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
🐞 Bug fixes
- Re-enable CPU feature check before import (#25010)
- Implement read_excel workaround for fastexcel/calamine issue loading a column subset from a named table (#25012)
- Correctness any(ignore_nulls) and OOB in all (#25005)
- Streaming any/all with ignore_nulls=False (#25008)
- Fix incorrect join_asof on a casted expression (#25006)
- Optimize memory on rolling groups in ApplyExpr (#24709)
- Fallback Pyarrow scan to in-memory engine (#24991)
- Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
- Capitalize letters after numbers in to_titlecase (#24993)
- Preserve null values in pct_change (#24952)
- Raise length mismatch on over with sliced groups (#24887)
- Check duplicate name in transpose (#24956)
- Follow Kleene logic in any/all for group-by (#24940)
- Do not optimize cross join to iejoin if order maintaining (#24950)
- Fix typing of scan_parquet partially unknown (#24928)
- Properly release the GIL for read_parquet_metadata (#24922)
- Broadcast partition_by columns in over expression (#24874)
- Clear index cache on stacked df.filter expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated with_row_index() after scan() silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor BinaryExpr in group_by dispatch logic (#24548)
- Fix aggstate for gather (#24857)
- Keep scalars for length preserving functions in group_by (#24819)
- Have range feature depend on dtype-array feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in EvalExpr (#24650)
- Allow aggregations on AggState::LiteralScalar (#24820)
- Dispatch to group_aware for fallible expressions with masked out elements (#24815)
- Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
- Fix regression on write_database() to Snowflake due to unsupported string view type (#24622)
- Fix XOR did not follow kleene when one side is unit-length (#24810)
- Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use overlapping instead of rolling (#24787)
- Fix iterable on dynamic_group_by and rolling object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in bitmask::nth_set_bit_u64 (#24775)
- Add Expr.sign for Decimal datatype (#24717)
- Correct str.replace with missing pattern (#24768)
- Ensure schema_overrides is respected when loading iterable row data (#24721)
- Support decimal_comma on Decimal type in write_csv (#24718)
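Several fixes in this list refer to Kleene logic for any/all (#24940, #25005, #25008). As a reminder of what that means, here is a plain-Python sketch of three-valued any/all in which None stands in for null. It is a generic illustration of Kleene semantics rather than Polars code, and the function names `kleene_any` and `kleene_all` are hypothetical.

```python
def kleene_any(values):
    """Three-valued OR over an iterable of True / False / None."""
    saw_null = False
    for v in values:
        if v is True:
            return True          # one True decides the result
        if v is None:
            saw_null = True
    # No True seen: the answer is unknown if any input was unknown.
    return None if saw_null else False


def kleene_all(values):
    """Three-valued AND over an iterable of True / False / None."""
    saw_null = False
    for v in values:
        if v is False:
            return False         # one False decides the result
        if v is None:
            saw_null = True
    # No False seen: the answer is unknown if any input was unknown.
    return None if saw_null else True


print(kleene_any([False, None]))  # None
print(kleene_any([True, None]))   # True
print(kleene_all([True, None]))   # None
print(kleene_all([False, None]))  # False
```

The key property is that a null only makes the result null when it could have changed the outcome; a decisive True (for any) or False (for all) wins regardless of nulls.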
📖 Documentation
- Introduce remote Polars MCP server (#24977)
- Add {arr,list}.agg API references (#24970)
- Support LLM in docs (#24958)
- Update Cloud docs with correct fn argument order (#24939)
- Update name.replace examples (#24941)
- Add i128 and u128 features to user guide (#24938)
- Add partitioning examples for sink_* methods (#24918)
- Add more {unique,value}_counts examples (#24927)
- Indent the versionchanged (#24783)
- Relax fsspec wording (#24881)
- Add pl.field into the api docs (#24846)
- Fix duplicated article in SECURITY.md (#24762)
- Document output name determination in when/then/otherwise (#24746)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
📦 Build system
- Ensure build_feature_flags.py is included in artifact (#25024)
- Update pyo3 and numpy crates to version 0.26 (#24760)
🛠️ Other improvements
- Fix benchmark ci (#25019)
- Fix non-deterministic test (#25009)
- Fix makefile arch detection (#25011)
- Make LazyFrame.set_sorted into a FunctionIR::Hint (#24981)
- Remove symbolic links (#24982)
- Deprecate Expr.agg_groups() and pl.groups() (#24919)
- Dispatch to no-op rayon thread-pool from streaming (#24957)
- Unpin pydantic (#24955)
- Ensure safety of scan fast-count IR lowering in streaming (#24953)
- Re-use iterators in set_ operations (#24850)
- Remove GroupByPartitioned and dispatch to streaming engine (#24903)
- Turn element() into {A,}Expr::Element (#24885)
- Pass ScanOptions to new_from_ipc (#24893)
- Update tests to be index type agnostic (#24891)
- Unset Context in Window expression (#24875)
- Fix failing delta test (#24867)
- Move FunctionExpr dispatch from plan to expr (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in ApplyExpr (#24825)
- Add days_in_month to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn pl.format into proper elementwise expression (#24811)
- Fix remote benchmark by no-longer saving builds (#24812)
- Refactor ApplyExpr in group_by context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename rolling groups to overlapping (#24577)
- Refactor DataType proptest strategies (#24763)
- Add union to documentation (#24769)
Thank you to all our contributors for making this release possible!
@EndPositive, @EnricoMi, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @mjanssen, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @thomasjpfan and @williambdean
Python Polars 1.35.0-beta.1
🚀 Performance improvements
- Address group_by_dynamic slowness in sparse data (#24916)
- Push filters to PyIceberg (#24910)
- Native filter/drop_nulls/drop_nans in group-by context (#24897)
- Implement cumulative_eval using the group-by engine (#24889)
- Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
- Implement native null_count, any and all group-by aggregations (#24859)
- Speed up reverse in group-by context (#24855)
- Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
- Don't check duplicates on streaming simple projection in release mode (#24830)
- Lower approx_n_unique to the streaming engine (#24821)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
- Implement indexed method for BitMapIter::nth (#24766)
- Pushdown slices on plans within unions (#24735)
✨ Enhancements
- Add environment variable to roundtrip empty struct in Parquet (#24914)
- Fast-count for scan_iceberg().select(len()) (#24602)
- Add glob parameter to scan_ipc (#24898)
- Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
- Add list.agg and arr.agg (#24790)
- Implement {Expr,Series}.rolling_rank() (#24776)
- Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
- Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
- Support MergeSorted in CSPE (#24805)
- Duration/interval string parsing optimisation (2-5x faster) (#24771)
- Recursively apply CSPE (#24798)
- Add streaming engine per-node metrics (#24788)
- Add arr.eval (#24472)
- Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
- Improve rolling_(sum|mean) accuracy (#24743)
- Add separator to {Data,Lazy}Frame.unnest (#24716)
- Add union() function for unordered concatenation (#24298)
- Add name.replace to the set of column rename options (#17942)
- Support np.ndarray -> AnyValue conversion (#24748)
- Allow duration strings with leading "+" (#24737)
- Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
- Add support for UInt128 to pyo3-polars (#24731)
🐞 Bug fixes
- Properly release the GIL for `read_parquet_metadata` (#24922)
- Broadcast `partition_by` columns in `over` expression (#24874)
- Clear index cache on stacked `df.filter` expressions (#24870)
- Fix 'explode' mapping strategy on scalar value (#24861)
- Fix repeated `with_row_index()` after `scan()` silently ignored (#24866)
- Correctly return min and max for enums in groupby aggregation (#24808)
- Refactor `BinaryExpr` in `group_by` dispatch logic (#24548)
- Fix aggstate for `gather` (#24857)
- Keep scalars for length preserving functions in `group_by` (#24819)
- Have `range` feature depend on `dtype-array` feature (#24853)
- Fix duplicate select panic (#24836)
- Inconsistency of list.sum() result type with None values (#24476)
- Division by zero in Expr.dt.truncate (#24832)
- Potential deadlock in __arrow_c_stream__ (#24831)
- Allow double aggregations in group-by contexts (#24823)
- Series.shrink_dtype for i128/u128 (#24833)
- Fix dtype in `EvalExpr` (#24650)
- Allow aggregations on `AggState::LiteralScalar` (#24820)
- Dispatch to `group_aware` for fallible expressions with masked out elements (#24815)
- Fix error for `arr.sum()` on small integer Array dtypes containing nulls (#24478)
- Fix regression on `write_database()` to Snowflake due to unsupported string view type (#24622)
- Fix XOR did not follow kleene when one side is unit-length (#24810)
- Make `Series` init consistent with `DataFrame` init for string values declared with temporal dtype (#24785)
- Incorrect precision in Series.str.to_decimal (#24804)
- Use `overlapping` instead of `rolling` (#24787)
- Fix iterable on `dynamic_group_by` and `rolling` object (#24740)
- Use Kahan summation for in-memory groupby sum/mean (#24774)
- Release GIL in PythonScan predicate evaluation (#24779)
- Type error in `bitmask::nth_set_bit_u64` (#24775)
- Add `Expr.sign` for `Decimal` datatype (#24717)
- Correct `str.replace` with missing pattern (#24768)
- Ensure `schema_overrides` is respected when loading iterable row data (#24721)
- Support `decimal_comma` on `Decimal` type in `write_csv` (#24718)
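For readers unfamiliar with the Kahan summation mentioned above (#24774), here is a minimal plain-Python sketch of the technique. It only illustrates why compensated summation helps. It is not the Rust code used inside Polars, and the function names are made up for this example.

```python
def naive_sum(values):
    """Plain left-to-right float accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: track the rounding error of each add."""
    total = 0.0
    compensation = 0.0  # low-order bits lost so far
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; the difference
        # from `adjusted` is the part that rounding discarded.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


values = [1.0] + [1e-16] * 10_000
print(naive_sum(values))  # 1.0: every small term is rounded away
print(kahan_sum(values))  # close to 1.000000000001
```

The input is chosen so that each small term is below half the spacing of floats near 1.0, which makes the naive loop lose all of them while the compensated loop recovers their combined contribution.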
📖 Documentation
- Add partitioning examples for `sink_*` methods (#24918)
- Add more `{unique,value}_counts` examples (#24927)
- Indent the versionchanged (#24783)
- Relax fsspec wording (#24881)
- Add `pl.field` into the api docs (#24846)
- Fix duplicated article in SECURITY.md (#24762)
- Document output name determination in when/then/otherwise (#24746)
- Specify that precision=None becomes 38 for Decimal (#24742)
- Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
- Fix source mapping (#24736)
📦 Build system
- Update pyo3 and numpy crates to version 0.26 (#24760)
🛠️ Other improvements
- Re-use iterators in `set_` operations (#24850)
- Remove `GroupByPartitioned` and dispatch to streaming engine (#24903)
- Turn `element()` into `{A,}Expr::Element` (#24885)
- Pass `ScanOptions` to `new_from_ipc` (#24893)
- Update tests to be index type agnostic (#24891)
- Unset `Context` in `Window` expression (#24875)
- Fix failing delta test (#24867)
- Move `FunctionExpr` dispatch from `plan` to `expr` (#24839)
- Fix SQL test giving wrong error message (#24835)
- Consolidate dtype paths in `ApplyExpr` (#24825)
- Add `days_in_month` to documentation (#24822)
- Enable ruff D417 lint (#24814)
- Turn `pl.format` into proper elementwise expression (#24811)
- Fix remote benchmark by no-longer saving builds (#24812)
- Refactor `ApplyExpr` in `group_by` context on multiple inputs (#24520)
- IR text plan graph generator (#24733)
- Temporarily pin pydantic to fix CI (#24797)
- Extend and rename `rolling` groups to `overlapping` (#24577)
- Refactor `DataType` `proptest` strategies (#24763)
- Add `union` to documentation (#24769)
Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @nameexhaustion, @orlp, @pavelzw, @ritchie46, @thomasjpfan and @williambdean
Python Polars 1.34.0
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Optimize gather_every(n=1) to slice (#24704)
- Lower null count to streaming engine (#24703)
- Native streaming `gather_every` (#24700)
- Pushdown filter with `strptime` if input is literal (#24694)
- Avoid copying expanded paths (#24669)
- Relax filter expr ordering (#24662)
- Remove unnecessary `groups` call in `aggregated` (#24651)
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Implement maintain_order for cross join (#24665)
- Add support to output `dt.total_{}()` duration values as fractionals (#24598)
- Avoid forcing a `pyarrow` dependency in `read_excel` when using the default "calamine" engine (#24655)
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique/n_unique/arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Removing dots after noqa comments (#24722)
- Parse `Decimal` with comma as decimal separator in CSV (#24685)
- Make `Categories` pickleable (#24691)
- Shift on array within list (#24678)
- Fix handling of `AggregatedScalar` in `ApplyExpr` single input (#24634)
- Support reading of mixed compressed/uncompressed IPC buffers (#24674)
- Overflow in slice-slice optimization (#24658)
- Package discovery for `setuptools` (#24656)
- Add type assertion to prevent out-of-bounds in `GenericFirstLastGroupedReduction` (#24590)
- Remove inclusion of polars dir in runtime sdist/wheel (#24654)
- Method `dt.month_end` was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
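The `Expr.reshape` entry above (#24591) says that `-1` may only stand in for the first dimension. A plain-Python sketch of that rule, assuming the usual meaning of `-1` as "infer this dimension from the element count", could look as follows. `resolve_shape` is a hypothetical helper written for this note, not a Polars function, and the error messages are invented.

```python
def resolve_shape(n_elements, shape):
    """Hypothetical helper: resolve a target shape, allowing -1 only in the first slot."""
    shape = tuple(shape)
    if any(dim == -1 for dim in shape[1:]):
        raise ValueError("only the first dimension may be inferred with -1")
    # Product of all dimensions after the first.
    rest = 1
    for dim in shape[1:]:
        rest *= dim
    if shape and shape[0] == -1:
        if rest == 0 or n_elements % rest != 0:
            raise ValueError(f"cannot reshape {n_elements} elements into {shape}")
        return (n_elements // rest, *shape[1:])
    if not shape or shape[0] * rest != n_elements:
        raise ValueError(f"cannot reshape {n_elements} elements into {shape}")
    return shape


print(resolve_shape(12, (-1, 3)))  # (4, 3)
```

Calling `resolve_shape(12, (2, -1))` raises `ValueError` in this sketch, which corresponds to the case the changelog entry disallows.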
📖 Documentation
- Add default parquet compression levels (#24686)
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
🛠️ Other improvements
- Removing dots after noqa comments (#24722)
- Make `test_multiple_sorting_columns` test runnable (#24719)
- Remove `{Upper,Lower}Bound` expressions in IR (#24701)
- Fix Makefile `uv pip` option syntax (#24711)
- Add egg-info to gitignore (#24712)
- Restructure python project directories again (#24676)
- Use IR for `polars-expr` output field resolution (#24661)
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsvwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @alonsosilvaallende, @andreseje, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @eitsupi, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @moizescbf, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.34.0-beta.5
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Pushdown filter with `strptime` if input is literal (#24694)
- Avoid copying expanded paths (#24669)
- Relax filter expr ordering (#24662)
- Remove unnecessary `groups` call in `aggregated` (#24651)
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Implement maintain_order for cross join (#24665)
- Add support to output `dt.total_{}()` duration values as fractionals (#24598)
- Avoid forcing a `pyarrow` dependency in `read_excel` when using the default "calamine" engine (#24655)
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique/n_unique/arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Make `Categories` pickleable (#24691)
- Shift on array within list (#24678)
- Fix handling of `AggregatedScalar` in `ApplyExpr` single input (#24634)
- Support reading of mixed compressed/uncompressed IPC buffers (#24674)
- Overflow in slice-slice optimization (#24658)
- Package discovery for `setuptools` (#24656)
- Add type assertion to prevent out-of-bounds in `GenericFirstLastGroupedReduction` (#24590)
- Remove inclusion of polars dir in runtime sdist/wheel (#24654)
- Method `dt.month_end` was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Add default parquet compression levels (#24686)
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
🛠️ Other improvements
- Restructure python project directories again (#24676)
- Use IR for `polars-expr` output field resolution (#24661)
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsvwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @eitsupi, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @moizescbf, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.34.0-beta.4
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique/n_unique/arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
- Use cargo-run to call dsl-schema script (#24607)
🛠️ Other improvements
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsvwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @moizescbf, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst
Python Polars 1.34.0-beta.3
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique/n_unique/arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when unnest on non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV pl.len() was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
- Use cargo-run to call dsl-schema script (#24607)
🛠️ Other improvements
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsvwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@Gusabary, @Kevin-Patyk, @Matt711, @moizescbf, @alonsosilvaallende, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst