Python Polars 1.34.0
🏆 Highlights
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
🚀 Performance improvements
- Optimize gather_every(n=1) to slice (#24704)
- Lower null count to streaming engine (#24703)
- Native streaming `gather_every` (#24700)
- Pushdown filter with `strptime` if input is literal (#24694)
- Avoid copying expanded paths (#24669)
- Relax filter expr ordering (#24662)
- Remove unnecessary `groups` call in `aggregated` (#24651)
- Skip files in `scan_iceberg` with filter based on metadata statistics (#24547)
- Push row_index predicate for all scan types (#24537)
- Perform integer in-filtering for Parquet inequality predicates (#24525)
- Stop caching Parquet metadata after 8 files (#24513)
- Native streaming `.mode()` expression (#24459)
✨ Enhancements
- Implement maintain_order for cross join (#24665)
- Add support to output `dt.total_{}()` duration values as fractionals (#24598)
- Avoid forcing a `pyarrow` dependency in `read_excel` when using the default "calamine" engine (#24655)
- Support scanning from `file:/path` URIs (#24603)
- Log which file the schema was sourced from, and which file caused an extra column error (#24621)
- Add `LazyFrame.{sink,collect}_batches` (#23980)
- Deterministic import order for Python Polars package variants (#24531)
- Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
- Add unstable `hidden_file_prefix` parameter to `scan_parquet` (#24507)
- Use fixed-scale Decimals (#24542)
- Add support for unsigned 128-bit integers (#24346)
- Add unstable `pl.Config.set_default_credential_provider` (#24434)
- Roundtrip `BinaryOffset` type through Parquet (#24344)
- Add opt-in unstable functionality to load interval types as `Struct` (#24320)
- Support reading Parquet metadata from cloud storage (#24443)
- Add user guide section on AWS role assumption (#24421)
- Support `unique`/`n_unique`/`arg_unique` for `array` columns (#24406)
🐞 Bug fixes
- Removing dots after noqa comments (#24722)
- Parse `Decimal` with comma as decimal separator in CSV (#24685)
- Make `Categories` pickleable (#24691)
- Shift on array within list (#24678)
- Fix handling of `AggregatedScalar` in `ApplyExpr` single input (#24634)
- Support reading of mixed compressed/uncompressed IPC buffers (#24674)
- Overflow in slice-slice optimization (#24658)
- Package discovery for `setuptools` (#24656)
- Add type assertion to prevent out-of-bounds in `GenericFirstLastGroupedReduction` (#24590)
- Remove inclusion of polars dir in runtime sdist/wheel (#24654)
- Method `dt.month_end` was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
- Widen `from_dicts` to `Iterable[Mapping[str, Any]]` (#24584)
- Fix `unsupported arrow type Dictionary` error in `scan_iceberg()` (#24573)
- Raise Exception instead of panic when calling `unnest` on a non-struct column (#24471)
- Include missing feature dependency from `polars-stream/diff` to `polars-plan/abs` (#24613)
- Newline escaping in streaming show_graph (#24612)
- Do not allow inferring (`-1`) the dimension on any `Expr.reshape` dimension except the first (#24591)
- Sink batches early stop on in-memory engine (#24585)
- More precisely model expression ordering requirements (#24437)
- Panic in zero-weight rolling mean/var (#24596)
- Decimal <-> literal arithmetic supertype rules (#24594)
- Match various aggregation return types in the streaming engine with the in-memory engine (#24501)
- Validate list type for list expressions in planner (#24589)
- Fix `scan_iceberg()` storage options not taking effect (#24574)
- Have `log()` prioritize the leftmost dtype for its output dtype (#24581)
- CSV `pl.len()` was incorrect (#24587)
- Add support for float inputs for duration types (#24529)
- Roundtrip empty string through hive partitioning (#24546)
- Fix potential OOB writes in unaligned IPC read (#24550)
- Fix regression error when scanning AWS presigned URL (#24530)
- Make `PlPath::join` for cloud paths replace on absolute paths (#24514)
- Correct dtype for cum_agg in streaming engine (#24510)
- Restore support for np.datetime64() in pl.lit() (#24527)
- Ignore Iceberg list element ID if missing (#24479)
- Fix panic on streaming full join with coalesce (#23409)
- Fix `AggState` on `all_literal` in `BinaryExpr` (#24461)
- Show IR sort options in `explain` (#24465)
- Benchmark CI import (#24463)
- Fix schema on `ApplyExpr` with single row `literal` in agg context (#24422)
- Fix planner schema for dividing `pl.Float32` by int (#24432)
- Fix panic scanning from AWS legacy global endpoint URL (#24450)
- Fix `iterable_to_pydf(..., infer_schema_length=None)` to scan all data (#23405)
- Do not propagate struct of nulls with null (#24420)
- Be stricter with invalid NDJSON input when `ignore_errors=False` (#24404)
- Implement `approx_n_unique` for temporal dtypes and Null (#24417)
📖 Documentation
- Add default parquet compression levels (#24686)
- Fix syntax error in data-types-and-structures.md (#24606)
- Rename `avg_birthday` -> `avg_age` in examples aggregation (#23726)
- Update Polars Cloud user guide (#24366)
- Fix typo in `set_expr_depth_warning` docstring (#24427)
📦 Build system
🛠️ Other improvements
- Removing dots after noqa comments (#24722)
- Make `test_multiple_sorting_columns` test runnable (#24719)
- Remove `{Upper,Lower}Bound` expressions in IR (#24701)
- Fix Makefile `uv pip` option syntax (#24711)
- Add egg-info to gitignore (#24712)
- Restructure python project directories again (#24676)
- Use IR for `polars-expr` output field resolution (#24661)
- Remove dist/ from release python workflow (#24639)
- Escape `sed` ampersand in release script (#24631)
- Remove PyOdide from release for now (#24630)
- Fix sed in-place in release script (#24628)
- Release script pyodide wheel (#24627)
- Release script pyodide wheel (#24626)
- Update release script for runtimes (#24610)
- Remove unused `UnknownKind::Ufunc` (#24614)
- Use cargo-run to call dsl-schema script (#24607)
- Cleanup and prepare `to_field` for element and struct field context (#24592)
- Resolve nightly clippy hints (#24593)
- Rename pl.dependencies to pl._dependencies (#24595)
- More release scripting (#24582)
- Again a minor fix for the setup script (#24580)
- Minor fix in release script (#24579)
- Correct release python beta version check (#24578)
- Python dependency failure (#24576)
- Always install yq (#24570)
- Deterministic import order for Python Polars package variants (#24531)
- Check Arrow FFI pointers with an assert (#24564)
- Add a couple of missing type definitions in python (#24561)
- Fix quickstart example in Polars Cloud user guide (#24554)
- Add implementations for loading min/max statistics for Iceberg (#24496)
- Update versions (#24508)
- Add additional unit tests for `pl.concat` (#24487)
- Refactor parametric tests for `as_struct` on aggstates (#24493)
- Use `PlanCallback` in `name.map_*` (#24484)
- Pin `xlsxwriter` to `3.2.5` or before (#24485)
- Add dataclass to hold resolved iceberg scan data (#24418)
- Fix iceberg test failure in CI (#24456)
- Move CompressionUtils to polars-utils (#24430)
- Update github template to dispatch to cloud client (#24416)
Thank you to all our contributors for making this release possible!
@DeflateAwning, @Gusabary, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @alonsosilvaallende, @andreseje, @borchero, @c-peters, @camriddell, @coastalwhite, @dangotbanned, @deanm0000, @dongchao-1, @dsprenkels, @eitsupi, @itamarst, @jan-krueger, @joshuamarkovic, @juansolm, @kdn36, @moizescbf, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst