Collaborator

@nvauto nvauto commented Feb 2, 2026

Auto-merge triggered by GitHub Actions on release/26.02 to create a PR that keeps main up to date. If this PR cannot be merged due to conflicts, it will remain open until they are fixed manually.

…oid> inference (#14243)

Closes #14233

## Description
This PR addresses test failures in `test_parquet_testing_valid_files`
for `null_list.parquet` on Spark 4.1.0+.

## Root Cause
**SPARK-54220** introduced correct NullType/VOID/UNKNOWN type support in
Parquet schema inference starting from Spark 4.1.0. This upstream change
causes different schema inference behavior:

- **Spark 3.5.0 - 4.0.x**: Incorrectly infers `array<int>` for null
arrays with UNKNOWN logical type
- **Spark 4.1.0+**: Correctly infers `array<void>` for null arrays with
UNKNOWN logical type (per SPARK-54220)

The `null_list.parquet` file from the `parquet-testing` repository has a
physical schema with `optional int32 item` but uses the UNKNOWN logical
type annotation. The RAPIDS plugin does not support `array<void>` on the GPU
(`TypeSig.NULL` is not included in the nested types for Parquet cudfRead),
so the test fails with:

```
IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.ColumnarToRowExec
ReadSchema: struct<emptylist:array<void>>
```
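The failure comes down to a `void` element type nested inside an array. As a rough illustration only (this is not the plugin's `TypeSig` logic), the sketch below scans a Spark-style schema string such as the `ReadSchema` above for `void` appearing anywhere inside `array<...>`. The function name `has_void_in_array` is hypothetical, and the tokenizer is deliberately simplistic (it does not handle quoted field names).

```python
import re

def has_void_in_array(schema: str) -> bool:
    """Return True if the type `void` appears inside any array<...> in `schema`.

    Hypothetical helper for illustration; not part of the RAPIDS plugin.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|[<>:]", schema)
    opened_by_array = []  # one entry per open '<': was it opened by `array`?
    array_depth = 0       # number of enclosing array<...> levels
    for i, tok in enumerate(tokens):
        if tok == "<":
            is_array = i > 0 and tokens[i - 1].lower() == "array"
            opened_by_array.append(is_array)
            array_depth += int(is_array)
        elif tok == ">":
            if opened_by_array and opened_by_array.pop():
                array_depth -= 1
        elif tok.lower() == "void":
            # An identifier followed by ':' is a field name, not a type.
            next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
            if next_tok != ":" and array_depth > 0:
                return True
    return False

print(has_void_in_array("struct<emptylist:array<void>>"))
print(has_void_in_array("struct<emptylist:array<int>>"))
```

The first call corresponds to the schema inferred on newer Spark versions, the second to the schema inferred on older ones.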

## Solution
Add a version-conditional xfail for `null_list.parquet` on Spark 4.1.0+
to reflect the upstream schema inference improvement and RAPIDS' current
limitation with `array<void>` support.

## Changes
- Updated `parquet_testing_test.py` to xfail `null_list.parquet` for
Spark 4.1.0+
- Added `is_spark_411_or_later()` import
- Updated copyright year to 2026
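A version gate of this kind could look roughly like the sketch below. It is not the repository's `is_spark_411_or_later()`: `parse_version`, `version_at_least`, and `xfail_files_for` are hypothetical stand-ins that take the Spark version as a plain string, the bare file name used as the dictionary key is an assumption, and the default threshold of `4.1.1` is taken from the name of the imported helper.

```python
def parse_version(version: str) -> tuple:
    """Turn '4.1.1' or '4.1.1-SNAPSHOT' into a comparable tuple like (4, 1, 1)."""
    core = version.split("-", 1)[0]  # drop suffixes such as -SNAPSHOT
    return tuple(int(part) for part in core.split(".")[:3])

def version_at_least(version: str, minimum: str) -> bool:
    """Compare versions numerically, so '4.10.0' sorts after '4.9.0'."""
    return parse_version(version) >= parse_version(minimum)

def xfail_files_for(spark_version: str, threshold: str = "4.1.1") -> dict:
    """Map file name -> xfail reason for the given Spark version (hypothetical)."""
    xfail_files = {}
    if version_at_least(spark_version, threshold):
        xfail_files["null_list.parquet"] = (
            "array<void> is inferred after SPARK-54220 and is not supported on GPU"
        )
    return xfail_files

print(xfail_files_for("4.0.1"))
print(sorted(xfail_files_for("4.1.1")))
```

Comparing parsed tuples rather than raw strings avoids lexicographic surprises once a version component reaches two digits.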

## Related Issues
- Related to #14242 (audit issue for SPARK-54220)

### Checklists

- [ ] This PR has added documentation for new or modified features or
behaviors.
- [x] This PR has added new tests or modified existing tests to cover
new code paths.
- [ ] Performance testing has been performed and its results are added
in the PR description. Or, an issue has been filed with a link in the PR
description.

Signed-off-by: Chong Gao <[email protected]>
Co-authored-by: Chong Gao <[email protected]>
@nvauto nvauto merged commit c125e89 into main Feb 2, 2026
Collaborator Author

nvauto commented Feb 2, 2026

SUCCESS - auto-merge

Contributor

greptile-apps bot commented Feb 2, 2026

Greptile Overview

Greptile Summary

This PR merges changes from the release/26.02 branch to main, including a fix for Spark 4.1.1+ compatibility with null_list.parquet testing.

The changes follow existing patterns in the codebase for version-specific test handling and are consistent with similar conditional logic for Spark 3.5.0+ compatibility.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are straightforward and low-risk: a copyright year update, an import addition, and a version-specific xfail for a known incompatibility. The implementation follows established patterns in the codebase and is well-documented with issue references.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| integration_tests/src/main/python/parquet_testing_test.py | Added xfail for null_list.parquet on Spark 4.1.1+ due to the `array<void>` inference change; updated copyright year to 2026 |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Parquet Testing Test
    participant SparkSession as spark_session.py
    participant TestFramework as PyTest Framework

    Note over Test: Module initialization
    Test->>SparkSession: is_spark_411_or_later()
    SparkSession-->>Test: Returns boolean (True/False)

    alt Spark >= 4.1.1
        Test->>Test: Add null_list.parquet to _xfail_files
        Note over Test: Mark as expected failure<br/>due to array<void> inference
    end

    Test->>Test: gen_testing_params_for_valid_files()
    Test->>Test: Check if null_list.parquet in _xfail_files

    alt File is in _xfail_files
        Test->>TestFramework: Create test with xfail marker
        Note over TestFramework: Test will be marked as<br/>expected failure
    else File is valid
        Test->>TestFramework: Create regular test parameter
    end

    TestFramework->>Test: Execute test_parquet_testing_valid_files()
    Test->>Test: assert_gpu_and_cpu_are_equal_collect()

    alt Test marked as xfail and fails
        Test-->>TestFramework: Expected failure (xfail)
        Note over TestFramework: Test passes as expected
    else Test succeeds
        Test-->>TestFramework: Success
    end
```
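The parameter-generation step in the diagram can be sketched without any test framework. This is an illustrative model, not the code in `parquet_testing_test.py`: `FileParam` and `gen_params` are hypothetical names, and the second file name in the usage lines is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

@dataclass(frozen=True)
class FileParam:
    """One test parameter; a non-None reason means 'expected to fail'."""
    path: str
    xfail_reason: Optional[str] = None

def gen_params(files: Iterable[str], xfail_files: Dict[str, str]) -> List[FileParam]:
    """Attach an xfail reason to every file listed in xfail_files."""
    return [FileParam(path, xfail_files.get(path)) for path in sorted(files)]

# Hypothetical inputs: one xfailed file and one regular file.
xfail_files = {"null_list.parquet": "array<void> inference on newer Spark"}
params = gen_params(["null_list.parquet", "example.parquet"], xfail_files)
for p in params:
    print(p.path, "xfail" if p.xfail_reason else "regular")
```

In a pytest suite, an entry with a reason would typically be turned into `pytest.param(path, marks=pytest.mark.xfail(reason=...))`, while the others are passed through as plain parameters.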


@greptile-apps greptile-apps bot left a comment


1 file reviewed, no comments
