Contributor

@daguirre11 daguirre11 commented Jan 31, 2026

Hi 👋 ,

I believe this private function should be removed instead of tested.

  • Two functions invoke _json_default_with_numpy: model_to_string here and dump_model here.
  • Each of these functions uses json.dumps(self.pandas_categorical, default=_json_default_with_numpy).
  • self.pandas_categorical only has a value other than None if the X data argument is given as a pandas DataFrame here.
  • The np.bool_, np.floating, and np.integer types in the isinstance() condition in _json_default_with_numpy here can all be converted to their corresponding Python types, so self.pandas_categorical goes from None to {}. This is possible because pandas automatically converts NumPy scalars to pandas dtypes during DataFrame construction.
  • Lastly, the next branch in _json_default_with_numpy, the one handling np.ndarray here, is not reachable: a column of arrays gets the object dtype, which the existing checks do not allow for a pandas DataFrame.

data used:

    import numpy as np
    import pandas as pd

    X = pd.DataFrame({
        'np_bool_col': [np.bool_(True), np.bool_(False), np.bool_(True)],
        'regular_col': [np.uint8(1), np.uint16(2), np.uint8(3)],
        'np_float_col': [np.float64(1.23), np.float64(4.56), np.float64(7.89)],
        'np_array_col': [
            np.array([1, 2, 3]),
            np.array([4, 5, 6]),
            np.array([7, 8, 9]),
        ],
    })

    def _check_for_bad_pandas_dtypes(pandas_dtypes_series: pd_Series) -> None:
        bad_pandas_dtypes = [
            f"{column_name}: {pandas_dtype}"
            for column_name, pandas_dtype in pandas_dtypes_series.items()
            if not _is_allowed_numpy_dtype(pandas_dtype.type)
        ]
        if bad_pandas_dtypes:
>           raise ValueError(
                f"pandas dtypes must be int, float or bool.\nFields with bad pandas dtypes: {', '.join(bad_pandas_dtypes)}"
            )
E           ValueError: pandas dtypes must be int, float or bool.
E           Fields with bad pandas dtypes: np_array_col: object

.venv/lib/python3.14/site-packages/lightgbm/basic.py:791: ValueError
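
The dtype behavior behind that error can be checked without LightGBM. The following is a minimal sketch, assuming only pandas and NumPy; the `"biuf"` kind test is a hypothetical stand-in for `_is_allowed_numpy_dtype`, not LightGBM's actual implementation.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the comment: NumPy scalars plus one column of arrays.
X = pd.DataFrame({
    "np_bool_col": [np.bool_(True), np.bool_(False), np.bool_(True)],
    "np_int_col": [np.uint8(1), np.uint16(2), np.uint8(3)],
    "np_float_col": [np.float64(1.23), np.float64(4.56), np.float64(7.89)],
    "np_array_col": [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])],
})

# NumPy dtype "kind" codes: b = bool, i/u = signed/unsigned int, f = float, O = object.
dtype_kinds = {name: dtype.kind for name, dtype in X.dtypes.items()}

# Hypothetical stand-in for the allowed-dtype check (an assumption for
# illustration, not the library's code): accept only bool, int, and float kinds.
bad_columns = [name for name, kind in dtype_kinds.items() if kind not in "biuf"]

print(dtype_kinds)
print(bad_columns)
```

Under these assumptions, only the array column ends up with the object dtype, which matches the `np_array_col: object` line in the traceback.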

I also checked _json_default_with_numpy by manually passing in a self.pandas_categorical value that actually invokes the function.

    categorical_json = json.dumps(
        {
            "feature_index": 0,
            "test_np_bool": np.bool_(True),
            "test_np_int64": np.int64(42),
            "test_np_array": np.array([1, 2, 3]),
        },
        default=_json_default_with_numpy,
    )

which results in the following pandas_categorical key-value pair in the model dump JSON:
pandas_categorical:{"feature_index": 0, "test_np_bool": true, "test_np_int64": 42, "test_np_array": [1, 2, 3]}

However, as explained above, it is not possible for self.pandas_categorical to hold a value like this.
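
One way to test the reachability claim is to count how often the `default` hook fires. The sketch below assumes only the standard `json` module and NumPy; `counting_default` is a hypothetical local stand-in shaped like `_json_default_with_numpy`, and both payloads are made up for illustration rather than taken from a real model.

```python
import json

import numpy as np

calls = []  # records the type name of every object handed to the hook


def counting_default(obj):
    """Hypothetical stand-in for _json_default_with_numpy that logs each call."""
    calls.append(type(obj).__name__)
    if isinstance(obj, (np.integer, np.floating, np.bool_)):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Payload made only of native Python types: json.dumps never needs the hook.
native_json = json.dumps([["a", "b"], [1, 2, 3]], default=counting_default)
calls_for_native = list(calls)

calls.clear()

# Payload with NumPy objects: json.dumps falls back to the hook for each one.
numpy_json = json.dumps(
    {"flag": np.bool_(True), "n": np.int64(42), "arr": np.array([1, 2, 3])},
    default=counting_default,
)
calls_for_numpy = list(calls)

print(native_json, calls_for_native)
print(numpy_json, calls_for_numpy)
```

If self.pandas_categorical only ever holds native Python values, the first case applies and the hook is dead code; the second case shows what would have to be present for it to matter.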

If I am wrong, please explain what I am missing 😃

Collaborator

@jameslamb jameslamb left a comment

Thank you very much for the investigation!

I'll need to take a little time to read through what you've shared here. The expectations around Dataset.pandas_categorical are a little unclear in the codebase, I'll try to improve that.

I can share that I looked through the git blame tonight and it seems this _json_default_with_numpy() function has been in lightgbm for 9 years (#247), and its addition didn't generate any discussion about why it was necessary. @wxchan added this but isn't active in LightGBM or on GitHub any more, so I don't think they'll be able to help us understand it.

I'll look at this shortly. Two other notes while I do that:

  1. please do update your git config so your commits will be tied to your GitHub account (#7143 (comment))
  2. in the future, share code links as raw links instead of wrapped in markdown like [here](link), so they'll be rendered directly in the GitHub UI like this:

    def _json_default_with_numpy(obj: Any) -> Any:
        """Convert numpy classes to JSON serializable objects."""
        if isinstance(obj, (np.integer, np.floating, np.bool_)):
            return obj.item()
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return obj

@daguirre11 daguirre11 closed this Feb 1, 2026
@daguirre11 daguirre11 reopened this Feb 1, 2026