Releases: google/yggdrasil-decision-forests

Python API 0.13.0

15 Jul 16:05

0.13.0 - 2025-07-15

API Changes

  • For Random Forest models, .out_of_bag_evaluations() now returns a
    TrainingLogs object. The content is identical to the object previously
    returned, but the number_of_trees property has been renamed to
    iteration for consistency with Gradient Boosted Trees Training Logs.
  • mode="tf" is now the default on model.to_tensorflow_saved_model(). The
    previous default is still available by setting mode="keras".
  • model.label() returns None for models trained without a label.
  • Remove deprecated evaluation_task argument for model.evaluate(). Use
    task instead.

Feature

  • Add standalone C++ export with model.to_standalone_cc(). Standalone models
    are flexible, fast and memory-efficient. They only depend on the C++
    standard library.
  • Add model.training_logs() method to return the training logs of the model.
  • Expose Mean Average Precision for Ranking tasks.
  • Add hyperparameters
    numerical_vector_sequence_enable_closer_than_conditions and
    numerical_vector_sequence_enable_projected_more_than_conditions.
  • Clear error messages when attempting to evaluate models without label.
  • Faster training with sparse oblique splits for datasets with many numerical
    features.
  • Many documentation improvements.
  • Increase default number of threads to 256 or number of CPU cores.
  • Enable cross-validation for hyperparameter tuning.
  • Add thresholds to classification plots.
  • Explicitly disable custom losses for hyperparameter tuning.
  • Disable parallel evaluation for cross-validation custom losses.
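
For readers unfamiliar with the Mean Average Precision metric exposed above, here is a minimal sketch of the textbook definition with binary relevance. It is not YDF's implementation: how YDF binarizes graded relevance, truncates the ranking or handles ties is not stated in these notes, and the function names are hypothetical.

```python
from typing import Sequence


def average_precision(relevances: Sequence[int]) -> float:
    """Average precision of one ranked list (1 = relevant, 0 = not relevant).

    The list is assumed to be sorted by decreasing model score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(groups: Sequence[Sequence[int]]) -> float:
    """Mean of the per-group (per-query) average precisions."""
    if not groups:
        return 0.0
    return sum(average_precision(g) for g in groups) / len(groups)


# Two query groups, items already sorted by decreasing model score.
example_groups = [[1, 0, 1], [0, 1]]
print(mean_average_precision(example_groups))
```

The first group scores (1/1 + 2/3) / 2 and the second 1/2, so the mean over both groups is 2/3.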

Fix

  • Distributed Training: Add "recvmsg: Connection reset" to the errors
    recognized by isTransientError.
  • Enable SHAP values when training with BEST_FIRST_GLOBAL.
  • Predictions with cross-entropy LambdaMART no longer need the slow engine.
  • Disable the generic engine for oblique splits without global imputation.
    This may fix a very rare bug in the way predictions are computed.

Release music

Sinfonie Nr. 4 in A-Dur, op. 90. Felix Mendelssohn

Python API 0.12.0

20 May 14:11

0.12.0 - 2025-05-20

Feature

  • Enable support for Python 3.13.
  • Add custom fields to model metadata.
  • Add SHAP value variable importances with model.analyze().
  • Add SHAP values for a dataset with model.predict_shap().
  • Speed-up (up to 20x) training of models with CATEGORICAL_SET features.
  • Add hyper-parameter to limit the mask size for CATEGORICAL_SET features.
  • Add hyper-parameter total_max_num_nodes to limit the total number of nodes in a model.
  • Add support for na_replacements in python tree editor API.
  • Add support for include_all_columns in FeatureSelector.
  • Add the ydf.utils.LogBook to manage and track experiments.
  • Speed-up training of NDCG ranking model when a single example per group
    is non-zero.
  • Speed-up training on datasets with few columns on a computer with a
    large number of cores.
  • Speed-up the multi-threaded loss computation code.
  • Improve distributed training error messages.
  • Remove need for label columns for deep learning models.

Fix

  • Log message if early stopping is not used.
  • Fix force_numerical_discretization errors and documentation.
  • Fix handling of empty list columns in the dataset.

Release music

Te Deum in D major, H.146. Marc-Antoine Charpentier

v1.11.0

12 Mar 13:45

1.11.0 - 2025-03-12

Features

  • Speed-up training of GBT models by ~10%.
  • Support for categorical and boolean features in Isolation Forests.
  • Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but
    can still be used.
  • Allow configuring the truncation of NDCG losses.
  • Add support for distributed training for ranking gradient boosted tree
    models.
  • Add support for NUMERICAL_VECTOR_SEQUENCE features.
  • Add support for AVRO data file using the "avro:" prefix.
  • Additional hyperparameters restricting weights of sparse oblique splits
    to integers or powers of 2.
  • Facilitate training on VertexAI.
  • Deprecated SparseObliqueSplit.binary_weights hyperparameter in favor of
    SparseObliqueSplit.weights.
  • Add Gzip-compressed BLOB_SEQUENCE serialization.
  • Enable Poisson loss for model analysis and fast inference.
  • Add config for compatibility with protobuf lite.
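
To illustrate what truncating an NDCG loss or metric means, the sketch below computes NDCG over only the top k ranked items. It assumes the common 2^relevance - 1 gain and log2 rank discount; the gain, discount and tie handling that YDF uses are not described in these notes, and the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum(
        (2.0 ** rel - 1.0) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of the items of one group, sorted by decreasing model score.
ranked = [0, 0, 3, 1]
print(ndcg_at_k(ranked, k=2))  # Only the two top-ranked items count.
print(ndcg_at_k(ranked, k=4))  # The relevant items at ranks 3 and 4 now count.
```

With a truncation of 2, the relevant items at ranks 3 and 4 contribute nothing, so the choice of truncation changes which ranking mistakes are penalized.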

Fix

  • Fix structural variable importances for oblique splits.
  • Deflake tests.
  • Remove CHECK/FATAL from training code.
  • Fix crash in YDF distributed training.

Misc

  • Loss options are now defined in
    model/gradient_boosted_trees/gradient_boosted_trees.proto (previously
    learner/gradient_boosted_trees/gradient_boosted_trees.proto).
  • Remove C++14 support.
  • Various documentation improvements.

Python API 0.11.0

12 Mar 13:47

0.11.0 - 2025-03-12

Feature

  • Expose losses for distributed training.
  • Add class_weights parameter to the learners.
  • Support for Google Cloud paths for datasets and model IO.
  • Add utility to facilitate distributed training on VertexAI.
  • Improved support for non-unicode data in categorical features.
  • Add support for saving and analyzing deep models.

Fix

  • Fix incorrectly transposed confusion table in HTML.
  • Various documentation fixes.
  • Better requirements management.

Documentation

  • Add tutorial for Categorical Set features.
  • Add tutorial for training on VertexAI.

Release music

3. Sinfonie in d-Moll. Gustav Mahler

Python API 0.10.0

11 Feb 13:32

0.10.0 - 2025-02-11

Feature

  • Expose model.save(..., pure_serving=True) for saving a model without debug
    information.
  • Allow users to provide a training proto configuration to the learner.
  • Add vector sequence feature support.
  • Add Variable importances for Isolation Forest Models.
  • Add ydf.help.loading_data() to print information about the type of
    supported dataset formats.
  • Add experimental Tabular Transformer implementation.
  • Add gzipped blob sequence as new model format (still optional).
  • Enabled Poisson Loss for model analysis and fast inference.

Fix

  • Fix recognition of multidimensional features for Numpy arrays of type
    object.
  • Fix subsample count for small number of training examples for Isolation
    Forests.
  • Fix NUM_NODES variable importance for oblique splits.

Other

  • Updated OSS dependencies of protobuf, grpc and abseil.

Release music

3. Sinfonie in Es-Dur "Sinfonia Eroica", op. 55. Ludwig van Beethoven

Python API 0.9.0

02 Dec 16:02

0.9.0 - 2024-12-02

Breaking

  • Classification Label classes are now consistently ordered lexicographically
    (for string labels) or increasingly (for integer labels).
  • Change typo partial_depepence_plot to partial_dependence_plot on
    model.analyze().
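
The difference between the two label orderings described above matters when numbers are stored as strings. The sketch below uses Python's built-in sort as a stand-in, on the assumption that YDF's lexicographic order behaves like Python's code-point order for these values; class_order is a hypothetical helper, not a YDF function.

```python
from typing import Hashable, Iterable, List


def class_order(labels: Iterable[Hashable]) -> List[Hashable]:
    """Return the distinct label values in sorted order."""
    return sorted(set(labels))


# String labels are compared character by character.
print(class_order(["10", "9", "2", "9"]))  # ['10', '2', '9']

# Integer labels are compared by value.
print(class_order([10, 9, 2, 9]))  # [2, 9, 10]
```

Code that maps a column of predicted probabilities to a class by position should therefore read the class list from the model rather than assume an order.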

Feature

  • Add support for Avro file for path / distributed training with the "avro:"
    prefix.
  • Add support for discretized numerical features for in-memory datasets.
  • Expose MRR for ranking models.
  • Add model.predict_class to generate the most likely predicted class of
    classification models.
  • Add support for automatic feature selection with the feature_selector
    learner constructor argument. See the feature selection tutorial for
    more details.
  • Add standalone prediction evaluation ydf.evaluate_predictions().
  • Add new hyperparameter sparse_oblique_max_num_projections.
  • Add options "POWER_OF_TWO" and "INTEGER" for sparse oblique weights.
  • Emit proper errors when using lists for multi-dimensional features.
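
As a reminder of what the MRR (mean reciprocal rank) metric listed above measures, here is a minimal sketch of the standard definition. It is not YDF's implementation: the relevance threshold and the handling of groups without any relevant item are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def reciprocal_rank(relevances: Sequence[float]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is relevant.

    The list is assumed to be sorted by decreasing model score, and an item
    is treated as relevant when its relevance is strictly positive.
    """
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(groups: Sequence[Sequence[float]]) -> float:
    """Mean of the reciprocal ranks over all groups (queries)."""
    if not groups:
        return 0.0
    return sum(reciprocal_rank(g) for g in groups) / len(groups)


# Reciprocal ranks are 1/2, 1 and 0, so the mean is 0.5.
print(mean_reciprocal_rank([[0, 1, 0], [1, 0], [0, 0]]))  # 0.5
```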

Fix

  • Regression and Ranking CEPs scaling corrected.

Release music

The John B. Sails. Traditional

Python API 0.8.0

23 Sep 16:49

0.8.0 - 2024-09-23

Breaking

  • Disallow positional parameters for the learners, except for label and task.
  • Remove the unsupported / invalid hyperparameters from the Isolation Forest
    learner.
  • Remove parameters for distributed training and resuming training from
    learners that do not support these capabilities.
  • By default, model.analyze runs for a maximum of 20 seconds (i.e.
    maximum_duration=20 by default).
  • Convert boolean values in categorical sets to lowercase, matching the
    treatment of categorical features.

Feature

  • Warn if training on a VerticalDataset and fail if attempting to modify the
    columns in a VerticalDataset during training.
  • User can override the model's task, label or group during evaluation.
  • Add num_examples_per_tree() method to Isolation Forest models.
  • Expose the slow engine for debugging predictions and evaluations with
    use_slow_engine=True.
  • Speed-up training of GBT models by ~10%.
  • Support for categorical and boolean features in Isolation Forests.
  • Add ydf.util.read_tf_record and ydf.util.write_tf_record to facilitate
    TF Record datasets usage.
  • Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but
    can still be used.
  • Allow configuring the truncation of NDCG losses.
  • Enable multi-threading when using model.predict and model.evaluate.
  • Default number of threads of model.analyze is equal to the number of
    cores.
  • Add multi-threaded results in model.benchmark.
  • Add argument to control the maximum duration of model.analyze.
  • Add support for Unicode strings, normalize categorical set values in the
    same way as categorical values, and validate their types.
  • Add support for distributed training for ranking gradient boosted tree
    models.

Fix

  • Fix labels of regression evaluation plots.
  • Improved errors if Isolation Forest training fails.

Release music

Perpetuum Mobile "Ein musikalischer Scherz", Op. 257. Johann Strauss (Sohn)

v1.10.0

21 Aug 19:51

1.10.0 - 2024-08-21

Features

  • Add support for Isolation Forests model.
  • The default value of num_candidate_attributes in the CART learner is
    changed from 0 (Random Forest style sampling) to -1 (no sampling). This is
    the generally accepted logic of CART.
  • Added support for GCS for file I/O.
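
The num_candidate_attributes change above can be sketched as follows. The meanings of 0 and -1 come from the note itself. The "Random Forest style" rule used here (square root of the feature count for classification, one third for regression) is the classical convention, and both that rule and the rounding are assumptions rather than a statement of what YDF computes. The function name is hypothetical.

```python
import math


def effective_num_candidate_attributes(
    value: int, num_features: int, classification: bool = True
) -> int:
    """Number of features considered at each split for a given setting."""
    if value == -1:
        # No sampling: every feature is a candidate (the new CART default).
        return num_features
    if value == 0:
        # Random Forest style sampling (the previous CART default).
        # Assumed rule: sqrt(p) for classification, p / 3 for regression.
        n = math.sqrt(num_features) if classification else num_features / 3
        return max(1, math.ceil(n))
    # An explicit positive value is used as given, capped at the feature count.
    return min(value, num_features)


print(effective_num_candidate_attributes(-1, 16))  # 16
print(effective_num_candidate_attributes(0, 16))   # 4
```

Under these assumptions, a CART model trained with the new default examines all 16 features at each node instead of a random subset of 4.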

Python API 0.7.0

21 Aug 19:47

Python API 0.7.0 - 2024-08-21

Feature

  • Expose validate_hyperparameters() on the learner.
  • Clarify which parameters in the learner are optional.
  • Add support in JAX FeatureEncoder for non-string categorical feature values.
  • Improve performance of Isolation Forests.
  • Models can be serialized/deserialized to/from bytes with model.serialize()
    and ydf.deserialize_model.
  • Models can be pickled safely.
  • Native support for Xarray as a dataset format for all operations (e.g.,
    training, evaluation, predictions).
  • The output of model.to_jax_function can be converted to a TensorFlow Lite
    model.
  • Change the default number of examples to scan when training on files to
    determine the semantics and dictionaries of columns from 10k to 100k.
  • Various improvements of error messages.
  • Evaluation for Anomaly Detection models.
  • Oblique splits for Anomaly Detection models.

Fix

  • Fix parsing of multidimensional ragged inputs.
  • Fix isolation forest hyperparameter defaults.
  • Fix bug causing distributed training to fail on a sharded dataset containing
    an empty shard.
  • Handle unordered categorical sets in training.
  • Fix dataspec ignoring definitions of unrolled columns, such as
    multidimensional categorical integers.
  • Fix error when defining categorical sets for non-ragged multidimensional
    inputs.
  • MacOS: Fix compatibility with other protobuf-using libraries such as
    Tensorflow.

Release music

Rondo Alla ingharese quasi un capriccio "Die Wut über den verlorenen Groschen",
Op. 129. Ludwig van Beethoven

Python API 0.6.0

26 Jul 13:57

Feature

  • model.to_jax_function now always outputs a FeatureEncoder to help feeding
    data to the JAX model.
  • The default value of num_candidate_attributes in the CART learner is
    changed from 0 (Random Forest style sampling) to -1 (no sampling). This is
    the generally accepted logic of CART.
  • model.to_tensorflow_saved_model supports preprocessing functions which have
    a different signature than the YDF model.
  • Improve error messages when feeding wrong size Numpy arrays.
  • Add option for weighted evaluation in model.evaluate.

Fix

  • Fix display of confusion matrix with floating point weights.

Known issues

  • MacOS build is broken.