Releases: google/yggdrasil-decision-forests
Python API 0.13.0
0.13.0 - 2025-07-15
API Changes
- For Random Forest models, .out_of_bag_evaluations() now returns a TrainingLogs object. The content is identical to the object previously returned, but the number_of_trees property has been renamed to iteration for consistency with Gradient Boosted Trees training logs.
- mode="tf" is now the default on model.to_tensorflow_saved_model(). The previous default is still available by setting mode="keras".
- model.label() returns None for models trained without a label.
- Remove the deprecated evaluation_task argument for model.evaluate(). Use task instead.
Feature
- Add standalone C++ export with model.to_standalone_cc(). Standalone models are highly flexible, fast and memory-efficient. They depend only on the C++ standard library.
- Add a model.training_logs() method that returns the training logs of the model.
- Expose Mean Average Precision for ranking tasks.
- Add hyperparameters numerical_vector_sequence_enable_closer_than_conditions and numerical_vector_sequence_enable_projected_more_than_conditions.
- Clearer error messages when attempting to evaluate models without a label.
- Faster training with sparse oblique splits for datasets with many numerical features.
- Many documentation improvements.
- Increase default number of threads to 256 or number of CPU cores.
- Enable cross-validation for hyperparameter tuning.
- Add thresholds to classification plots.
- Explicitly disable custom losses for hyperparameter tuning.
- Disable parallel evaluation for cross-validation custom losses.
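Mean Average Precision (MAP) averages, over queries, the precision measured at each position that holds a relevant result. The plain-Python sketch below shows one common textbook definition; it is not YDF code, and it assumes binary relevance (a label > 0 counts as relevant). YDF's exact conventions for graded labels or truncation may differ.

```python
from typing import Sequence


def average_precision(relevance_in_ranked_order: Sequence[int]) -> float:
    """Average precision of one query.

    The input lists relevance labels in the order the model ranked the results.
    Assumption: a result is relevant when its label is > 0.
    """
    num_relevant_seen = 0
    precision_sum = 0.0
    for rank, label in enumerate(relevance_in_ranked_order, start=1):
        if label > 0:
            num_relevant_seen += 1
            # Precision@rank, accumulated only at positions holding a relevant item.
            precision_sum += num_relevant_seen / rank
    return precision_sum / num_relevant_seen if num_relevant_seen else 0.0


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precisions."""
    return sum(average_precision(q) for q in queries) / len(queries)


# Two queries; each inner list holds relevance labels in ranked order.
print(mean_average_precision([[1, 0, 1], [0, 1]]))
```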
Fix
- Distributed training: add "recvmsg: Connection reset" to isTransientError.
- Enable SHAP values when training with BEST_FIRST_GLOBAL.
- Predictions with cross-entropy LambdaMART no longer need the slow engine.
- Disable the generic engine for oblique splits without global imputation.
This may fix a very rare bug in the way predictions are computed.
Release music
Sinfonie Nr. 4 in A-Dur, op. 90. Felix Mendelssohn
Python API 0.12.0
0.12.0 - 2025-05-20
Feature
- Enable support for Python 3.13.
- Add custom fields to model metadata.
- Add SHAP value variable importances with model.analyze().
- Add SHAP values for a dataset with model.predict_shap().
- Speed up (up to 20x) training of models with CATEGORICAL_SET features.
- Add a hyperparameter to limit the mask size for CATEGORICAL_SET features.
- Add hyperparameter total_max_num_nodes to limit the total number of nodes in a model.
- Add support for na_replacements in the Python tree editor API.
- Add support for include_all_columns in FeatureSelector.
- Add ydf.utils.LogBook to manage and track experiments.
- Speed up training of NDCG ranking models when a single example per group is non-zero.
- Speed up training on datasets with few columns on computers with a large number of cores.
- Speed up multi-threaded loss computation.
- Improve distributed training error messages.
- Remove the need for label columns for deep learning models.
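SHAP values split a prediction into per-feature contributions that sum to the difference between the prediction and a reference output. Forest libraries typically use tree-specific algorithms such as TreeSHAP, and these notes do not say which one YDF uses; the sketch below instead computes exact Shapley values by brute-force enumeration of feature subsets for a tiny hypothetical model. Replacing "absent" features with a fixed baseline is an assumption of this sketch.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of `model` at `x`, enumerating all feature subsets.

    Features outside a coalition take their `baseline` value (an assumption).
    The cost is exponential in the number of features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Coalition features come from x, the others from the baseline.
        point = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(point)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(features: Sequence[float]) -> float:
    # Hypothetical model: linear terms plus an interaction between features 0 and 1.
    return 2.0 * features[0] + features[1] + features[0] * features[1]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, x, baseline)
# Efficiency property: the attributions sum to f(x) - f(baseline).
print(phis, sum(phis), toy_model(x) - toy_model(baseline))
```

Feature 2 never affects the toy model, so its attribution is zero, and the interaction term is shared equally between features 0 and 1.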
Fix
- Log message if early stopping is not used.
- Fix force_numerical_discretization errors and documentation.
- Fix handling of empty list columns in the dataset.
Release music
Te Deum in D major, H.146. Marc-Antoine Charpentier
v1.11.0
1.11.0 - 2025-03-12
Features
- Speed up training of GBT models by ~10%.
- Support for categorical and boolean features in Isolation Forests.
- Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but can still be used.
- Allow configuring the truncation of NDCG losses.
- Add support for distributed training for ranking gradient boosted tree models.
- Add support for NUMERICAL_VECTOR_SEQUENCE features.
- Add support for AVRO data files using the "avro:" prefix.
- Additional hyperparameters restricting weights of sparse oblique splits to integers or powers of 2.
- Facilitate training on VertexAI.
- Deprecate the SparseObliqueSplit.binary_weights hyperparameter in favor of SparseObliqueSplit.weights.
- Add Gzip-compressed BLOB_SEQUENCE serialization.
- Enable Poisson loss for model analysis and fast inference.
- Add config for compatibility with protobuf lite.
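Truncation decides how many top-ranked positions an NDCG computation looks at; the old name LAMBDA_MART_NDCG5 suggests a fixed cut-off of 5. As a hedged illustration of what truncation changes, the sketch below computes NDCG@k with one common convention (gain 2^rel - 1, discount log2(rank + 1)). Whether YDF uses exactly this convention, and how its truncation is configured, is not stated here; the function name is hypothetical.

```python
import math
from typing import Sequence


def ndcg_at_k(relevance_in_ranked_order: Sequence[float], k: int) -> float:
    """NDCG over the top-k positions (gain 2^rel - 1, discount log2(rank + 1))."""

    def dcg(relevances: Sequence[float]) -> float:
        # Only the first k positions contribute: this is the truncation.
        return sum(
            (2.0**rel - 1.0) / math.log2(rank + 1)
            for rank, rel in enumerate(list(relevances)[:k], start=1)
        )

    ideal = dcg(sorted(relevance_in_ranked_order, reverse=True))
    return dcg(relevance_in_ranked_order) / ideal if ideal > 0 else 0.0


ranking = [0, 3, 2, 0, 1]  # Relevance labels in the order the model ranked them.
for k in (1, 3, 5):
    print(k, round(ndcg_at_k(ranking, k), 4))
```

A small k scores only the head of the ranking: here NDCG@1 is 0 because the top document is irrelevant, while larger k gives partial credit for the relevant documents further down.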
Fix
- Fix structural variable importances for oblique splits.
- Deflake tests.
- Remove CHECK/FATAL from training code.
- Fix crash in YDF distributed training.
Misc
- Loss options are now defined in model/gradient_boosted_trees/gradient_boosted_trees.proto (previously learner/gradient_boosted_trees/gradient_boosted_trees.proto).
- Remove C++14 support.
- Various documentation improvements.
Python API 0.11.0
0.11.0 - 2025-03-12
Feature
- Expose losses for distributed training.
- Add class_weights parameter to the learners.
- Support for Google Cloud paths for datasets and model IO.
- Add utility to facilitate distributed training on VertexAI.
- Improved support for non-unicode data in categorical features.
- Add support for saving and analyzing deep models.
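A common interpretation of class weights is that every training example receives the weight associated with its label class, which scales its contribution to the loss. The plain-Python sketch below shows that mapping; the helper name and the default weight of 1.0 for unlisted classes are assumptions, and the exact semantics of YDF's class_weights (for example, how it combines with an explicit weights column) should be checked in the learner documentation.

```python
from typing import Dict, List, Sequence


def example_weights_from_class_weights(
    labels: Sequence[str], class_weights: Dict[str, float]
) -> List[float]:
    """Hypothetical helper: one weight per example, looked up from its class.

    Classes missing from `class_weights` default to 1.0 (an assumption).
    """
    return [class_weights.get(label, 1.0) for label in labels]


labels = ["fraud", "ok", "ok", "ok", "fraud"]
weights = example_weights_from_class_weights(labels, {"fraud": 3.0})
print(weights)  # [3.0, 1.0, 1.0, 1.0, 3.0]
```

With these weights the two "fraud" examples carry a total weight of 6.0 against 3.0 for the three "ok" examples, which counteracts the class imbalance.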
Fix
- Fix incorrectly transposed confusion table in HTML.
- Various documentation fixes.
- Better requirements management.
Documentation
- Add tutorial for Categorical Set features.
- Add tutorial for training on VertexAI.
Release music
3. Sinfonie in d-Moll. Gustav Mahler
Python API 0.10.0
0.10.0 - 2025-02-11
Feature
- Expose model.save(..., pure_serving=True) for saving a model without debug information.
- Allow users to provide a training proto configuration to the learner.
- Add vector sequence feature support.
- Add variable importances for Isolation Forest models.
- Add ydf.help.loading_data() to print information about the supported dataset formats.
- Add experimental Tabular Transformer implementation.
- Add gzipped blob sequence as new model format (still optional).
- Enabled Poisson Loss for model analysis and fast inference.
Fix
- Fix recognition of multidimensional features for Numpy arrays of type object.
- Fix subsample count for small numbers of training examples in Isolation Forests.
- Fix NUM_NODES variable importance for oblique splits.
Other
- Updated OSS dependencies of protobuf, grpc and abseil.
Release music
- Sinfonie in Es-Dur "Sinfonia Eroica", op. 55. Ludwig van Beethoven
Python API 0.9.0
0.9.0 - 2024-12-02
Breaking
- Classification label classes are now consistently ordered lexicographically (for string labels) or in increasing order (for integer labels).
- Fix the typo partial_depepence_plot, now partial_dependence_plot, in model.analyze().
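The two orderings differ when numeric-looking values are stored as strings. The plain-Python sketch below (not YDF code) shows the orderings the note describes: integer labels sort by value, while the same labels stored as strings sort character by character.

```python
integer_labels = [10, 2, 1, 2]
string_labels = ["10", "2", "1", "2"]

# Integer labels: increasing numeric order.
print(sorted(set(integer_labels)))  # [1, 2, 10]
# String labels: lexicographic order, so "10" sorts before "2".
print(sorted(set(string_labels)))  # ['1', '10', '2']
```

If downstream code relies on the position of a class in model outputs, it is worth checking which of the two orderings applies to the label type used for training.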
Feature
- Add support for Avro files for path / distributed training with the "avro:" prefix.
- Add support for discretized numerical features for in-memory datasets.
- Expose MRR for ranking models.
- Add model.predict_class to generate the most likely predicted class of classification models.
- Add support for automatic feature selection with the feature_selector learner constructor argument. See the feature selection tutorial for more details.
- Add standalone prediction evaluation ydf.evaluate_predictions().
- Add new hyperparameter sparse_oblique_max_num_projections.
- Add options "POWER_OF_TWO" and "INTEGER" for sparse oblique weights.
- Emit proper errors when using lists for multi-dimensional features.
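Mean Reciprocal Rank (MRR) scores a ranking by how early its first relevant result appears, averaged over queries. The sketch below uses the standard definition with binary relevance (a label > 0 counts as relevant, an assumption); it is not taken from YDF, whose handling of graded labels or truncation may differ.

```python
from typing import Sequence


def reciprocal_rank(relevance_in_ranked_order: Sequence[int]) -> float:
    """1 / (rank of the first relevant result), or 0.0 if none is relevant."""
    for rank, label in enumerate(relevance_in_ranked_order, start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Sequence[Sequence[int]]) -> float:
    """Average of the per-query reciprocal ranks."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


# First relevant result at ranks 2, 1 and 4 respectively.
print(mean_reciprocal_rank([[0, 1, 0], [1, 0], [0, 0, 0, 1]]))
```

Unlike MAP or NDCG, MRR ignores every relevant result after the first one.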
Fix
- Regression and Ranking CEPs scaling corrected.
Release music
The John B. Sails. Traditional
Python API 0.8.0
0.8.0 - 2024-09-23
Breaking
- Disallow positional parameters for the learners, except for label and task.
- Remove the unsupported / invalid hyperparameters from the Isolation Forest learner.
- Remove parameters for distributed training and resuming training from learners that do not support these capabilities.
- By default, model.analyze runs for a maximum of 20 seconds (i.e. maximum_duration=20 by default).
- Convert boolean values in categorical sets to lowercase, matching the treatment of categorical features.
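In Python, a restriction like "no positional parameters except label and task" is typically enforced with keyword-only parameters, declared after a bare * in the signature. The class below is a hypothetical stand-in, not YDF's learner; num_trees and the default values are used only for illustration.

```python
class HypotheticalLearner:
    # label and task may be passed positionally; parameters after * are keyword-only.
    def __init__(self, label: str, task: str = "CLASSIFICATION", *, num_trees: int = 300):
        self.label = label
        self.task = task
        self.num_trees = num_trees


HypotheticalLearner("income", "CLASSIFICATION", num_trees=50)  # Accepted.
try:
    HypotheticalLearner("income", "CLASSIFICATION", 50)  # num_trees passed positionally.
except TypeError as error:
    print("Rejected:", error)
```

Keyword-only hyperparameters keep call sites readable and let a library reorder or add hyperparameters without silently changing what a positional call means.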
Feature
- Warn if training on a VerticalDataset and fail if attempting to modify the columns in a VerticalDataset during training.
- Users can override the model's task, label or group during evaluation.
- Add num_examples_per_tree() method to Isolation Forest models.
- Expose the slow engine for debugging predictions and evaluations with use_slow_engine=True.
- Speed up training of GBT models by ~10%.
- Support for categorical and boolean features in Isolation Forests.
- Add ydf.util.read_tf_record and ydf.util.write_tf_record to facilitate usage of TF Record datasets.
- Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but can still be used.
- Allow configuring the truncation of NDCG losses.
- Enable multi-threading when using model.predict and model.evaluate.
- The default number of threads of model.analyze is equal to the number of cores.
- Add multi-threaded results in model.benchmark.
- Add argument to control the maximum duration of model.analyze.
- Add support for Unicode strings, normalize categorical set values in the same way as categorical values, and validate their types.
- Add support for distributed training for ranking gradient boosted tree models.
Fix
- Fix labels of regression evaluation plots.
- Improved errors if Isolation Forest training fails.
Release music
Perpetuum Mobile "Ein musikalischer Scherz", Op. 257. Johann Strauss (Sohn)
v1.10.0
1.10.0 - 2024-08-21
Features
- Add support for Isolation Forest models.
- The default value of num_candidate_attributes in the CART learner is changed from 0 (Random Forest style sampling) to -1 (no sampling). This is the generally accepted logic of CART.
- Add support for GCS for file I/O.
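To make the CART default change concrete, the sketch below resolves the setting to a number of candidate features per split. It follows the note (-1: every feature is a candidate; 0: Random Forest style sampling) and further assumes the usual Random Forest heuristics for 0: the square root of the number of features for classification and one third for regression, rounded up here. The function name and the positive-value branch are hypothetical; check the learner documentation for the exact rule.

```python
import math


def resolve_num_candidate_attributes(setting: int, num_features: int, task: str) -> int:
    """Hypothetical helper mapping num_candidate_attributes to a feature count per split."""
    if setting == -1:
        # No sampling: every feature is a split candidate (the new CART default).
        return num_features
    if setting == 0:
        # Random Forest style sampling (assumed heuristics, rounded up).
        if task == "CLASSIFICATION":
            return max(1, math.ceil(math.sqrt(num_features)))
        return max(1, math.ceil(num_features / 3))
    # Positive value: use that many features, capped at the number available.
    return min(setting, num_features)


print(resolve_num_candidate_attributes(-1, 100, "CLASSIFICATION"))  # 100
print(resolve_num_candidate_attributes(0, 100, "CLASSIFICATION"))  # 10
```

With the old default of 0, a single CART tree would only inspect a random subset of about 10 out of 100 features at each node, which suits an ensemble of many trees but not a lone tree.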
Python API 0.7.0
Python API 0.7.0 - 2024-08-21
Feature
- Expose validate_hyperparameters() on the learner.
- Clarify which parameters in the learner are optional.
- Add support in JAX FeatureEncoder for non-string categorical feature values.
- Improve performance of Isolation Forests.
- Models can be serialized/deserialized to/from bytes with model.serialize() and ydf.deserialize_model.
- Models can be pickled safely.
- Native support for Xarray as a dataset format for all operations (e.g., training, evaluation, predictions).
- The output of model.to_jax_function can be converted to a TensorFlow Lite model.
- Change the default number of examples scanned when training on files, to determine the semantic and dictionaries of columns, from 10k to 100k.
- Various improvements of error messages.
- Evaluation for Anomaly Detection models.
- Oblique splits for Anomaly Detection models.
Fix
- Fix parsing of multidimensional ragged inputs.
- Fix isolation forest hyperparameter defaults.
- Fix bug causing distributed training to fail on a sharded dataset containing an empty shard.
- Handle unordered categorical sets in training.
- Fix dataspec ignoring definitions of unrolled columns, such as multidimensional categorical integers.
- Fix error when defining categorical sets for non-ragged multidimensional inputs.
- MacOS: Fix compatibility with other protobuf-using libraries such as TensorFlow.
Release music
Rondo Alla ingharese quasi un capriccio "Die Wut über den verlorenen Groschen",
Op. 129. Ludwig van Beethoven
Python API 0.6.0
Feature
- model.to_jax_function now always outputs a FeatureEncoder to help feed data to the JAX model.
- The default value of num_candidate_attributes in the CART learner is changed from 0 (Random Forest style sampling) to -1 (no sampling). This is the generally accepted logic of CART.
- model.to_tensorflow_saved_model supports preprocessing functions that have a different signature than the YDF model.
- Improve error messages when feeding Numpy arrays of the wrong size.
- Add option for weighted evaluation in model.evaluate.
Fix
- Fix display of confusion matrix with floating point weights.
Known issues
- MacOS build is broken.