@dependabot dependabot bot commented on behalf of github Apr 16, 2022

Bumps xgboost from 1.3.3 to 1.6.0.

Release notes

Sourced from xgboost's releases.

Release 1.6.0 stable

v1.6.0 (2022 Apr 16)

After a long period of development, XGBoost v1.6.0 is packed with many new features and improvements. We summarize them in the following sections starting with an introduction to some major new features, then moving on to language binding specific changes including both new features and notable bug fixes for each package.

Development of categorical data support

This version of XGBoost brings new improvements to, and full coverage of, the experimental categorical data support in the Python and C packages for tree models. The hist, approx, and gpu_hist tree methods now all support training with categorical data. This release also features partition-based categorical splits, a split type that first became available in LightGBM in the context of gradient boosting. In the previous version, only gpu_hist supported one-hot encoding-based splits, which have the form x \in {c} where {c} is the set of all categories. In this release, {c} can optionally be split into two sets for the left and right nodes using any of the aforementioned tree methods. For more information, please see our tutorial on categorical data, along with examples linked on that page. (#7380, #7708, #7695, #7330, #7307, #7322, #7705, #7652, #7592, #7666, #7576, #7569, #7529, #7575, #7393, #7465, #7385, #7371, #7745, #7810)

In the future, we will continue to improve categorical data support with new features and optimizations. We also look forward to bringing the feature beyond the Python binding; contributions and feedback are welcome! Lastly, because of its experimental status, the behavior might change, especially the default values of related hyper-parameters.
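The difference between the two split types can be sketched in plain Python. This is only an illustration of the membership tests, not XGBoost's implementation: the function names are hypothetical, and which child node a matching row is routed to is deliberately left out.

```python
from typing import FrozenSet

def matches_onehot(x: int, c: int) -> bool:
    """One-hot style split: the row is tested against a single category c."""
    return x == c

def matches_partition(x: int, subset: FrozenSet[int]) -> bool:
    """Partition-based split: the categories are divided into two sets,
    and the row is tested for membership in one of them."""
    return x in subset

categories = [0, 1, 2, 3]
subset = frozenset({0, 2})  # hypothetical partition: {0, 2} versus {1, 3}

onehot_routing = [matches_onehot(x, 2) for x in categories]
partition_routing = [matches_partition(x, subset) for x in categories]
```

A one-hot split can isolate only one category per node, while a partition-based split can group several categories in a single node.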

Experimental support for multi-output model

XGBoost 1.6 features initial support for multi-output models, which include multi-output regression and multi-label classification. Along with this, the XGBoost classifier has proper support for base margin without the need for the user to flatten the input. In this initial support, XGBoost builds one model for each target, similar to the sklearn meta estimator. For more details, please see our quick introduction. (#7365, #7736, #7607, #7574, #7521, #7514, #7456, #7453, #7455, #7434, #7429, #7405, #7381)
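The "one model for each target" strategy can be sketched as follows. `MeanRegressor` is a hypothetical stand-in for a single-target model (it only predicts the training mean), and the helper names are assumptions made for illustration; none of this is XGBoost code.

```python
from statistics import mean
from typing import List, Sequence

class MeanRegressor:
    """Hypothetical single-target model: always predicts the training mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "MeanRegressor":
        self.value_ = mean(y)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.value_ for _ in X]

def fit_per_target(X, Y) -> List[MeanRegressor]:
    """Fit one independent model for each target column of Y."""
    n_targets = len(Y[0])
    return [MeanRegressor().fit(X, [row[t] for row in Y]) for t in range(n_targets)]

def predict_multi(models, X) -> List[List[float]]:
    """Stack the per-target predictions back into rows of shape (n_targets,)."""
    per_target = [m.predict(X) for m in models]
    return [list(row) for row in zip(*per_target)]

X = [[0.0], [1.0], [2.0]]
Y = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # two targets per row

models = fit_per_target(X, Y)
preds = predict_multi(models, [[5.0], [6.0]])
```

Because the targets are fitted independently, this approach cannot exploit correlations between targets.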

External memory support

External memory support for both the approx and hist tree methods is considered feature complete in XGBoost 1.6. Building upon the iterator-based interface introduced in the previous version, both hist and approx now iterate over each batch of data during training and prediction. In previous versions, hist concatenated all the batches into an internal representation, which is removed in this version. As a result, users can expect higher scalability in terms of data size but might experience lower performance due to disk IO. (#7531, #7320, #7638, #7372)
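The trade-off between iterating over batches and concatenating them can be illustrated with a toy histogram build. This is a sketch under stated assumptions: the batch source, the cut points, and the function names are all hypothetical, and the real training loop is far more involved.

```python
from bisect import bisect_right
from typing import Iterable, Iterator, List

def batches() -> Iterator[List[float]]:
    """Hypothetical data source yielding one batch at a time (e.g. one file each)."""
    yield [0.1, 0.4, 0.9]
    yield [0.2, 0.6]
    yield [0.7, 0.8, 0.3]

def accumulate_histogram(batch_iter: Iterable[List[float]], cut_points: List[float]) -> List[int]:
    """Count values per bin while holding only one batch in memory at a time."""
    counts = [0] * (len(cut_points) + 1)
    for batch in batch_iter:
        for value in batch:
            counts[bisect_right(cut_points, value)] += 1
    return counts

cuts = [0.25, 0.5, 0.75]

# Streaming: each batch is visited in turn and then discarded.
streamed = accumulate_histogram(batches(), cuts)

# Concatenating: every value is materialized in memory before counting.
all_values = [v for batch in batches() for v in batch]
concatenated = accumulate_histogram([all_values], cuts)
```

Both paths produce the same counts; the streaming path bounds memory by the batch size at the cost of re-reading the data on every pass.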

Rewritten approx

The approx tree method is rewritten based on the existing hist tree method. The rewrite closes the feature gap between approx and hist and improves performance. The behavior of approx should now be more aligned with hist and gpu_hist. Here's a list of user-visible changes:

  • Supports both max_leaves and max_depth.
  • Supports grow_policy.
  • Supports monotonic constraint.
  • Supports feature weights.
  • Uses max_bin in place of sketch_eps.
  • Supports categorical data.
  • Faster performance for many of the datasets.
  • Improved performance and robustness for distributed training.
  • Supports prediction cache.
  • Significantly better performance for external memory when depthwise policy is used.
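As a rough illustration of the list above, a parameter dictionary for the rewritten approx method might look like this. The values are arbitrary examples rather than recommendations, and the dictionary would be handed to the training function of the binding you use.

```python
# Illustrative parameters for the rewritten approx tree method.
# All values are arbitrary examples, not tuned recommendations.
params = {
    "tree_method": "approx",
    "max_bin": 64,                     # used in place of the former sketch_eps
    "grow_policy": "lossguide",
    "max_leaves": 32,
    "max_depth": 6,
    "monotone_constraints": "(1,-1)",  # assumes a dataset with exactly two features
}
```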

New serialization format

Based on the existing JSON serialization format, we introduce UBJSON support as a more efficient alternative. Both formats will be available in the future, and we plan to gradually phase out support for the old binary model format. Users can choose between the formats in the serialization function by providing the file extension json or ubj. Also, the save_raw function in all supported language bindings gains a new parameter for exporting the model in different formats. Available options are json, ubj, and deprecated; see the documentation for the language binding you are using for details. Lastly, the default internal serialization format is set to UBJSON, which affects Python pickle and R RDS. (#7572, #7570, #7358, #7571, #7556, #7549, #7416)
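The extension rule can be mirrored in a small helper. `format_from_path` is a hypothetical function written for this illustration, not part of XGBoost, and the fallback to the old binary format for other extensions is an assumption.

```python
from pathlib import Path

def format_from_path(path: str) -> str:
    """Hypothetical helper: pick a serialization format from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return "json"
    if suffix == ".ubj":
        return "ubj"
    # Assumption: any other extension maps to the old binary format,
    # which the save_raw options above call "deprecated".
    return "deprecated"

chosen = [format_from_path(p) for p in ("model.json", "model.ubj", "model.bin")]
```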

General new features and improvements

Aside from the major new features mentioned above, some others are summarized here:

  • Users can now access the build information of XGBoost binary in Python and C interface. (#7399, #7553)
  • Auto-configuration of seed_per_iteration is removed. Distributed training should now generate results closer to single-node training when sampling is used. (#7009)
  • A new parameter huber_slope is introduced for the Pseudo-Huber objective.
  • During source build, XGBoost can choose cub in the system path automatically. (#7579)
  • XGBoost now honors the CPU counts from CFS, which is usually set in docker environments. (#7654, #7704)
  • The metric aucpr is rewritten for better performance and GPU support. (#7297, #7368)
  • Metric calculation is now performed in double precision. (#7364)
  • XGBoost no longer mutates the global OpenMP thread limit. (#7537, #7519, #7608, #7590, #7589, #7588, #7687)
  • The default behavior of max_leaves and max_depth is now unified. (#7302, #7551)
  • CUDA fat binary is now compressed. (#7601)
  • Deterministic result for evaluation metric and linear model. In previous versions of XGBoost, evaluation results might differ slightly for each run due to parallel reduction for floating-point values, which is now addressed. (#7362, #7303, #7316, #7349)
  • XGBoost now uses double for GPU Hist node sum, which improves the accuracy of gpu_hist. (#7507)

... (truncated)

Changelog

Sourced from xgboost's changelog.

XGBoost Change Log

This file records the changes in xgboost library in reverse chronological order.

v1.5.0 (2021 Oct 11)

This release comes with many exciting new features and optimizations, along with some bug fixes. We will describe the experimental categorical data support and the external memory interface independently. Package-specific new features will be listed in respective sections.

Development on categorical data support

In version 1.3, XGBoost introduced an experimental feature for handling categorical data natively, without one-hot encoding. XGBoost can fit categorical splits in decision trees. (Currently, the generated splits will be of form x \in {v}, where the input is compared to a single category value. A future version of XGBoost will generate splits that compare the input against a list of multiple category values.)

Most of the other features, including prediction, SHAP value computation, feature importance, and model plotting were revised to natively handle categorical splits. Also, all Python interfaces including native interface with and without quantized DMatrix, scikit-learn interface, and Dask interface now accept categorical data with a wide range of data structures support including numpy/cupy array and cuDF/pandas/modin dataframe. In practice, the following are required for enabling categorical data support during training:

  • Use Python package.
  • Use gpu_hist to train the model.
  • Use JSON model file format for saving the model.

Once the model is trained, it can be used with most of the features that are available on the Python package. For a quick introduction, see https://xgboost.readthedocs.io/en/latest/tutorials/categorical.html

Related PRs: (#7011, #7001, #7042, #7041, #7047, #7043, #7036, #7054, #7053, #7065, #7213, #7228, #7220, #7221, #7231, #7306)

  • Next steps

    • Revise the CPU training algorithm to handle categorical data natively and generate categorical splits
    • Extend the CPU and GPU algorithms to generate categorical splits of form x \in S where the input is compared with multiple category values. (#7081)

External memory

This release features a brand-new interface and implementation for external memory (also known as out-of-core training). (#6901, #7064, #7088, #7089, #7087, #7092, #7070, #7216) The new implementation leverages the data iterator interface, which is currently used to create DeviceQuantileDMatrix. For a quick introduction, see https://xgboost.readthedocs.io/en/latest/tutorials/external_memory.html#data-iterator . During the development of this new interface, lz4 compression was removed. (#7076)
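The general shape of a data iterator can be sketched without XGBoost. The class below is a standalone mimic that assumes a next/reset style protocol, where `next` hands one batch to a callback and returns 0 once the data is exhausted. `BatchIter` and `drive` are hypothetical names; the linked tutorial remains the authoritative description of the real interface.

```python
from typing import Callable, List, Sequence

class BatchIter:
    """Standalone mimic of a next/reset data iterator (not XGBoost's class)."""

    def __init__(self, batches: Sequence[Sequence[int]]) -> None:
        self._batches = list(batches)
        self._pos = 0

    def next(self, input_data: Callable[..., None]) -> int:
        """Pass the next batch to the callback; return 0 when exhausted, else 1."""
        if self._pos == len(self._batches):
            return 0
        input_data(data=self._batches[self._pos])
        self._pos += 1
        return 1

    def reset(self) -> None:
        """Rewind so the consumer can make another pass over the data."""
        self._pos = 0

def drive(it: BatchIter) -> List[int]:
    """Hypothetical consumer: rewinds, then pulls batches until exhausted."""
    sizes: List[int] = []
    it.reset()
    while it.next(lambda data: sizes.append(len(data))):
        pass
    return sizes

it = BatchIter([[1, 2, 3], [4, 5], [6]])
first_pass = drive(it)
second_pass = drive(it)  # reset() makes repeated passes possible
```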

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the dependencies Pull requests that update a dependency file label Apr 16, 2022
Bumps [xgboost](https://github.com/dmlc/xgboost) from 1.3.3 to 1.6.0.
- [Release notes](https://github.com/dmlc/xgboost/releases)
- [Changelog](https://github.com/dmlc/xgboost/blob/master/NEWS.md)
- [Commits](dmlc/xgboost@v1.3.3...v1.6.0)

---
updated-dependencies:
- dependency-name: xgboost
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot force-pushed the dependabot/pip/python/requirements/ml/xgboost-1.6.0 branch from 4ef6b93 to 45540b8 Compare May 2, 2022 19:06

dependabot bot commented on behalf of github May 14, 2022

Superseded by #553.

@dependabot dependabot bot closed this May 14, 2022
@dependabot dependabot bot deleted the dependabot/pip/python/requirements/ml/xgboost-1.6.0 branch May 14, 2022 07:08