Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression
- URL: http://arxiv.org/abs/2106.01682v2
- Date: Sun, 6 Jun 2021 15:26:42 GMT
- Title: Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression
- Authors: Olivier Sprangers, Sebastian Schelter, Maarten de Rijke
- Abstract summary: Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
- Score: 51.770998056563094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient Boosting Machines (GBM) are hugely popular for solving tabular data
problems. However, practitioners are not only interested in point predictions,
but also in probabilistic predictions in order to quantify the uncertainty of
the predictions. Creating such probabilistic predictions is difficult with
existing GBM-based solutions: they either require training multiple models or
they become too computationally expensive to be useful for large-scale
settings. We propose Probabilistic Gradient Boosting Machines (PGBM), a method
to create probabilistic predictions with a single ensemble of decision trees in
a computationally efficient manner. PGBM approximates the leaf weights in a
decision tree as random variables, and estimates the mean and variance of
each sample in a dataset via stochastic tree ensemble update equations. These
learned moments allow us to subsequently sample from a specified distribution
after training. We empirically demonstrate the advantages of PGBM compared to
existing state-of-the-art methods: (i) PGBM enables probabilistic estimates
without compromising on point performance in a single model, (ii) PGBM learns
probabilistic estimates via a single model only (and without requiring
multi-parameter boosting), and thereby offers a speedup of up to several orders
of magnitude over existing state-of-the-art methods on large datasets, and
(iii) PGBM achieves accurate probabilistic estimates in tasks with complex
differentiable loss functions, such as hierarchical time series problems, where
we observed up to 10% improvement in point forecasting performance and up to
300% improvement in probabilistic forecasting performance.
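To make the mechanism concrete, the following minimal Python sketch illustrates the idea described in the abstract: accumulate the mean and variance of the leaf weights a sample passes through, then sample from a distribution fitted to those moments after training. This is a simplified illustration under stated assumptions (independent leaf weights across trees, a Normal output distribution), not the PGBM implementation; the tree objects, their find_leaf method, and the per-leaf mean/variance attributes are hypothetical stand-ins.

    import numpy as np

    def predict_moments(trees, x, learning_rate=0.1):
        # Accumulate the prediction's mean and variance over all trees.
        # Assumes leaf weights of different trees are independent, so the
        # variances add; the paper's update equations are more general.
        mean, variance = 0.0, 0.0
        for tree in trees:
            leaf = tree.find_leaf(x)                # hypothetical: leaf reached by x
            mean += learning_rate * leaf.mean       # empirical mean of the leaf weight
            variance += (learning_rate ** 2) * leaf.variance
        return mean, variance

    def sample_predictions(trees, x, n_samples=1000, seed=0):
        # Fit a Normal to the learned moments and sample from it; any
        # two-parameter distribution could be plugged in after training,
        # which is what makes the approach flexible.
        mean, variance = predict_moments(trees, x)
        rng = np.random.default_rng(seed)
        return rng.normal(mean, np.sqrt(variance), size=n_samples)

Because only two moments per sample are stored, a single trained ensemble can serve both point predictions (the mean) and probabilistic ones (samples or quantiles), which is the source of the speedup claimed over multi-model or multi-parameter boosting approaches.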
Related papers
- Probabilistic Scores of Classifiers, Calibration is not Enough [0.32985979395737786]
In binary classification tasks, accurate representation of probabilistic predictions is essential for various real-world applications.
In this study, we highlight approaches that prioritize the alignment between predicted scores and true probability distributions.
Our findings reveal limitations in traditional calibration metrics, which could undermine the reliability of predictive models for critical decision-making.
arXiv Detail & Related papers (2024-08-06T19:53:00Z) - When Rigidity Hurts: Soft Consistency Regularization for Probabilistic
Hierarchical Time Series Forecasting [69.30930115236228]
Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting.
Most methods focus on point predictions and do not provide well-calibrated probabilistic forecast distributions.
We propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models the forecast distribution of the entire hierarchy.
arXiv Detail & Related papers (2023-10-17T20:30:16Z) - Instance-Based Uncertainty Estimation for Gradient-Boosted Regression
Trees [13.109852233032395]
We propose Instance-Based Uncertainty estimation for Gradient-boosted regression trees (Ibug).
Ibug computes a non-parametric distribution around a prediction using the k-nearest training instances, where distance is measured with a tree-ensemble kernel (a rough sketch of this idea appears after this list).
We find that Ibug achieves similar or better performance than the previous state-of-the-art across 22 benchmark regression datasets.
arXiv Detail & Related papers (2022-05-23T15:53:27Z) - Distributional Gradient Boosting Machines [77.34726150561087]
Our framework is based on XGBoost and LightGBM.
We show that our framework achieves state-of-the-art forecast accuracy.
arXiv Detail & Related papers (2022-04-02T06:32:19Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - A Worrying Analysis of Probabilistic Time-series Models for Sales
Forecasting [10.690379201437015]
Probabilistic time-series models have become popular in the forecasting field as they help to make optimal decisions under uncertainty.
We analyze the performance of three prominent probabilistic time-series models for sales forecasting.
arXiv Detail & Related papers (2020-11-21T03:31:23Z) - Ambiguity in Sequential Data: Predicting Uncertain Futures with
Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
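As referenced in the Ibug entry above, here is a hedged Python sketch of the instance-based idea: affinity between two samples is taken to be the number of trees in which they land in the same leaf, and the targets of the k most similar training instances form an empirical predictive distribution. The leaf-index arrays are assumed inputs (e.g., obtained from a fitted ensemble's apply-style method); recentring the neighbours' targets on the point prediction is a simplifying choice, not the paper's exact procedure.

    import numpy as np

    def ensemble_affinity(train_leaves, test_leaves):
        # train_leaves: (n_train, n_trees) leaf ids; test_leaves: (n_trees,).
        # Affinity = number of trees in which two samples share a leaf.
        return (train_leaves == test_leaves).sum(axis=1)

    def empirical_interval(train_leaves, y_train, test_leaves, point_pred,
                           k=20, quantiles=(0.05, 0.95)):
        # Build an empirical predictive interval from the targets of the
        # k most similar training instances, recentred on the model's
        # point prediction.
        affinity = ensemble_affinity(train_leaves, test_leaves)
        nearest = np.argsort(-affinity)[:k]   # k highest-affinity instances
        local = point_pred + (y_train[nearest] - y_train[nearest].mean())
        return np.quantile(local, quantiles)

Because the neighbourhood is defined by the ensemble itself rather than by raw feature distance, the uncertainty estimate adapts to whatever structure the trees have learned, with no extra models to train.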