Instance-Based Uncertainty Estimation for Gradient-Boosted Regression Trees
- URL: http://arxiv.org/abs/2205.11412v1
- Date: Mon, 23 May 2022 15:53:27 GMT
- Title: Instance-Based Uncertainty Estimation for Gradient-Boosted Regression Trees
- Authors: Jonathan Brophy and Daniel Lowd
- Abstract summary: We propose Instance-Based Uncertainty estimation for Gradient-boosted regression trees (IBUG).
IBUG computes a non-parametric distribution around a prediction using the k-nearest training instances, where distance is measured with a tree-ensemble kernel.
We find that IBUG achieves similar or better performance than the previous state-of-the-art across 22 benchmark regression datasets.
- Score: 13.109852233032395
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose Instance-Based Uncertainty estimation for Gradient-boosted
regression trees (IBUG), a simple method for extending any GBRT point predictor
to produce probabilistic predictions. IBUG computes a non-parametric
distribution around a prediction using the k-nearest training instances, where
distance is measured with a tree-ensemble kernel. The runtime of IBUG depends
on the number of training examples at each leaf in the ensemble, and can be
improved by sampling trees or training instances. Empirically, we find that
IBUG achieves similar or better performance than the previous state-of-the-art
across 22 benchmark regression datasets. We also find that IBUG can achieve
improved probabilistic performance by using different base GBRT models, and can
more flexibly model the posterior distribution of a prediction than competing
methods. We also find that previous methods suffer from poor probabilistic
calibration on some datasets, which can be mitigated using a scalar factor
tuned on the validation data.
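Since the abstract describes a concrete procedure, a short sketch may help. The following is a minimal illustration in Python, not the authors' released implementation: it assumes scikit-learn's GradientBoostingRegressor as the base GBRT, and the function name `ibug_style_prediction`, the neighbor count `k`, and the calibration factor `gamma` are illustrative choices.

```python
# Minimal sketch of an IBUG-style prediction (illustrative, not the paper's code).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

def ibug_style_prediction(model, X_train, y_train, x, k=20, gamma=1.0):
    """Point prediction plus a non-parametric spread from the k
    highest-affinity training instances under a tree-ensemble kernel."""
    x = x.reshape(1, -1)
    train_leaves = model.apply(X_train)  # (n_train, n_trees) leaf indices
    test_leaves = model.apply(x)         # (1, n_trees)
    # Tree-ensemble kernel: count the trees in which a training instance
    # lands in the same leaf as the test instance.
    affinity = (train_leaves == test_leaves).sum(axis=1)
    neighbors = np.argsort(affinity)[-k:]  # k nearest under the kernel
    point = model.predict(x)[0]
    # gamma plays the role of the scalar calibration factor that the
    # abstract says is tuned on validation data.
    sigma = gamma * y_train[neighbors].std()
    return point, sigma, y_train[neighbors]

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
gbrt = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)
mu, sigma, support = ibug_style_prediction(gbrt, X, y, X[0])
```

Because the k neighbor targets themselves are returned, any distribution (normal or heavier-tailed) can be fit to them rather than committing to one parametric form, which is the flexibility in modeling the posterior that the abstract mentions.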
Related papers
- A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
arXiv Detail & Related papers (2023-06-15T16:37:36Z) - Variational Inference with Coverage Guarantees in Simulation-Based Inference [18.818573945984873]
We propose Conformalized Amortized Neural Variational Inference (CANVI).
CANVI constructs conformalized predictors based on each candidate, compares the predictors using a metric known as predictive efficiency, and returns the most efficient predictor.
We prove lower bounds on the predictive efficiency of the regions produced by CANVI and explore how the quality of a posterior approximation relates to the predictive efficiency of prediction regions based on that approximation.
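A toy sketch may clarify the selection step. The following illustrates choosing among candidates by conformal predictive efficiency (here, average interval width) under simplifying assumptions: CANVI itself conformalizes amortized posterior approximations, whereas this sketch conformalizes point regressors, and all function names are hypothetical.

```python
# Toy illustration of selecting the most efficient conformalized predictor
# (smallest average interval width); not the CANVI algorithm itself.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def conformal_width(predict, X_cal, y_cal, alpha=0.1):
    """Split-conformal interval width at miscoverage level alpha."""
    scores = np.abs(y_cal - predict(X_cal))  # absolute residual scores
    level = min(1.0, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))
    return 2.0 * np.quantile(scores, level)

def most_efficient(candidates, X_cal, y_cal, alpha=0.1):
    """Return the candidate whose conformal intervals are narrowest."""
    widths = [conformal_width(c, X_cal, y_cal, alpha) for c in candidates]
    best = int(np.argmin(widths))
    return candidates[best], widths[best]

X, y = make_regression(n_samples=800, noise=10.0, random_state=0)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, random_state=0)
models = [LinearRegression().fit(X_tr, y_tr),
          RandomForestRegressor(random_state=0).fit(X_tr, y_tr)]
best, width = most_efficient([m.predict for m in models], X_cal, y_cal)
```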
arXiv Detail & Related papers (2023-05-23T17:24:04Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - GP-BART: a novel Bayesian additive regression trees approach using
Gaussian processes [1.03590082373586]
The GP-BART model extends BART by assuming GP priors for the predictions of each terminal node among all trees, relaxing the piecewise-constant predictions of standard BART.
The model's effectiveness is demonstrated through applications to simulated and real-world data, surpassing the performance of traditional modeling approaches in various scenarios.
arXiv Detail & Related papers (2022-04-05T11:18:44Z) - Distributional Gradient Boosting Machines [77.34726150561087]
Our framework is based on XGBoost and LightGBM.
We show that our framework achieves state-of-the-art forecast accuracy.
arXiv Detail & Related papers (2022-04-02T06:32:19Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models that infer directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform traditional two-step imputation-based predictions that use state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
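For orientation, a hedged usage sketch of the open-source `ngboost` package follows, showing the simpler univariate Normal case; the paper summarized here extends the approach to multivariate predictive distributions, and exact API details may vary across package versions.

```python
# Hedged usage sketch of the `ngboost` package (univariate Normal case).
from ngboost import NGBRegressor
from ngboost.distns import Normal
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ngb = NGBRegressor(Dist=Normal, n_estimators=500).fit(X_tr, y_tr)
dist = ngb.pred_dist(X_te)  # one predictive distribution per test instance
mu, sigma = dist.params["loc"], dist.params["scale"]
```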
arXiv Detail & Related papers (2021-06-07T17:44:49Z) - Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic
Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
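To convey the flavor of a single ensemble yielding both a mean and a variance, here is a generic sketch that attaches the sample variance of training targets to each leaf and sums variances across trees. This is not PGBM's actual algorithm, which instead treats leaf weights as random variables; all names below are illustrative.

```python
# Generic single-ensemble sketch: per-leaf target variances summed across
# trees. Illustrative only; NOT the PGBM algorithm.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def leaf_variance_prediction(model, X_train, y_train, X_test):
    """Return (mean, std) per test instance from one fitted ensemble."""
    train_leaves = model.apply(X_train)  # (n_train, n_trees) leaf indices
    test_leaves = model.apply(X_test)    # (n_test, n_trees)
    var = np.zeros(len(X_test))
    for t in range(train_leaves.shape[1]):
        for leaf in np.unique(test_leaves[:, t]):
            in_leaf = train_leaves[:, t] == leaf
            leaf_var = y_train[in_leaf].var() if in_leaf.any() else 0.0
            var[test_leaves[:, t] == leaf] += leaf_var
    return model.predict(X_test), np.sqrt(var)
```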
arXiv Detail & Related papers (2021-06-03T08:32:13Z) - Scalable Cross Validation Losses for Gaussian Process Models [22.204619587725208]
We use Polya-Gamma auxiliary variables and variational inference to accommodate binary and multi-class classification.
We find that our method offers fast training and excellent predictive performance.
arXiv Detail & Related papers (2021-05-24T21:01:47Z)