Statistical Inference for Gradient Boosting Regression
- URL: http://arxiv.org/abs/2509.23127v1
- Date: Sat, 27 Sep 2025 05:16:10 GMT
- Title: Statistical Inference for Gradient Boosting Regression
- Authors: Haimo Fang, Kevin Tan, Giles Hooker
- Abstract summary: We propose a unified framework for statistical inference in gradient boosting regression. Our framework integrates dropout or parallel training with a recently proposed regularization procedure that allows for a central limit theorem (CLT) for boosting. The resulting algorithms enjoy similar CLTs, which we use to construct built-in confidence intervals, prediction intervals, and rigorous hypothesis tests for assessing variable importance.
- Score: 5.475047189434392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient boosting is widely popular due to its flexibility and predictive accuracy. However, statistical inference and uncertainty quantification for gradient boosting remain challenging and under-explored. We propose a unified framework for statistical inference in gradient boosting regression. Our framework integrates dropout or parallel training with a recently proposed regularization procedure that allows for a central limit theorem (CLT) for boosting. With these enhancements, we surprisingly find that increasing the dropout rate and the number of trees grown in parallel at each iteration substantially enhances signal recovery and overall performance. Our resulting algorithms enjoy similar CLTs, which we use to construct built-in confidence intervals, prediction intervals, and rigorous hypothesis tests for assessing variable importance. Numerical experiments demonstrate that our algorithms perform well, interpolate between regularized boosting and random forests, and confirm the validity of their built-in statistical inference procedures.
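As a rough, hypothetical illustration of the flavor of inference described above (not the paper's built-in procedure): the sketch below mimics parallel training by fitting several subsampled boosted models and applies a CLT-style normal approximation across the replicates to form a confidence interval at a test point. All parameter choices, and the use of scikit-learn's GradientBoostingRegressor, are our assumptions; since the replicates share data, the interval is heuristic.

```python
# Illustrative only: NOT the paper's algorithm. We mimic "parallel training"
# by fitting B subsampled boosted models and use a CLT-style normal
# approximation across replicates for a confidence interval at a test point.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)
x_test = np.zeros((1, d))

B = 50  # number of parallel replicates (hypothetical choice)
preds = np.empty(B)
for b in range(B):
    idx = rng.choice(n, size=n // 2, replace=False)  # subsampling as crude regularization
    gbm = GradientBoostingRegressor(n_estimators=100, max_depth=2,
                                    learning_rate=0.1, random_state=b)
    gbm.fit(X[idx], y[idx])
    preds[b] = gbm.predict(x_test)[0]

mean, se = preds.mean(), preds.std(ddof=1) / np.sqrt(B)
print(f"estimate {mean:.3f}, 95% normal CI [{mean - 1.96*se:.3f}, {mean + 1.96*se:.3f}]")
```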
Related papers
- On the Convergence of Multicalibration Gradient Boosting [13.103291011255202]
We bridge the gap by providing convergence guarantees for multicalibration gradient boosting in regression with squared-error loss. We show that the magnitude of successive prediction updates decays at $O(1/\sqrt{T})$, which implies the same convergence rate bound for the multicalibration error over rounds.
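Restated in symbols purely for readability (the norm and the max-over-rounds formulation are our notational assumptions, not details taken from the paper), with $f_t$ the boosted predictor after round $t$ of $T$:

```latex
% Hedged restatement of the summary's claim; the notation is ours.
\[
  \max_{1 \le t \le T} \lVert f_{t+1} - f_t \rVert
  = O\!\left(\tfrac{1}{\sqrt{T}}\right)
  \quad \Longrightarrow \quad
  \text{multicalibration error of } f_T
  = O\!\left(\tfrac{1}{\sqrt{T}}\right).
\]
```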
arXiv Detail & Related papers (2026-02-06T15:29:02Z)
- Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts [100.26854618129039]
Weather forecasting is fundamentally challenged by the chaotic nature of the atmosphere. Recent advances in Bayesian Deep Learning (BDL) offer a promising but often disconnected alternative. We bridge these paradigms through a unified hybrid BDL framework for ensemble weather forecasting.
arXiv Detail & Related papers (2025-11-18T07:49:52Z)
- CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs [53.749193998004166]
Curriculum learning plays a crucial role in enhancing the training efficiency of large language models. We propose CurES, an efficient training method that accelerates convergence and employs Bayesian posterior estimation to minimize computational overhead.
arXiv Detail & Related papers (2025-10-01T15:41:27Z)
- Accelerating SGDM via Learning Rate and Batch Size Schedules: A Lyapunov-Based Analysis [7.2620484413601325]
We analyze the convergence behavior of stochastic gradient descent with momentum (SGDM) under dynamic learning-rate and batch-size schedules. We extend the existing theoretical framework to cover three practical scheduling strategies commonly used in deep learning. Our results reveal a clear hierarchy in convergence: a constant batch size does not guarantee convergence of the expected gradient norm, whereas an increasing batch size does, and simultaneously increasing both the batch size and learning rate achieves a provably faster decay.
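A minimal toy sketch of the scheduling idea (the doubling schedule, the quadratic objective, and all constants are our own choices, not the paper's):

```python
# Minimal SGDM on a quadratic with a doubling batch-size schedule.
# Hypothetical constants throughout; meant only to show the mechanics.
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])          # strongly convex quadratic f(w) = 0.5 w^T A w
w = np.array([5.0, 5.0])
v = np.zeros(2)
lr, beta = 0.02, 0.9
batch = 8

for epoch in range(6):
    for _ in range(100):
        noise = rng.normal(scale=1.0 / np.sqrt(batch), size=2)  # larger batch -> less gradient noise
        g = A @ w + noise
        v = beta * v + g             # heavy-ball momentum buffer
        w = w - lr * v
    print(f"epoch {epoch}: batch={batch}, |grad|={np.linalg.norm(A @ w):.4f}")
    batch *= 2                       # increasing batch size shrinks the noise floor
```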
arXiv Detail & Related papers (2025-08-05T05:32:36Z)
- Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings [24.07815507403025]
Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We analyze a novel framework, Counterfactual Policy Mean Embedding (CPME), that represents the entire counterfactual outcome distribution in a reproducing kernel Hilbert space.
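As a minimal sketch of the object CPME builds on, here is a plain inverse-propensity-weighted plug-in estimate of a counterfactual kernel mean embedding; the paper's estimator adds a doubly-robust correction, and every name and constant below is ours:

```python
# IPW plug-in estimate of the counterfactual kernel mean embedding
# mu_pi(t) = E_pi[k(Y, t)], evaluated on a grid of points t.
import numpy as np

def rbf(a, b, bw=1.0):
    return np.exp(-(a - b) ** 2 / (2 * bw ** 2))

rng = np.random.default_rng(1)
n = 2000
actions = rng.integers(0, 2, size=n)       # logged binary actions
prop = np.full(n, 0.5)                     # logging propensities (known here)
y = actions + rng.normal(size=n)           # outcomes under the logged policy
w = (actions == 1) / prop                  # weights for target policy "always act"

grid = np.linspace(-2.0, 3.0, 6)
mu_hat = np.array([np.mean(w * rbf(y, t)) for t in grid])
print(np.round(mu_hat, 3))                 # embedding of the counterfactual outcome law
```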
arXiv Detail & Related papers (2025-06-03T12:16:46Z)
- RieszBoost: Gradient Boosting for Riesz Regression [49.737777802061984]
We propose a novel gradient boosting algorithm to directly estimate the Riesz representer without requiring its explicit analytical form. We show that our algorithm performs on par with or better than indirect estimation techniques across a range of functionals.
arXiv Detail & Related papers (2025-01-08T23:04:32Z)
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
We investigate the statistical properties of Temporal Difference learning with Polyak-Ruppert averaging. We make three significant contributions that improve the current state-of-the-art results.
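For concreteness, a minimal sketch of TD(0) with linear (here tabular) features and Polyak-Ruppert iterate averaging on a toy two-state chain; the environment, step sizes, and horizon are our own choices:

```python
# TD(0) with linear function approximation and Polyak-Ruppert averaging
# on a toy 2-state Markov chain. Illustrative constants only.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition matrix
r = np.array([1.0, -1.0])                # state rewards
gamma = 0.9
phi = np.eye(2)                          # tabular features (identity)

theta = np.zeros(2)
avg = np.zeros(2)
s = 0
for t in range(1, 50_001):
    s_next = rng.choice(2, p=P[s])
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta = theta + (1.0 / t ** 0.7) * delta * phi[s]  # Robbins-Monro step size
    avg += (theta - avg) / t                           # running Polyak-Ruppert average
    s = s_next

V_true = np.linalg.solve(np.eye(2) - gamma * P, r)     # exact value function
print("averaged TD estimate:", np.round(avg, 3), " true:", np.round(V_true, 3))
```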
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- Max-Rank: Efficient Multiple Testing for Conformal Prediction [43.56898111853698]
Multiple hypothesis testing (MHT) frequently arises in scientific inquiries, and concurrent testing of multiple hypotheses inflates the risk of Type-I errors or false positives. This paper addresses MHT in the context of conformal prediction, a flexible framework for predictive uncertainty quantification. We introduce max-rank, a novel correction that exploits dependencies whilst efficiently controlling the family-wise error rate.
arXiv Detail & Related papers (2023-11-17T22:44:22Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Distributional Gradient Boosting Machines [77.34726150561087]
Our framework is based on XGBoost and LightGBM.
We show that our framework achieves state-of-the-art forecast accuracy.
arXiv Detail & Related papers (2022-04-02T06:32:19Z)
- Uncertainty in Gradient Boosting via Ensembles [37.808845398471874]
Ensembles of gradient boosting models successfully detect anomalous inputs while having limited ability to improve the predicted total uncertainty.
We propose a concept of a virtual ensemble to get the benefits of an ensemble via only one gradient boosting model, which significantly reduces complexity.
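One way to realize such a virtual ensemble (a hedged sketch in scikit-learn rather than the paper's setting) is to treat truncated prefixes of a single boosted model as ensemble members via staged predictions and read uncertainty off their spread:

```python
# A virtual ensemble from ONE trained model: truncated prefixes of the
# boosting sequence act as ensemble members via staged_predict, and their
# spread serves as a heuristic uncertainty signal. Details are our choices.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=400)

gbm = GradientBoostingRegressor(n_estimators=300, max_depth=2, random_state=0)
gbm.fit(X, y)

x_new = np.array([[0.0], [2.5]])                     # two query points
staged = np.array(list(gbm.staged_predict(x_new)))   # shape (n_estimators, n_points)
members = staged[150::25]                            # later prefixes as "ensemble members"
print("mean:", members.mean(axis=0).round(3),
      "spread:", members.std(axis=0).round(3))       # larger spread -> less confident
```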
arXiv Detail & Related papers (2020-06-18T14:11:27Z)
- Robust Boosting for Regression Problems [0.0]
Gradient boosting algorithms construct a regression predictor using a linear combination of "base learners".
The robust boosting algorithm is based on a two-stage approach, similar to what is done for robust linear regression.
When no atypical observations are present, the robust boosting approach works as well as standard gradient boosting with squared-error loss.
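To see the failure mode being addressed, a small self-contained comparison (our own construction, not the paper's two-stage estimator) of squared-error boosting versus Huber-loss boosting under contamination:

```python
# Squared-error boosting vs. Huber-loss boosting on contaminated data.
# Illustrates the robustness issue; the paper's own method is a different,
# two-stage robust procedure.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=500)
out = rng.choice(500, size=25, replace=False)
y[out] += rng.normal(loc=15.0, scale=5.0, size=25)   # gross outliers

X_test = np.linspace(-3, 3, 200).reshape(-1, 1)
y_test = np.sin(X_test[:, 0])
for loss in ("squared_error", "huber"):
    gbm = GradientBoostingRegressor(loss=loss, n_estimators=200,
                                    max_depth=2, random_state=0).fit(X, y)
    mse = np.mean((gbm.predict(X_test) - y_test) ** 2)
    print(f"{loss:>13}: clean-test MSE = {mse:.4f}")
```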
arXiv Detail & Related papers (2020-02-06T01:12:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.