Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles
- URL: http://arxiv.org/abs/2402.15691v1
- Date: Sat, 24 Feb 2024 02:29:10 GMT
- Title: Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles
- Authors: Fan Yang, Pierre Le Bodic, Michael Kamp, Mario Boley
- Abstract summary: Gradient boosting of prediction rules is an efficient approach to learn potentially interpretable yet accurate probabilistic models.
We show how a new objective function measures the angle between the risk gradient vector and the projection of the condition output vector onto the orthogonal complement of the already selected conditions.
This approach correctly approximates the ideal update of adding the risk gradient itself to the model and favours the inclusion of more general and thus shorter rules.
- Score: 10.40809014729148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient boosting of prediction rules is an efficient approach to learn
potentially interpretable yet accurate probabilistic models. However, actual
interpretability requires limiting the number and size of the generated rules,
and existing boosting variants are not designed for this purpose. Though
corrective boosting refits all rule weights in each iteration to minimise
prediction risk, the included rule conditions tend to be sub-optimal, because
commonly used objective functions fail to anticipate this refitting. Here, we
address this issue by a new objective function that measures the angle between
the risk gradient vector and the projection of the condition output vector onto
the orthogonal complement of the already selected conditions. This approach
correctly approximates the ideal update of adding the risk gradient itself to
the model and favours the inclusion of more general and thus shorter rules. As
we demonstrate using a wide range of prediction tasks, this significantly
improves the comprehensibility/accuracy trade-off of the fitted ensemble.
Additionally, we show how objective values for related rule conditions can be
computed incrementally to avoid any substantial computational overhead of the
new method.
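The angle-based criterion described in the abstract can be sketched in a few lines of NumPy. The snippet below is a minimal illustrative reconstruction, not the authors' implementation: the function name, the candidate matrix, and the greedy selection step are assumptions made for the example. It scores a candidate rule's output vector by the absolute cosine of the angle between the risk gradient and the projection of that vector onto the orthogonal complement of the span of the already selected condition outputs.

```python
import numpy as np

def orthogonal_angle_score(grad, q, Q_selected):
    """Illustrative sketch of the angle objective (not the authors' code).

    grad        : risk gradient vector, shape (n,)
    q           : output vector of the candidate rule condition, shape (n,)
    Q_selected  : output vectors of already selected conditions, shape (n, k)

    Returns |cos(angle)| between grad and the projection of q onto the
    orthogonal complement of span(Q_selected); larger values indicate a
    candidate whose corrective refit better approximates the gradient.
    """
    if Q_selected.size:
        # Orthonormal basis of the selected conditions' span (reduced QR).
        U, _ = np.linalg.qr(Q_selected)
        q_perp = q - U @ (U.T @ q)   # projection onto the orthogonal complement
    else:
        q_perp = q
    denom = np.linalg.norm(grad) * np.linalg.norm(q_perp)
    return 0.0 if denom < 1e-12 else abs(grad @ q_perp) / denom


# Hypothetical greedy step: score candidate conditions against the current
# risk gradient and keep the best one before refitting all rule weights.
rng = np.random.default_rng(0)
grad = rng.normal(size=200)                            # e.g. negative risk gradient
candidates = rng.integers(0, 2, size=(200, 30)).astype(float)
Q = candidates[:, :3]                                  # pretend three rules are already in the ensemble
scores = [orthogonal_angle_score(grad, candidates[:, j], Q)
          for j in range(3, candidates.shape[1])]
best = 3 + int(np.argmax(scores))
```

The incremental computation of objective values mentioned in the abstract would presumably replace the full QR factorisation with a rank-one, Gram-Schmidt-style update of the orthonormal basis each time a condition is added; that refinement is omitted from this sketch.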
Related papers
- Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering [55.15192437680943]
Generative models lack rigorous statistical guarantees for their outputs.
We propose a sequential conformal prediction method producing prediction sets that satisfy a rigorous statistical guarantee.
This guarantee states that with high probability, the prediction sets contain at least one admissible (or valid) example.
arXiv Detail & Related papers (2024-10-02T15:26:52Z)
- Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3 times faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z)
- Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient [4.478941279527423]
Backtracking linesearch is the de facto approach for minimizing continuously differentiable functions with locally Lipschitz gradient.
In recent years, it has been shown that in the convex setting it is possible to avoid linesearch altogether.
We propose an adaptive proximal gradient method, adaPG, that uses novel estimates of the local smoothness modulus.
arXiv Detail & Related papers (2023-01-11T12:19:28Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- Root-finding Approaches for Computing Conformal Prediction Set [18.405645120971496]
Conformal prediction constructs a confidence region for an unobserved response of a feature vector based on previous identically distributed and exchangeable observations.
We exploit the fact that, often, conformal prediction sets are intervals whose boundaries can be efficiently approximated by classical root-finding software.
arXiv Detail & Related papers (2021-04-14T06:41:12Z)
- Better Short than Greedy: Interpretable Models through Optimal Rule Boosting [10.938624307941197]
Rule ensembles are designed to provide a useful trade-off between predictive accuracy and model interpretability.
We present a novel approach aiming to fit rule ensembles of maximal predictive power for a given ensemble size.
arXiv Detail & Related papers (2021-01-21T01:03:48Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z)