Lassoed Tree Boosting
- URL: http://arxiv.org/abs/2205.10697v6
- Date: Fri, 8 Dec 2023 19:39:57 GMT
- Title: Lassoed Tree Boosting
- Authors: Alejandro Schuler, Yi Li, Mark van der Laan
- Abstract summary: We prove that a ``lassoed'' gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ L2 convergence in the large nonparametric space of cadlag functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
- Score: 53.56229983630983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient boosting performs exceptionally in most prediction problems and
scales well to large datasets. In this paper we prove that a ``lassoed''
gradient boosted tree algorithm with early stopping achieves faster than
$n^{-1/4}$ L2 convergence in the large nonparametric space of cadlag functions
of bounded sectional variation. This rate is remarkable because it does not
depend on the dimension, sparsity, or smoothness. We use simulation and real
data to confirm our theory and demonstrate empirical performance and
scalability on par with standard boosting. Our convergence proofs are based on
a novel, general theorem on early stopping with empirical loss minimizers of
nested Donsker classes.
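The full algorithm is specified in the paper; as a rough, hedged illustration of the idea named in the title (boost trees with early stopping, then shrink the ensemble with an L1 penalty), the sketch below re-weights the fitted trees of a scikit-learn gradient boosting model with a lasso. The simulated data, the hyperparameters, and the two-stage "refit tree weights with LassoCV" reading are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: boost trees with early stopping, then re-weight the
# individual trees with an L1-penalized (lasso) fit. This is a plausible reading
# of the title, not the paper's algorithm; data and settings are assumptions.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: gradient boosted trees with early stopping on a validation split.
gbt = GradientBoostingRegressor(
    n_estimators=500, learning_rate=0.1, max_depth=2,
    validation_fraction=0.2, n_iter_no_change=10, random_state=0,
)
gbt.fit(X_tr, y_tr)

# Stage 2: treat each fitted tree's prediction as a basis function and
# re-weight the basis with a cross-validated lasso.
def tree_basis(model, X):
    # One column per boosting stage: the raw prediction of that stage's tree.
    return np.column_stack([est[0].predict(X) for est in model.estimators_])

B_tr, B_te = tree_basis(gbt, X_tr), tree_basis(gbt, X_te)
lasso = LassoCV(cv=5).fit(B_tr, y_tr)

mse_gbt = np.mean((gbt.predict(X_te) - y_te) ** 2)
mse_lassoed = np.mean((lasso.predict(B_te) - y_te) ** 2)
print(f"boosting MSE: {mse_gbt:.3f}, lassoed boosting MSE: {mse_lassoed:.3f}")
```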
Related papers
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Min-Max Optimization under Delays [26.830212508878162]
Delays and asynchrony are inevitable in large-scale machine-learning problems.
No analogous theory is available for min-max optimization.
We show that even small delays can cause prominent algorithms like Extra-gradient to diverge.
arXiv Detail & Related papers (2023-07-13T16:39:01Z) - SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree
Search [68.66904039405871]
We introduce SoftTreeMax, a generalization of softmax that takes planning into account.
We show for the first time the role of a tree expansion policy in mitigating the variance of the policy gradient.
Our differentiable tree-based policy leverages all gradients at the tree leaves in each environment step instead of the traditional single-sample-based gradient.
arXiv Detail & Related papers (2023-01-30T19:03:14Z) - A Robust Hypothesis Test for Tree Ensemble Pruning [2.4923006485141284]
We develop and present a novel theoretically justified hypothesis test of split quality for gradient boosted tree ensembles.
We show that using this method instead of the common penalty terms leads to a significant reduction in out-of-sample loss.
We also present several innovative extensions to the method, opening the door for a wide variety of novel tree pruning algorithms.
arXiv Detail & Related papers (2023-01-24T16:31:49Z) - SkipNode: On Alleviating Performance Degradation for Deep Graph
Convolutional Networks [84.30721808557871]
We conduct theoretical and experimental analysis to explore the fundamental causes of performance degradation in deep GCNs.
We propose a simple yet effective plug-and-play module, SkipNode, to overcome the performance degradation of deep GCNs.
arXiv Detail & Related papers (2021-12-22T02:18:31Z) - GBHT: Gradient Boosting Histogram Transform for Density Estimation [73.94900378709023]
We propose a density estimation algorithm called Gradient Boosting Histogram Transform (GBHT).
We make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems.
arXiv Detail & Related papers (2021-06-10T13:40:28Z) - Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE) based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z) - Linear Last-iterate Convergence in Constrained Saddle-point Optimization [48.44657553192801]
We significantly expand the understanding of last-iterate convergence for Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weights Update (OMWU).
We show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant.
We show that bilinear games over any polytope satisfy this condition and OGDA converges exponentially fast even without the unique equilibrium assumption.
arXiv Detail & Related papers (2020-06-16T20:53:04Z) - Random Reshuffling: Simple Analysis with Vast Improvements [9.169947558498535]
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling.
arXiv Detail & Related papers (2020-06-10T17:57:21Z)
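For reference, the update analyzed in that paper is simple to state: each epoch draws a fresh random permutation of the data and takes one gradient step per component function, in that order. Below is a minimal NumPy sketch on a toy least-squares finite sum; the problem, step size, and epoch count are arbitrary illustrative choices, not anything prescribed by the paper.

```python
# Minimal sketch of Random Reshuffling (RR) on a least-squares finite sum:
# each epoch reshuffles the data once and takes one gradient step per summand.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

x = np.zeros(d)
step = 0.01
for epoch in range(50):
    for i in rng.permutation(n):           # fresh permutation each epoch
        grad_i = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th summand
        x -= step * grad_i

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```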