Lassoed Tree Boosting
- URL: http://arxiv.org/abs/2205.10697v6
- Date: Fri, 8 Dec 2023 19:39:57 GMT
- Title: Lassoed Tree Boosting
- Authors: Alejandro Schuler, Yi Li, Mark van der Laan
- Abstract summary: We prove that a ``lassoed'' gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ L2 convergence in the large nonparametric space of cadlag functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
- Score: 53.56229983630983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient boosting performs exceptionally in most prediction problems and
scales well to large datasets. In this paper we prove that a ``lassoed''
gradient boosted tree algorithm with early stopping achieves faster than
$n^{-1/4}$ L2 convergence in the large nonparametric space of cadlag functions
of bounded sectional variation. This rate is remarkable because it does not
depend on the dimension, sparsity, or smoothness. We use simulation and real
data to confirm our theory and demonstrate empirical performance and
scalability on par with standard boosting. Our convergence proofs are based on
a novel, general theorem on early stopping with empirical loss minimizers of
nested Donsker classes.
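The full algorithm is specified in the paper; as a rough, hedged illustration of the idea named in the title (boost trees with early stopping, then shrink the ensemble with an L1 penalty), the sketch below re-weights the fitted trees of a scikit-learn gradient boosting model with a lasso. The simulated data, the hyperparameters, and the two-stage "refit tree weights with LassoCV" reading are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: boost trees with early stopping, then re-weight the
# individual trees with an L1-penalized (lasso) fit. This is a plausible reading
# of the title, not the paper's algorithm; data and settings are assumptions.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: gradient boosted trees with early stopping on a validation split.
gbt = GradientBoostingRegressor(
    n_estimators=500, learning_rate=0.1, max_depth=2,
    validation_fraction=0.2, n_iter_no_change=10, random_state=0,
)
gbt.fit(X_tr, y_tr)

# Stage 2: treat each fitted tree's prediction as a basis function and
# re-weight the basis with a cross-validated lasso.
def tree_basis(model, X):
    # One column per boosting stage: the raw prediction of that stage's tree.
    return np.column_stack([est[0].predict(X) for est in model.estimators_])

B_tr, B_te = tree_basis(gbt, X_tr), tree_basis(gbt, X_te)
lasso = LassoCV(cv=5).fit(B_tr, y_tr)

mse_gbt = np.mean((gbt.predict(X_te) - y_te) ** 2)
mse_lassoed = np.mean((lasso.predict(B_te) - y_te) ** 2)
print(f"boosting MSE: {mse_gbt:.3f}, lassoed boosting MSE: {mse_lassoed:.3f}")
```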
Related papers
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Min-Max Optimization under Delays [26.830212508878162]
Delays and asynchrony are inevitable in large-scale machine-learning problems.
No analogous theory is available for min-max optimization.
We show that even small delays can cause prominent algorithms like Extra-gradient to diverge.
arXiv Detail & Related papers (2023-07-13T16:39:01Z) - SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree
Search [68.66904039405871]
We introduce SoftTreeMax, a generalization of softmax that takes planning into account.
We show for the first time the role of a tree expansion policy in mitigating the variance of the policy gradient.
Our differentiable tree-based policy leverages all gradients at the tree leaves in each environment step instead of the traditional single-sample-based gradient.
arXiv Detail & Related papers (2023-01-30T19:03:14Z) - A Robust Hypothesis Test for Tree Ensemble Pruning [2.4923006485141284]
We develop and present a novel theoretically justified hypothesis test of split quality for gradient boosted tree ensembles.
We show that using this method instead of the common penalty terms leads to a significant reduction in out-of-sample loss.
We also present several innovative extensions to the method, opening the door for a wide variety of novel tree pruning algorithms.
arXiv Detail & Related papers (2023-01-24T16:31:49Z) - SkipNode: On Alleviating Performance Degradation for Deep Graph
Convolutional Networks [84.30721808557871]
We conduct theoretical and experimental analysis to explore the fundamental causes of performance degradation in deep GCNs.
We propose a simple yet effective plug-and-play module, SkipNode, to overcome the performance degradation of deep GCNs.
arXiv Detail & Related papers (2021-12-22T02:18:31Z) - GBHT: Gradient Boosting Histogram Transform for Density Estimation [73.94900378709023]
We propose a density estimation algorithm called Gradient Boosting Histogram Transform (GBHT).
We make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems.
arXiv Detail & Related papers (2021-06-10T13:40:28Z) - Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE) based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z) - Linear Last-iterate Convergence in Constrained Saddle-point Optimization [48.44657553192801]
We significantly expand the understanding of last-iterate convergence for Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weights Update (OMWU).
We show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant.
We show that bilinear games over any polytope satisfy this condition and OGDA converges exponentially fast even without the unique equilibrium assumption.
arXiv Detail & Related papers (2020-06-16T20:53:04Z) - Random Reshuffling: Simple Analysis with Vast Improvements [9.169947558498535]
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling.
arXiv Detail & Related papers (2020-06-10T17:57:21Z)
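For reference, the update analyzed in that paper is simple to state: each epoch draws a fresh random permutation of the data and takes one gradient step per component function, in that order. Below is a minimal NumPy sketch on a toy least-squares finite sum; the problem, step size, and epoch count are arbitrary illustrative choices, not anything prescribed by the paper.

```python
# Minimal sketch of Random Reshuffling (RR) on a least-squares finite sum:
# each epoch reshuffles the data once and takes one gradient step per summand.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

x = np.zeros(d)
step = 0.01
for epoch in range(50):
    for i in rng.permutation(n):           # fresh permutation each epoch
        grad_i = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th summand
        x -= step * grad_i

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```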