Infinitesimal gradient boosting
- URL: http://arxiv.org/abs/2104.13208v1
- Date: Mon, 26 Apr 2021 15:09:05 GMT
- Title: Infinitesimal gradient boosting
- Authors: Clément Dombry and Jean-Jil Duchamps
- Abstract summary: We define infinitesimal gradient boosting as a limit of the popular tree-based gradient boosting algorithm from machine learning.
We introduce a new class of randomized regression trees bridging totally randomized trees and Extra Trees.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We define infinitesimal gradient boosting as a limit of the popular
tree-based gradient boosting algorithm from machine learning. The limit is
considered in the vanishing-learning-rate asymptotic, that is when the learning
rate tends to zero and the number of gradient trees is rescaled accordingly.
For this purpose, we introduce a new class of randomized regression trees
bridging totally randomized trees and Extra Trees and using a softmax
distribution for binary splitting. Our main result is the convergence of the
associated stochastic algorithm and the characterization of the limiting
procedure as the unique solution of a nonlinear ordinary differential equation
in an infinite-dimensional function space. Infinitesimal gradient boosting
defines a smooth path in the space of continuous functions along which the
training error decreases, the residuals remain centered and the total variation
is well controlled.
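To make the two ingredients above concrete, here is a toy, self-contained Python sketch (our own illustration, not the construction analyzed in the paper): depth-one trees on one-dimensional data whose split is drawn from a softmax over candidate scores, with a temperature beta bridging totally randomized trees (beta = 0) and Extra Trees (beta large), combined with a learning rate nu paired with ceil(T / nu) boosting steps so that the total boosting time T stays fixed as nu shrinks. The helpers softmax_stump and boost and all numeric choices are hypothetical.

```python
# Toy sketch, NOT the authors' construction: (i) randomized stumps with a
# softmax split choice bridging totally randomized trees (beta = 0) and
# Extra Trees (beta large); (ii) learning rate nu with ceil(T / nu) steps,
# so the total boosting time T is fixed as nu -> 0.
import numpy as np

rng = np.random.default_rng(0)


def softmax_stump(x, r, n_candidates=10, beta=1.0):
    """Fit a depth-one tree to residuals r, choosing among random candidate
    thresholds with probability proportional to exp(beta * score)."""
    candidates, scores = [], []
    for t in rng.uniform(x.min(), x.max(), size=n_candidates):
        left = x <= t
        m_left = r[left].mean() if left.any() else 0.0
        m_right = r[~left].mean() if (~left).any() else 0.0
        pred = np.where(left, m_left, m_right)
        candidates.append((t, m_left, m_right))
        scores.append(-np.mean((r - pred) ** 2))  # higher score = better split
    scores = np.asarray(scores)
    weights = np.exp(beta * (scores - scores.max()))  # softmax over split scores
    t, m_left, m_right = candidates[rng.choice(n_candidates, p=weights / weights.sum())]
    return lambda z: np.where(z <= t, m_left, m_right)


def boost(x, y, nu=0.01, total_time=1.0, beta=1.0):
    """L2 gradient boosting with learning rate nu and ceil(total_time / nu)
    trees, mimicking the rescaling under which the limit is taken."""
    trees, F = [], np.zeros_like(y)
    for _ in range(int(np.ceil(total_time / nu))):
        tree = softmax_stump(x, y - F, beta=beta)  # residuals = negative L2 gradient
        trees.append(tree)
        F = F + nu * tree(x)
    return lambda z: nu * sum(tree(z) for tree in trees)


# Shrinking nu, with proportionally more trees, approaches the regime in
# which infinitesimal gradient boosting is defined.
x = rng.uniform(0.0, 1.0, size=200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
for nu in (0.1, 0.01):
    model = boost(x, y, nu=nu, total_time=1.0)
    print(f"nu = {nu}: training MSE at time 1 = {np.mean((y - model(x)) ** 2):.4f}")
```

Shrinking nu while holding total_time fixed reproduces only the rescaling described in the abstract, not the convergence result itself.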
Related papers
- Limit Theorems for Stochastic Gradient Descent with Infinite Variance [47.87144151929621]
We show that the stochastic gradient descent algorithm can be characterized by the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate Lévy process.
We also explore the applications of these results in linear regression and logistic regression models.
arXiv Detail & Related papers (2024-10-21T09:39:10Z) - An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex.
These problems have significant applications in sequential data learning, such as text classification using recurrent neural networks.
arXiv Detail & Related papers (2024-09-28T02:30:44Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Gradient is All You Need? [0.0]
In this paper we provide a novel analytical perspective on the theoretical understanding of learning algorithms by interpreting consensus-based optimization (CBO), a derivative-free particle method, as a stochastic relaxation of gradient-based optimization.
Our results prove the intrinsic power of CBO to alleviate the complexities of nonconvex optimization landscapes.
arXiv Detail & Related papers (2023-06-16T11:30:55Z) - MinMax Networks [0.0]
This paper describes an alternative discrete MinMax learning approach for continuous piece-wise linear functions.
We show that the convergence rate of learning a constrained piece-wise linear function is equivalent to the exponential convergence rates of its individual local linear regions.
arXiv Detail & Related papers (2023-06-15T16:30:33Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - A large sample theory for infinitesimal gradient boosting [0.0]
We consider the behavior of infinitesimal gradient boosting in the large sample limit and prove its convergence to a deterministic process.
We explore some properties of this population limit: we prove that the dynamics make the test error decrease and we study its long-time behavior.
arXiv Detail & Related papers (2022-10-03T06:54:33Z) - Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
arXiv Detail & Related papers (2022-05-22T00:34:41Z) - Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE), based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z) - Behavior of linear L2-boosting algorithms in the vanishing learning rate asymptotic [0.0]
We investigate the behaviour of gradient boosting algorithms when the learning rate converges to zero and the number of iterations is rescaled accordingly.
We prove the existence of a limit in the vanishing-learning-rate asymptotic and characterize it as the unique solution of a linear differential equation in an infinite-dimensional function space; a heuristic sketch of such a limit, for a simple linear smoother, is given after this list.
arXiv Detail & Related papers (2020-12-29T08:37:54Z) - Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze an optimistic variant of adaptive gradient algorithms for non-concave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
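As referenced in the linear L2-boosting entry above, here is a heuristic LaTeX sketch of the vanishing-learning-rate limit under the simplifying assumption that the base learner is a fixed linear smoother S acting on the training response Y; the paper's setting, with randomized learners in an infinite-dimensional function space, is more general.

```latex
% Heuristic sketch only, assuming a deterministic linear smoother S acting on
% the training response Y; the paper's setting is more general.
% One L2-boosting step with learning rate \nu:
\[
  F_{k+1} = F_k + \nu\, S\,(Y - F_k), \qquad F_0 = 0 .
\]
% With rescaled time t = k\nu, letting \nu \to 0 yields the linear ODE
\[
  \frac{\mathrm{d}F_t}{\mathrm{d}t} = S\,(Y - F_t), \qquad F_0 = 0 ,
\]
% whose unique solution is
\[
  F_t = \bigl(I - e^{-tS}\bigr)\, Y .
\]
```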