A large sample theory for infinitesimal gradient boosting
- URL: http://arxiv.org/abs/2210.00736v2
- Date: Wed, 12 Jul 2023 07:27:45 GMT
- Title: A large sample theory for infinitesimal gradient boosting
- Authors: Clement Dombry and Jean-Jil Duchamps
- Abstract summary: We consider the asymptotic behavior of the model in the large sample limit and prove its convergence to a deterministic process.
We explore some properties of this population limit: we prove that the dynamics makes the test error decrease and we consider its long time behavior.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Infinitesimal gradient boosting (Dombry and Duchamps, 2021) is defined as the
vanishing-learning-rate limit of the popular tree-based gradient boosting
algorithm from machine learning. It is characterized as the solution of a
nonlinear ordinary differential equation in an infinite-dimensional function
space where the infinitesimal boosting operator driving the dynamics depends on
the training sample. We consider the asymptotic behavior of the model in the
large sample limit and prove its convergence to a deterministic process. This
population limit is again characterized by a differential equation that depends
on the population distribution. We explore some properties of this population
limit: we prove that the dynamics makes the test error decrease and we consider
its long time behavior.
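To make the vanishing-learning-rate construction concrete, here is a minimal sketch (not the authors' code) of tree-based L2 gradient boosting run with learning rate nu and roughly t/nu trees, so that shrinking nu while rescaling the number of iterations approximates the boosting flow at time t. Off-the-shelf CART regression trees stand in for the randomized trees used in the paper, and the function and parameter names (boosted_predictor, nu, t) are illustrative assumptions.

# Minimal sketch, assuming squared-error loss and scikit-learn CART trees as
# weak learners (the paper uses a different, randomized class of trees).
# Running ceil(t / nu) boosting steps of size nu mimics the rescaling under
# which infinitesimal gradient boosting arises as the nu -> 0 limit.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosted_predictor(X, y, t=1.0, nu=0.01, max_depth=2, seed=0):
    """Approximate the boosting flow at time t with step size nu."""
    n_steps = int(np.ceil(t / nu))
    F = np.zeros(len(y))                  # current fit on the training sample
    trees = []
    for _ in range(n_steps):
        residuals = y - F                 # negative gradient of the squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
        tree.fit(X, residuals)
        F += nu * tree.predict(X)         # small step along the fitted tree
        trees.append(tree)
    return lambda X_new: nu * sum(tree.predict(X_new) for tree in trees)

# As nu shrinks with t fixed, the fitted predictor stabilizes; the limiting
# object is the infinitesimal-gradient-boosting process studied in the paper.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=200)
for nu in (0.1, 0.05, 0.01):
    predict = boosted_predictor(X, y, t=1.0, nu=nu)
    print(f"nu={nu}: train MSE = {np.mean((predict(X) - y) ** 2):.4f}")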
Related papers
- Limit Theorems for Stochastic Gradient Descent with Infinite Variance [47.87144151929621]
We show that the limiting behavior of the stochastic gradient descent algorithm can be characterized by the stationary distribution of a suitably defined Ornstein-Uhlenbeck process driven by an appropriate Lévy process.
We also explore the applications of these results in linear regression and logistic regression models.
arXiv Detail & Related papers (2024-10-21T09:39:10Z) - Sampling in Constrained Domains with Orthogonal-Space Variational
Gradient Descent [13.724361914659438]
We propose a new variational framework with a designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold.
We prove that O-Gradient converges to the target constrained distribution with rate $\widetilde{O}(1/\text{number of iterations})$ under mild conditions.
arXiv Detail & Related papers (2022-10-12T17:51:13Z) - Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
arXiv Detail & Related papers (2022-05-22T00:34:41Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Infinitesimal gradient boosting [0.0]
We define infinitesimal gradient boosting as a limit of the popular tree-based gradient boosting algorithm from machine learning.
We introduce a new class of randomized regression trees bridging totally randomized trees and Extra Trees.
arXiv Detail & Related papers (2021-04-26T15:09:05Z) - Behavior of linear L2-boosting algorithms in the vanishing learning rate
asymptotic [0.0]
We investigate the behaviour of gradient boosting algorithms when the learning rate converges to zero and the number of iterations is rescaled accordingly.
We prove a limit in the vanishing learning rate and characterize the limit as the unique solution of a linear differential equation in an infinite-dimensional function space (a schematic form of this limit is sketched after this list).
arXiv Detail & Related papers (2020-12-29T08:37:54Z) - A Dynamical Central Limit Theorem for Shallow Neural Networks [48.66103132697071]
We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
arXiv Detail & Related papers (2020-08-21T18:00:50Z) - Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z)
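Schematic of the linear L2-boosting limit referenced above (notation ours, assuming a fixed linear smoother $S$ as weak learner; the paper's setting is more general): one boosting step maps the residual $y - F$ to $(I - \nu S)(y - F)$, so after $t/\nu$ steps the residual operator is $(I - \nu S)^{t/\nu}$, which converges to $e^{-tS}$ as $\nu \to 0$. The limiting fit therefore solves a linear differential equation:
\[
  \frac{d}{dt} F_t = S\,(y - F_t), \qquad F_0 = 0,
  \qquad\Longrightarrow\qquad
  F_t = \bigl(I - e^{-tS}\bigr)\, y .
\]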
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.