Behavior of linear L2-boosting algorithms in the vanishing learning rate asymptotic
- URL: http://arxiv.org/abs/2012.14657v1
- Date: Tue, 29 Dec 2020 08:37:54 GMT
- Title: Behavior of linear L2-boosting algorithms in the vanishing learning rate asymptotic
- Authors: Clément Dombry (UBFC, LMB), Youssef Esstafa (ENSAI)
- Abstract summary: We investigate the behaviour of gradient boosting algorithms when the learning rate converges to zero and the number of iterations is rescaled accordingly.
We prove a limit in the vanishing learning rate and characterize the limit as the unique solution of a linear differential equation in an infinite dimensional function space.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the asymptotic behaviour of gradient boosting algorithms when
the learning rate converges to zero and the number of iterations is rescaled
accordingly. We mostly consider L2-boosting for regression with linear base
learner as studied in Bühlmann and Yu (2003) and also analyze a stochastic
version of the model where subsampling is used at each step (Friedman 2002). We
prove a deterministic limit in the vanishing learning rate asymptotic and
characterize the limit as the unique solution of a linear differential equation
in an infinite dimensional function space. Besides, the training and test error
of the limiting procedure are thoroughly analyzed. We finally illustrate and
discuss our result on a simple numerical experiment where the linear
L2-boosting operator is interpreted as a smoothed projection and time is
related to its number of degrees of freedom.
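As a rough illustration of the limit described in the abstract, the sketch below (not taken from the paper; the Gaussian-kernel smoother, its bandwidth, and the ridge parameter are arbitrary illustrative choices) runs linear L2-boosting F_{m+1} = F_m + nu * S (y - F_m) with a small learning rate nu and m = t/nu iterations, and compares the fitted values with the closed-form limit (I - exp(-t S)) y of the rescaled iterations; the trace of I - exp(-t S) plays the role of the number of degrees of freedom at time t.
```python
# Minimal sketch, assuming an arbitrary symmetric linear smoother S as base learner.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

# Toy symmetric linear smoother S (kernel ridge type); any symmetric S with
# eigenvalues in (0, 1] plays the same role here.
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.02)
S = K @ np.linalg.inv(K + n * 0.1 * np.eye(n))

t = 2.0           # "boosting time" of the limiting procedure
nu = 1e-3         # small learning rate
m = int(t / nu)   # number of iterations rescaled as t / nu

# Discrete linear L2-boosting: F_{m+1} = F_m + nu * S @ (y - F_m), F_0 = 0.
F = np.zeros(n)
for _ in range(m):
    F += nu * S @ (y - F)

# Vanishing learning rate limit: F_t = (I - exp(-t S)) y, i.e. the solution
# of the linear ODE dF_t/dt = S (y - F_t) started at F_0 = 0.
F_limit = (np.eye(n) - expm(-t * S)) @ y

print("max |F_boost - F_limit| =", np.abs(F - F_limit).max())
print("degrees of freedom trace(I - exp(-tS)):",
      np.trace(np.eye(n) - expm(-t * S)))
```
For small nu the discrepancy printed in the first line shrinks toward zero, which is the content of the convergence result; increasing t increases the trace, consistent with interpreting time through the smoother's degrees of freedom.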
Related papers
- Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms [28.046728466038022]
We study theoretical properties of a broad class of regularized algorithms with vector-valued output.
We rigorously confirm the so-called saturation effect for ridge regression with vector-valued output.
We present an upper bound on the finite sample risk of general vector-valued spectral algorithms.
arXiv Detail & Related papers (2024-05-23T16:45:52Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - A large sample theory for infinitesimal gradient boosting [0.0]
We consider the behavior of the algorithm in the large sample limit and prove its convergence to a deterministic process.
We explore some properties of this population limit: we prove that the dynamics makes the test error decrease and we consider its long time behavior.
arXiv Detail & Related papers (2022-10-03T06:54:33Z) - Whiplash Gradient Descent Dynamics [2.0508733018954843]
We introduce the symplectic convergence analysis for the Whiplash system for convex functions.
We study the algorithm's performance for various costs and provide a practical methodology for analyzing convergence rates.
arXiv Detail & Related papers (2022-03-04T05:47:26Z) - Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that this benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Learning Linearized Assignment Flows for Image Labeling [70.540936204654]
We introduce a novel algorithm for estimating optimal parameters of linearized assignment flows for image labeling.
We show how to efficiently evaluate this formula using a Krylov subspace and a low-rank approximation.
arXiv Detail & Related papers (2021-08-02T13:38:09Z) - Infinitesimal gradient boosting [0.0]
We define infinitesimal gradient boosting as a limit of the popular tree-based gradient boosting algorithm from machine learning.
We introduce a new class of randomized regression trees bridging totally randomized trees and Extra Trees.
arXiv Detail & Related papers (2021-04-26T15:09:05Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity [40.73281056650241]
We introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true gradient temporal difference learning algorithms.
We show how gradient TD reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function.
arXiv Detail & Related papers (2020-06-06T21:04:21Z) - On Learning Rates and Schrödinger Operators [105.32118775014015]
We present a general theoretical analysis of the effect of the learning rate.
We find that the learning rate tends to zero for a broad class of non-neural functions.
arXiv Detail & Related papers (2020-04-15T09:52:37Z)