Stochastic Gradient Descent in Hilbert Scales: Smoothness,
Preconditioning and Earlier Stopping
- URL: http://arxiv.org/abs/2006.10840v1
- Date: Thu, 18 Jun 2020 20:22:04 GMT
- Title: Stochastic Gradient Descent in Hilbert Scales: Smoothness,
Preconditioning and Earlier Stopping
- Authors: Nicole Mücke and Enrico Reiss
- Abstract summary: We consider least squares learning in reproducing kernel Hilbert spaces (RKHSs) and extend the classical SGD analysis to a learning setting in Hilbert scales.
We show that even for well-specified models, violation of a traditional benchmark smoothness assumption has a tremendous effect on the learning rate.
In addition, we show that for misspecified models, preconditioning in an appropriate Hilbert scale helps to reduce the number of iterations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Gradient Descent (SGD) has become the method of choice for solving
a broad range of machine learning problems. However, some of its learning
properties are still not fully understood. We consider least squares learning
in reproducing kernel Hilbert spaces (RKHSs) and extend the classical SGD
analysis to a learning setting in Hilbert scales, including Sobolev spaces and
Diffusion spaces on compact Riemannian manifolds. We show that even for
well-specified models, violation of a traditional benchmark smoothness
assumption has a tremendous effect on the learning rate. In addition, we show
that for misspecified models, preconditioning in an appropriate Hilbert scale
helps to reduce the number of iterations, i.e., allowing for "earlier stopping".
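For concreteness, a minimal numerical sketch (not the authors' algorithm) of kernel SGD for least squares, with and without a preconditioner built from a power of the empirical kernel matrix, each run stopped after a fixed iteration budget; the Gaussian kernel, the power 0.5, the step size and the budgets are all hypothetical choices.

```python
# Illustrative sketch only (not the paper's algorithm): kernel SGD for least
# squares, with an optional preconditioner built from a power of the empirical
# kernel matrix, each run stopped after a fixed iteration budget.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * 0.3 ** 2))   # Gaussian kernel matrix

def matrix_power(M, a):
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None) ** a) @ V.T

def kernel_sgd(K, y, steps, lr, precond=None):
    alpha = np.zeros(len(y))                    # f = sum_j alpha_j * k(x_j, .)
    for _ in range(steps):
        i = rng.integers(len(y))
        resid = K[i] @ alpha - y[i]             # pointwise residual f(x_i) - y_i
        if precond is None:
            alpha[i] -= lr * resid              # plain kernel SGD step
        else:
            alpha -= lr * resid * precond[:, i]  # preconditioned step
    return alpha

P = matrix_power(K / n, 0.5)                    # hypothetical Hilbert-scale-style preconditioner
for name, steps, pre in [("plain SGD", 5000, None), ("preconditioned SGD", 1000, P)]:
    alpha = kernel_sgd(K, y, steps, lr=0.5, precond=pre)
    print(name, "train MSE:", np.mean((K @ alpha - y) ** 2))
```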
Related papers
- Almost Sure Saddle Avoidance of Stochastic Gradient Methods without the
Bounded Gradient Assumption [11.367487348673793]
We prove that various stochastic gradient methods, including stochastic gradient descent (SGD), the stochastic heavy-ball method (SHB) and Nesterov's stochastic accelerated gradient method (SNAG), almost surely avoid any strict saddle manifold.
This is the first time such results have been obtained for the SHB and SNAG methods.
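As a reminder of the three stochastic updates named above, a minimal sketch on a toy objective with a strict saddle at the origin; the objective, step size, momentum and noise level are hypothetical.

```python
# Minimal sketch of the SGD, SHB and SNAG updates on the toy objective
# f(w) = 0.5*(w[0]**2 - w[1]**2), which has a strict saddle at the origin.
# Step size, momentum, noise level and iteration count are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

def stoch_grad(w):
    return np.array([w[0], -w[1]]) + 0.01 * rng.standard_normal(2)

def run(method, steps=200, lr=0.05, beta=0.9):
    w, v = np.array([1e-3, 1e-3]), np.zeros(2)
    for _ in range(steps):
        if method == "sgd":
            w = w - lr * stoch_grad(w)
        elif method == "shb":                        # stochastic heavy ball
            v = beta * v - lr * stoch_grad(w)
            w = w + v
        elif method == "snag":                       # stochastic Nesterov acceleration
            v = beta * v - lr * stoch_grad(w + beta * v)
            w = w + v
    return w

for m in ("sgd", "shb", "snag"):
    print(m, "distance from the saddle after 200 steps:", np.linalg.norm(run(m)))
```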
arXiv Detail & Related papers (2023-02-15T18:53:41Z)
- On the Convergence of Stochastic Gradient Descent for Linear Inverse Problems in Banach Spaces [0.0]
Stochastic gradient descent (SGD) has been established as one of the most successful optimisation methods in machine learning.
We present a novel convergence analysis of SGD for linear inverse problems in general Banach spaces.
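A minimal sketch of SGD for a noisy linear inverse problem, in a plain Euclidean (Hilbert-space) setting rather than the paper's Banach-space one; the forward operator, noise level and step size are hypothetical.

```python
# Sketch (Hilbert-space setting, not the paper's Banach-space analysis): SGD on
# the least-squares functional for a noisy linear inverse problem A x = y,
# sampling one measurement (row of A) per step.
import numpy as np

rng = np.random.default_rng(2)
n, d = 300, 50
A = rng.standard_normal((n, d)) / np.sqrt(d)       # forward operator (rows = measurements)
x_true = rng.standard_normal(d)
y = A @ x_true + 0.05 * rng.standard_normal(n)     # noisy data

x = np.zeros(d)
lr = 0.5                                           # hypothetical step size
for _ in range(20000):
    i = rng.integers(n)                            # sample one measurement
    x -= lr * (A[i] @ x - y[i]) * A[i]             # stochastic gradient step
print("relative reconstruction error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```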
arXiv Detail & Related papers (2023-02-10T12:00:49Z)
- Learning Globally Smooth Functions on Manifolds [94.22412028413102]
Learning smooth functions is generally challenging, except in simple cases such as learning linear or kernel models.
This work proposes to overcome these obstacles by combining techniques from semi-infinite constrained learning and manifold regularization.
We prove that, under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct.
arXiv Detail & Related papers (2022-10-01T15:45:35Z)
- Optimal Learning Rates for Regularized Least-Squares with a Fourier Capacity Condition [14.910167993978487]
We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales.
We demonstrate that the spectrum of the Mercer operator can be inferred in the presence of "tight" embeddings of suitable Hilbert scales.
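As a loose numerical companion (not the paper's procedure), the eigenvalue decay of the empirical kernel operator K/n, a finite-sample stand-in for the Mercer operator's spectrum, can be inspected directly; the kernel, data distribution and sample size are hypothetical.

```python
# Hypothetical illustration: eigenvalue decay of the empirical kernel (Mercer)
# operator K/n for a Gaussian kernel on uniform samples from [0, 1].
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(0.0, 1.0, n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2 ** 2))
eigvals = np.linalg.eigvalsh(K / n)[::-1]          # empirical spectrum, descending
print("top 10 eigenvalues:", np.round(eigvals[:10], 4))
```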
arXiv Detail & Related papers (2022-04-16T18:32:33Z)
- On the Generalization of Stochastic Gradient Descent with Momentum [58.900860437254885]
We first show that there exists a convex loss function for which algorithmic stability fails to establish generalization guarantees.
For smooth Lipschitz loss functions, we analyze a modified momentum-based update rule and show that it admits an upper bound on the generalization error.
For the special case of strongly convex loss functions, we find a range of momentum values for which multiple epochs of standard SGDM, as a special form of SGDEM, also generalize.
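A minimal sketch of multiple epochs of standard SGDM (heavy-ball momentum) on a strongly convex least-squares objective; the data, step size and momentum value are hypothetical and unrelated to the paper's constructions.

```python
# Sketch: several epochs of standard SGDM (heavy-ball momentum) on a strongly
# convex least-squares problem. Data, step size and momentum are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 10
A = rng.standard_normal((n, d))
y = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w, v = np.zeros(d), np.zeros(d)
lr, mu, epochs = 0.01, 0.9, 20
for _ in range(epochs):
    for i in rng.permutation(n):                   # one pass over the data = one epoch
        g = (A[i] @ w - y[i]) * A[i]               # per-example gradient
        v = mu * v - lr * g                        # heavy-ball momentum update
        w = w + v
print("final mean squared loss:", np.mean((A @ w - y) ** 2))
```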
arXiv Detail & Related papers (2021-02-26T18:58:29Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- Early stopping and polynomial smoothing in regression with reproducing kernels [2.132096006921048]
We study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS).
We present a data-driven rule to perform early stopping without a validation set that is based on the so-called minimum discrepancy principle.
The proposed rule is proved to be minimax-optimal over different types of kernel spaces.
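A hedged sketch of a minimum-discrepancy-style stopping rule for kernel gradient descent: stop once the training residual norm falls to the (here assumed known) noise level. The kernel, the constant kappa and the use of the true noise level are hypothetical simplifications, not the paper's exact data-driven rule.

```python
# Sketch of a minimum-discrepancy-style early-stopping rule for kernel gradient
# descent: stop once ||residual|| <= kappa * sigma * sqrt(n). In practice sigma
# would be estimated; here it is assumed known for simplicity.
import numpy as np

rng = np.random.default_rng(5)
n, sigma, kappa = 150, 0.2, 1.1
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + sigma * rng.standard_normal(n)

K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.1 ** 2))
alpha = np.zeros(n)
lr = 1.0 / np.linalg.eigvalsh(K / n).max()         # safe step size for gradient descent

for t in range(1, 5001):
    resid = K @ alpha - y                          # current training residuals
    alpha -= (lr / n) * resid                      # full-gradient step on the empirical risk
    if np.linalg.norm(resid) <= kappa * sigma * np.sqrt(n):
        print("discrepancy rule stops at iteration", t)
        break
else:
    print("iteration budget reached without stopping")
```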
arXiv Detail & Related papers (2020-07-14T05:27:18Z)
- Stochastic Hamiltonian Gradient Methods for Smooth Games [51.47367707388402]
We focus on the class of Hamiltonian methods and provide the first convergence guarantees for certain classes of smooth games.
Using tools from the optimization literature, we show that SHGD converges linearly to a neighbourhood of a stationary point.
Our results provide the first global non-asymptotic last-iterate convergence guarantees for this general class of games.
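A hedged sketch of stochastic Hamiltonian gradient descent on a toy stochastic bilinear game, using two independent samples to form an unbiased estimate of the Hamiltonian's gradient; the dimensions, data and step size are hypothetical.

```python
# Sketch of stochastic Hamiltonian gradient descent (SHGD) on the toy game
# min_p max_q (1/n) * sum_i p^T A_i q. The Hamiltonian is H = 0.5*||v(p,q)||^2
# for the game's vector field v; two independent samples give an unbiased
# gradient estimate of H. Data, dimensions and step size are hypothetical.
import numpy as np

rng = np.random.default_rng(6)
n, d = 50, 5
A = np.eye(d) + 0.2 * rng.standard_normal((n, d, d))   # one payoff matrix per sample

p, q = rng.standard_normal(d), rng.standard_normal(d)
lr = 0.1
for _ in range(1000):
    i, j = rng.integers(n), rng.integers(n)        # two independent samples
    gp = A[i] @ A[j].T @ p                         # unbiased estimate of dH/dp
    gq = A[i].T @ A[j] @ q                         # unbiased estimate of dH/dq
    p, q = p - lr * gp, q - lr * gq
print("distance to the equilibrium (0, 0):", np.linalg.norm(np.concatenate([p, q])))
```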
arXiv Detail & Related papers (2020-07-08T15:42:13Z)
- Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
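The clipping step described above is easy to state in code: the sketch below clips per-example gradients to an L2 threshold and adds Gaussian noise in the spirit of DP-SGD; the threshold, noise multiplier and model are hypothetical, and no privacy accounting is attempted.

```python
# Sketch of the clipping step described above: each per-example gradient is
# shrunk to L2 norm at most C before averaging, and Gaussian noise is added as
# in DP-SGD. Threshold, noise multiplier and model are hypothetical; no privacy
# accounting is done here.
import numpy as np

rng = np.random.default_rng(7)
n, d = 256, 20
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) > 0).astype(float)

def per_example_grads(w, Xb, yb):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))            # logistic regression probabilities
    return (p - yb)[:, None] * Xb                  # one gradient row per example

w = np.zeros(d)
C, sigma, lr = 1.0, 1.0, 0.1
for _ in range(200):
    idx = rng.choice(n, size=64, replace=False)    # minibatch
    g = per_example_grads(w, X[idx], y[idx])
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g * np.minimum(1.0, C / np.maximum(norms, 1e-12))    # clip each gradient to norm <= C
    noisy_mean = (g.sum(axis=0) + sigma * C * rng.standard_normal(d)) / len(idx)
    w -= lr * noisy_mean
print("train accuracy:", np.mean(((X @ w) > 0) == (y > 0.5)))
```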
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
- Inverse learning in Hilbert scales [0.0]
We study the linear ill-posed inverse problem with noisy data in the statistical learning setting.
Approximate reconstructions from random noisy data are sought with general regularization schemes in Hilbert scale.
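A small sketch of Tikhonov regularization in a Hilbert scale for a discretized linear inverse problem, penalizing ||L x|| for a smoothness-generating operator L (here a first-difference matrix) instead of the plain norm; the operator, scale and regularization strength are hypothetical.

```python
# Sketch of regularization in a Hilbert scale for a discretized linear inverse
# problem: x_hat = argmin ||A x - y||^2 + lam * ||P x||^2, where P is either the
# identity (classical Tikhonov) or a first-difference operator (scale s = 1).
# Operator, scale and lam are hypothetical choices.
import numpy as np

rng = np.random.default_rng(8)
d, m = 100, 80
t = np.linspace(0, 1, d)
x_true = np.sin(2 * np.pi * t)                      # smooth ground truth
A = rng.standard_normal((m, d)) / np.sqrt(d)        # forward operator
y = A @ x_true + 0.02 * rng.standard_normal(m)

L = np.diff(np.eye(d), axis=0)                      # first-difference operator
lam = 1e-2

def tikhonov(P):
    return np.linalg.solve(A.T @ A + lam * P.T @ P, A.T @ y)

for name, P in [("classical (s = 0)", np.eye(d)), ("Hilbert scale (s = 1)", L)]:
    x_hat = tikhonov(P)
    print(name, "relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```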
arXiv Detail & Related papers (2020-02-24T12:49:54Z)
- On the Generalization of Stochastic Gradient Descent with Momentum [84.54924994010703]
Momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models.
We first show that there exists a convex loss function for which the stability gap for multiple epochs of SGD with standard heavy-ball momentum (SGDM) becomes unbounded.
For smooth Lipschitz loss functions, we analyze a modified momentum-based update rule, i.e., SGD with early momentum (SGDEM) under a broad range of step-sizes.
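A minimal sketch of the "early momentum" idea mentioned above: heavy-ball momentum is used only for an initial fraction of the iterations, after which plain SGD continues; the objective, schedule and hyperparameters are hypothetical and not the paper's exact rule.

```python
# Sketch of SGD with early momentum (SGDEM): heavy-ball momentum only during an
# initial phase, then plain SGD. Objective, schedule and hyperparameters are
# hypothetical choices, not the paper's exact setting.
import numpy as np

rng = np.random.default_rng(9)
n, d = 200, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w, v = np.zeros(d), np.zeros(d)
steps, lr, mu = 4000, 0.005, 0.9
momentum_phase = steps // 4                        # momentum only early on
for t in range(steps):
    i = rng.integers(n)
    g = (X[i] @ w - y[i]) * X[i]                   # per-example gradient
    beta = mu if t < momentum_phase else 0.0       # switch momentum off later
    v = beta * v - lr * g
    w = w + v
print("final mean squared loss:", np.mean((X @ w - y) ** 2))
```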
arXiv Detail & Related papers (2018-09-12T17:02:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.