Obtaining Adjustable Regularization for Free via Iterate Averaging
- URL: http://arxiv.org/abs/2008.06736v1
- Date: Sat, 15 Aug 2020 15:28:05 GMT
- Title: Obtaining Adjustable Regularization for Free via Iterate Averaging
- Authors: Jingfeng Wu, Vladimir Braverman, Lin F. Yang
- Abstract summary: Regularization for optimization is a crucial technique to avoid overfitting in machine learning.
We establish an averaging scheme that converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart.
Our approaches can be used for accelerated and preconditioned optimization methods as well.
- Score: 43.75491612671571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regularization for optimization is a crucial technique to avoid overfitting
in machine learning. In order to obtain the best performance, we usually train
a model by tuning the regularization parameters. It becomes costly, however,
when a single round of training takes a significant amount of time. Very
recently, Neu and Rosasco showed that if we run stochastic gradient descent (SGD)
on linear regression problems, then by averaging the SGD iterates properly, we
obtain a regularized solution. This left open whether the same phenomenon can be
achieved for other optimization problems and algorithms. In this paper, we
establish an averaging scheme that provably converts the iterates of SGD on an
arbitrary strongly convex and smooth objective function to its regularized
counterpart with an adjustable regularization parameter. Our approaches can be
used for accelerated and preconditioned optimization methods as well. We
further show that the same methods work empirically on more general
optimization objectives including neural networks. In sum, we obtain adjustable
regularization for free for a large class of optimization problems and resolve
an open question raised by Neu and Rosasco.
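To make the idea concrete, here is a minimal sketch in the least-squares setting studied by Neu and Rosasco: run plain SGD once, store the iterates, and then form geometrically weighted averages of those iterates for different regularization levels. The weighting rule (1 - eta*lambda)^t and the comparison against explicit ridge solutions are illustrative assumptions in the spirit of that result, not the exact construction or the general strongly convex scheme from this paper.

```python
# Illustrative sketch only: geometric iterate averaging on least squares,
# in the spirit of Neu and Rosasco's result. The exact weights used in the
# paper for general strongly convex objectives are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

# One plain (unregularized) SGD run; every iterate is stored.
eta, T = 0.01, 20000
w = np.zeros(d)
iterates = [w.copy()]
for _ in range(T):
    i = rng.integers(n)                     # single-sample stochastic gradient
    w = w - eta * (X[i] @ w - y[i]) * X[i]
    iterates.append(w.copy())
iterates = np.array(iterates)               # shape (T + 1, d)

def geometric_average(iterates, eta, lam):
    """Weight iterate w_t by (1 - eta * lam)**t and normalize (assumes eta * lam < 1)."""
    t = np.arange(len(iterates))
    weights = (1.0 - eta * lam) ** t
    return (weights / weights.sum()) @ iterates

# Post-process the same run for several regularization strengths and compare
# with the closed-form ridge solution; agreement is approximate (SGD noise
# and O(eta * lam) discretization effects).
for lam in [0.1, 1.0, 10.0]:
    w_avg = geometric_average(iterates, eta, lam)
    w_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    print(f"lambda={lam}: relative difference "
          f"{np.linalg.norm(w_avg - w_ridge) / np.linalg.norm(w_ridge):.3f}")
```

The point of such a construction is the "for free" aspect: a single unregularized run can be post-processed into approximate solutions for many regularization levels without retraining.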
Related papers
- Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new particle-swarm-based framework utilizing the underlying dynamical process of gradient descent.
The biggest advantage of this approach is greater exploration around the current state before deciding the descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z) - Continuation Path with Linear Convergence Rate [18.405645120971496]
Path-following algorithms are frequently used in composite optimization problems where a series of subproblems are solved sequentially.
We present a primal-dual analysis of the path-following algorithm and determine how accurately each subproblem should be solved to guarantee a linear convergence rate on a target problem.
Considering optimization with a sparsity-inducing penalty, we analyze the change of the active sets with respect to the regularization parameter.
arXiv Detail & Related papers (2021-12-09T18:42:13Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic nonconvex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster, but only up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications [5.660384137948734]
The proposed algorithm converges to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z) - An adaptive stochastic gradient-free approach for high-dimensional
blackbox optimization [0.0]
We propose an adaptive stochastic gradient-free (ASGF) approach for high-dimensional non-smooth blackbox optimization problems.
We illustrate the performance of this method on benchmark global optimization problems and learning tasks.
arXiv Detail & Related papers (2020-06-18T22:47:58Z) - Optimizing generalization on the train set: a novel gradient-based
framework to train parameters and hyperparameters simultaneously [0.0]
Generalization is a central problem in Machine Learning.
We present a novel approach based on a new measure of risk that allows us to develop fully automatic procedures for generalization.
arXiv Detail & Related papers (2020-06-11T18:04:36Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Stochastic batch size for adaptive regularization in deep network
optimization [63.68104397173262]
We propose a first-order optimization algorithm incorporating adaptive regularization, applicable to machine learning problems in deep learning frameworks.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z)