Black Box Lie Group Preconditioners for SGD
- URL: http://arxiv.org/abs/2211.04422v1
- Date: Tue, 8 Nov 2022 18:07:08 GMT
- Title: Black Box Lie Group Preconditioners for SGD
- Authors: Xilin Li
- Abstract summary: A matrix-free and a low-rank approximation preconditioner are proposed to accelerate the convergence of stochastic gradient descent (SGD).
The learning rate for parameter updating and the step size for preconditioner fitting are naturally normalized, and their default values work well in most situations.
- Score: 13.30021794793606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A matrix-free and a low-rank approximation preconditioner are proposed to
accelerate the convergence of stochastic gradient descent (SGD) by exploiting
curvature information sampled from Hessian-vector products or from finite
differences of parameters and gradients, as in the BFGS algorithm. Both
preconditioners are fitted in an online manner by minimizing a
criterion that is free of line search and robust to stochastic gradient noise,
and they are further constrained to lie on certain connected Lie groups to preserve a
corresponding symmetry or invariance, e.g., the orientation of coordinates is preserved by the
connected general linear group with positive determinant. The Lie group's
equivariance property facilitates preconditioner fitting, and its invariance
property removes the need for damping, which is common in second-order optimizers
but difficult to tune. The learning rate for parameter updating and the step size
for preconditioner fitting are naturally normalized, and their default values
work well in most situations.
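A minimal NumPy sketch of the idea, assuming a diagonal (matrix-free) preconditioner fitted on the Lie group of positive diagonal matrices; the fitting criterion, multiplicative update, probe size, and step sizes below are illustrative simplifications rather than the paper's exact algorithm.

```python
import numpy as np

def fit_diag_preconditioner(q, dtheta, dg, step=0.01, eps=1e-10):
    """One illustrative fitting step for a diagonal preconditioner P = diag(q)**2.

    (dtheta, dg) is a curvature pair: a small parameter perturbation and the
    resulting change of the gradient (finite-difference Hessian information).
    The criterion being reduced is sum(q**2 * dg**2 + dtheta**2 / q**2), and the
    update is multiplicative so q stays in the group of positive diagonal
    matrices (a connected Lie group). This is a sketch, not the paper's method.
    """
    grad_term = q**2 * dg**2        # pushes q down where curvature is high
    inv_term = dtheta**2 / q**2     # pushes q up where curvature is low
    # normalized gradient step in the Lie algebra (log-coordinates), illustrative choice
    update = (grad_term - inv_term) / (grad_term + inv_term + eps)
    return q * np.exp(-step * update)

def psgd_like_step(theta, grad_fn, q, lr=0.01, probe=1e-4, rng=np.random):
    """One preconditioned SGD step with finite-difference curvature probing."""
    g = grad_fn(theta)
    dtheta = probe * rng.standard_normal(theta.shape)   # random probe direction
    dg = grad_fn(theta + dtheta) - g                    # finite difference of gradients
    q = fit_diag_preconditioner(q, dtheta, dg)
    return theta - lr * (q**2) * g, q                   # P = diag(q)**2 preconditions g

# toy usage: ill-conditioned quadratic with Hessian diag(1, 100)
A = np.diag([1.0, 100.0])
grad_fn = lambda th: A @ th
theta, q = np.array([1.0, 1.0]), np.ones(2)
for _ in range(1000):
    theta, q = psgd_like_step(theta, grad_fn, q)
print(theta, q**2)   # q**2 approaches the inverse Hessian diagonal
```

For this diagonal toy problem the fitting criterion's fixed point gives P close to the inverse Hessian, so the preconditioned step roughly equalizes convergence rates across coordinates, which is the intended effect of the preconditioner.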
Related papers
- Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions [18.47705532817026]
We show that AdaGrad outperforms SGD by a factor of $d$ under certain conditions.
Motivated by this, we introduce assumptions on the smoothness structure of the objective and the gradient variance.
arXiv Detail & Related papers (2024-06-07T02:55:57Z) - Stochastic Hessian Fittings with Lie Groups [6.626539885456148]
Hessian fitting as an optimization problem is strongly convex under mild conditions with a specific yet general enough Lie group.
This discovery turns Hessian fitting into a well-behaved optimization problem and facilitates the design of highly efficient and elegant Lie-group sparse preconditioner fitting methods for large-scale optimization.
arXiv Detail & Related papers (2024-02-19T06:00:35Z) - Curvature-Informed SGD via General Purpose Lie-Group Preconditioners [6.760212042305871]
We present a novel approach to accelerate stochastic gradient descent (SGD) by utilizing curvature information.
Our approach involves two preconditioners: a matrix-free preconditioner and a low-rank approximation preconditioner.
We demonstrate that Preconditioned SGD (PSGD) outperforms SoTA on Vision, NLP, and RL tasks.
arXiv Detail & Related papers (2024-02-07T03:18:00Z) - Adaptive Step Sizes for Preconditioned Stochastic Gradient Descent [0.3831327965422187]
This paper proposes a novel approach to adaptive step sizes in stochastic gradient descent (SGD).
We use quantities that we have identified as numerically traceable -- the Lipschitz constant for gradients and a concept of the local variance in search directions.
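As an illustration of how such quantities can be tracked online, a hedged sketch follows: the local Lipschitz constant of the gradient is approximated from successive parameter and gradient differences and the step size is set to its reciprocal. This secant rule is an assumption made for illustration, not necessarily the exact formula in the cited paper.

```python
import numpy as np

def lipschitz_adaptive_sgd(theta, grad_fn, n_steps=100, lr0=0.1, eps=1e-12):
    """SGD with a step size adapted from a local Lipschitz estimate.

    L_t ~ ||g_t - g_{t-1}|| / ||theta_t - theta_{t-1}|| is a secant
    approximation of the gradient Lipschitz constant; step = 1/L_t is a
    common heuristic used here purely as a sketch.
    """
    lr, g_prev, theta_prev = lr0, None, None
    for _ in range(n_steps):
        g = grad_fn(theta)
        if g_prev is not None:
            L = np.linalg.norm(g - g_prev) / (np.linalg.norm(theta - theta_prev) + eps)
            lr = 1.0 / (L + eps)
        theta_prev, g_prev = theta.copy(), g
        theta = theta - lr * g
    return theta

# toy usage on a quadratic with Hessian eigenvalues {1, 10}
H = np.diag([1.0, 10.0])
print(lipschitz_adaptive_sgd(np.array([1.0, 1.0]), lambda th: H @ th))
```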
arXiv Detail & Related papers (2023-11-28T17:03:56Z) - The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded
Gradients and Affine Variance [46.15915820243487]
We show that AdaGrad-Norm exhibits an order-optimal convergence rate.
arXiv Detail & Related papers (2022-02-11T17:37:54Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
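For context, a minimal sketch of constant-step-size SGDA on a toy smooth game follows; the expected co-coercivity condition and the arbitrary-sampling scheme analyzed in the paper are not modeled here, only the plain descent-ascent updates with additive gradient noise.

```python
import numpy as np

def sgda(grad_x, grad_y, x, y, step=0.05, n_steps=500, noise=0.01, rng=None):
    """Constant-step-size stochastic gradient descent-ascent.

    x minimizes and y maximizes f(x, y); with a constant step size the
    iterates are only expected to reach a noise-dominated neighborhood of
    the solution, matching the flavor of the cited guarantee.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    for _ in range(n_steps):
        gx = grad_x(x, y) + noise * rng.standard_normal(x.shape)
        gy = grad_y(x, y) + noise * rng.standard_normal(y.shape)
        x = x - step * gx   # descent for the min player
        y = y + step * gy   # ascent for the max player
    return x, y

# toy usage: f(x, y) = 0.5*||x||^2 + x.y - 0.5*||y||^2
gx = lambda x, y: x + y
gy = lambda x, y: x - y
x, y = sgda(gx, gy, np.ones(3), np.ones(3))
print(x, y)   # both hover near zero, up to the noise floor
```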
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, albeit only up to an error neighborhood.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove an $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence in expectation for adaptive gradient methods in nonconvex optimization.
Our analyses shed light on designing better adaptive gradient methods under weaker gradient bounds.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)