Stochastic Hessian Fittings with Lie Groups
- URL: http://arxiv.org/abs/2402.11858v4
- Date: Sun, 01 Dec 2024 05:15:11 GMT
- Title: Stochastic Hessian Fittings with Lie Groups
- Authors: Xi-Lin Li
- Abstract summary: The Hessian fitting criterion from the preconditioned stochastic gradient descent (PSGD) method is studied.
Hessian fitting itself, viewed as an optimization problem, is strongly convex under mild conditions in certain general Lie groups.
This discovery turns Hessian fitting into a well-behaved Lie group optimization problem.
- Score: 6.626539885456148
- License:
- Abstract: This report studies the fitting of the Hessian or its inverse for stochastic optimizations using a Hessian fitting criterion from the preconditioned stochastic gradient descent (PSGD) method, which is intimately related to many commonly used second-order and adaptive gradient optimizers, e.g., BFGS, the Gauss-Newton method, natural gradient descent, and AdaGrad. Our analyses reveal the efficiency and reliability differences among a wide range of preconditioner fitting methods, from closed-form to iterative solutions, using Hessian-vector products or stochastic gradients only, with Hessian fittings in the Euclidean space, on the manifold of symmetric positive definite (SPD) matrices, and in a variety of Lie groups. The most intriguing discovery is that Hessian fitting itself, as an optimization problem, is strongly convex under mild conditions in certain general Lie groups. This discovery turns Hessian fitting into a well-behaved Lie group optimization problem and facilitates the design of highly efficient and elegant Lie group sparse preconditioner fitting methods for large-scale stochastic optimizations.
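As a concrete illustration, the minimal sketch below fits a preconditioner $P = Q^\top Q$ by stochastic gradient descent on the group of invertible upper-triangular matrices, assuming the PSGD fitting criterion $E[h^\top P h + v^\top P^{-1} v]$ with random probe vectors $v$ and Hessian-vector products $h = Hv$. The function names, step-size normalization, and toy quadratic are illustrative choices, not the paper's implementation.

import numpy as np

def hessian_fit_step(Q, v, h, mu=0.1):
    # One Hessian-fitting step for a preconditioner P = Q^T Q, with Q kept
    # in the group of invertible upper-triangular matrices through the
    # multiplicative update Q <- (I - step * G) Q.  Here v is a random
    # probe vector and h = H v is the matching Hessian-vector product.
    a = Q @ h                        # a = Q h
    b = np.linalg.solve(Q.T, v)      # b = Q^{-T} v
    # Gradient of h^T P h + v^T P^{-1} v w.r.t. a left perturbation (I + E) Q,
    # restricted to the upper-triangular Lie algebra.
    G = np.triu(np.outer(a, a) - np.outer(b, b))
    step = mu / (np.max(np.abs(G)) + 1e-12)   # illustrative normalization
    return Q - step * (G @ Q)

# Toy check on a fixed SPD Hessian: P = Q^T Q should approach H^{-1}.
rng = np.random.default_rng(0)
n = 10
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)
Q = np.eye(n)
for _ in range(2000):
    v = rng.standard_normal(n)
    Q = hessian_fit_step(Q, v, H @ v)
print(np.linalg.norm(Q.T @ Q @ H - np.eye(n)))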
Related papers
- ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization [53.14532968909759]
We introduce an efficient single-loop primal-dual block-coordinate algorithm, dubbed ALEXR.
We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions.
We present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate algorithms for the considered class of cFCCO problems.
arXiv Detail & Related papers (2023-12-04T19:00:07Z) - Extrinsic Bayesian Optimizations on Manifolds [1.3477333339913569]
We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds.
Our approach is to employ extrinsic Gaussian processes by first embedding the manifold into a higher dimensional Euclidean space.
This leads to efficient and scalable algorithms for optimization over complex manifolds.
arXiv Detail & Related papers (2022-12-21T06:10:12Z) - Black Box Lie Group Preconditioners for SGD [13.30021794793606]
A matrix-free and a low-rank approximation preconditioner are proposed to accelerate the convergence of stochastic gradient descent (SGD).
The learning rate for parameter updating and step size for preconditioner fitting are naturally normalized, and their default values work well in most situations.
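To make the matrix-free case concrete, here is a minimal sketch of fitting a diagonal preconditioner $P = \mathrm{diag}(q)^2$ from a parameter perturbation and the resulting change of the stochastic gradient (a finite-difference stand-in for a Hessian-vector product). The variable names and step-size normalization are illustrative assumptions, not the paper's exact recipe.

import numpy as np

def diag_precond_step(q, dtheta, dgrad, mu=0.1):
    # Fit P = diag(q)^2 by reducing dgrad^T P dgrad + dtheta^T P^{-1} dtheta,
    # i.e., the diagonal restriction of the same Lie-group fitting criterion.
    a = q * dgrad                 # Q h with Q = diag(q)
    b = dtheta / q                # Q^{-1} v
    G = a * a - b * b             # diagonal of a a^T - b b^T
    step = mu / (np.max(np.abs(G)) + 1e-12)
    return q - step * G * q       # multiplicative update keeps q nonzero

# A preconditioned SGD step would then read:
#   theta = theta - lr * (q * q) * grad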
arXiv Detail & Related papers (2022-11-08T18:07:08Z) - Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives [1.7640556247739623]
We analyze algorithms for zeroth-order optimization of nonconvex composite objectives, focusing on reducing the dependence on dimensionality.
This is achieved by exploiting the low dimensional structure of the decision set using the mirror descent method with an entropy-like function.
To improve the gradient estimation, we replace the classic Gaussian smoothing with a sampling method based on the Rademacher distribution and show that the mini-batch method copes with non-Euclidean geometry.
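For context, a two-point zeroth-order gradient estimate with Rademacher perturbations of the kind referred to above can be sketched as follows; the function name, smoothing radius, and batching are illustrative assumptions rather than the paper's exact scheme.

import numpy as np

def zo_grad_rademacher(f, x, delta=1e-3, batch=8, rng=None):
    # Two-point zeroth-order gradient estimate with Rademacher (+/-1)
    # directions, averaged over a small batch of probes.
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x, dtype=float)
    for _ in range(batch):
        u = rng.choice([-1.0, 1.0], size=x.shape)
        g += (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
    return g / batch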
arXiv Detail & Related papers (2022-08-09T07:36:25Z) - A Unified Convergence Theorem for Stochastic Optimization Methods [4.94128206910124]
We provide a fundamental unified convergence theorem used for deriving convergence results for a series of stochastic optimization methods.
As a direct application, we recover almost sure convergence results under general settings.
arXiv Detail & Related papers (2022-06-08T14:01:42Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
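For reference, the Moreau envelope of a weakly convex $f$ with parameter $\lambda > 0$ is $f_\lambda(x) = \min_y \{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \}$, and a small $\|\nabla f_\lambda(x)\|$ certifies that $x$ is close to a near-stationary point, which is why this norm serves as the stationarity measure here.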
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - The Strength of Nesterov's Extrapolation in the Individual Convergence
of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in stochastic settings.
Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z) - Optimization of Graph Total Variation via Active-Set-based Combinatorial
Reconditioning [48.42916680063503]
We propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class.
We show that nested-forest decomposition of the inactive edges yields a guaranteed local linear convergence rate.
Our results suggest that local convergence analysis can serve as a guideline for selecting variable metrics in proximal algorithms.
arXiv Detail & Related papers (2020-02-27T16:33:09Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove that adaptive gradient methods converge in expectation to first-order stationary points for nonconvex stochastic optimization.
Our analyses shed light on a better understanding of adaptive gradient methods for nonconvex optimization.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.