A theoretical and empirical study of new adaptive algorithms with
additional momentum steps and shifted updates for stochastic non-convex
optimization
- URL: http://arxiv.org/abs/2110.08531v2
- Date: Wed, 31 Jan 2024 20:52:39 GMT
- Title: A theoretical and empirical study of new adaptive algorithms with
additional momentum steps and shifted updates for stochastic non-convex
optimization
- Authors: Cristian Daniel Alecsa
- Abstract summary: It is known that adaptive optimization algorithms represent the key pillar behind the rise of the Machine Learning field.
In this paper we introduce new adaptive algorithms with momentum terms for stochastic non-convex optimization problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is known that adaptive optimization algorithms represent the key pillar
behind the rise of the Machine Learning field. In the Optimization literature
numerous studies have been devoted to accelerated gradient methods, but only
recently have adaptive iterative techniques been analyzed from a theoretical point
of view. In the present paper we introduce new adaptive algorithms endowed with
momentum terms for stochastic non-convex optimization problems. Our purpose is
to show a deep connection between accelerated methods endowed with different
inertial steps and AMSGrad-type momentum methods. Our methodology is based on
the framework of stochastic and possibly non-convex objective mappings, along
with some assumptions that are often used in the investigation of adaptive
algorithms. In addition to discussing the finite-time horizon analysis in
relation to a certain final iteration and the almost sure convergence to
stationary points, we shall also look at the worst-case iteration complexity.
This will be followed by an estimate for the expectation of the squared
Euclidean norm of the gradient. Various computational simulations for the
training of neural networks are used to support the theoretical analysis.
For future research we emphasize that there are multiple possible extensions to
our work, among which we mention the investigation of non-smooth objective
functions and the theoretical analysis of a more general formulation that
encompasses our adaptive optimizers in a stochastic framework.
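To make the connection described above concrete, here is a minimal sketch of an AMSGrad-type update augmented with an additional inertial (momentum) step; the extrapolation coefficient, the placement of the shifted update, and all hyperparameter values are illustrative assumptions rather than the paper's exact scheme.

```python
import jax.numpy as jnp

def amsgrad_inertial_step(x, x_prev, m, v, v_hat, grad,
                          lr=1e-3, beta1=0.9, beta2=0.999,
                          inertia=0.1, eps=1e-8):
    """One AMSGrad-type update with an extra inertial step (illustrative sketch)."""
    # Inertial extrapolation, as in accelerated gradient methods
    y = x + inertia * (x - x_prev)

    # Exponential moving averages of the stochastic gradient and its square
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2

    # AMSGrad correction: the element-wise maximum keeps the effective step size non-increasing
    v_hat = jnp.maximum(v_hat, v)

    # Adaptive step taken from the extrapolated point
    x_new = y - lr * m / (jnp.sqrt(v_hat) + eps)
    return x_new, x, m, v, v_hat
```

In practice `grad` would be a stochastic gradient of the training loss, evaluated at the current iterate or at the extrapolated point depending on the variant.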
Related papers
- Asymptotic and Non-Asymptotic Convergence Analysis of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis [17.34603953600226]
Adaptive optimizers have emerged as powerful tools in deep learning, dynamically adjusting the learning rate based on past gradients.
These methods have achieved significant success in various deep learning tasks; AdaGrad is the cornerstone of this work.
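For reference, the diagonal AdaGrad update that this summary alludes to can be sketched as follows (the step size and epsilon are illustrative choices):

```python
import jax.numpy as jnp

def adagrad_step(x, accum, grad, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad update (illustrative sketch)."""
    accum = accum + grad ** 2                    # running sum of squared gradients
    x = x - lr * grad / (jnp.sqrt(accum) + eps)  # per-coordinate adaptive step
    return x, accum
```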
arXiv Detail & Related papers (2024-09-08T08:29:51Z)
- Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
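As an illustration of algorithm unrolling, the toy sketch below backpropagates an outer loss through the iterations of an unrolled gradient-descent solver for a ridge-regularized least-squares problem; the objectives, step sizes, and iteration counts are assumptions chosen for illustration, not the cited paper's setup.

```python
import jax.numpy as jnp
from jax import grad

def unrolled_solver(theta, A, b, steps=50, lr=0.01):
    """Solve min_x 0.5*||A x - b||^2 + 0.5*theta*||x||^2 by unrolled gradient descent."""
    x = jnp.zeros(A.shape[1])
    for _ in range(steps):
        inner_grad = A.T @ (A @ x - b) + theta * x  # gradient of the inner objective
        x = x - lr * inner_grad                     # each step is an ordinary differentiable op
    return x

def outer_loss(theta, A, b, x_target):
    # Outer objective evaluated at the (approximate) inner solution
    x_star = unrolled_solver(theta, A, b)
    return jnp.sum((x_star - x_target) ** 2)

# Reverse-mode autodiff propagates gradients through every unrolled solver iteration
d_outer_d_theta = grad(outer_loss)
```

Calling `d_outer_d_theta(theta, A, b, x_target)` performs the kind of backward pass through the solver iterations that the cited paper analyzes.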
arXiv Detail & Related papers (2023-01-28T01:50:42Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
Using this framework, our analysis shows that these algorithms match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desirable to design a universal framework of adaptive algorithms for solving general problems.
In particular, our novel framework provides convergence guarantees for adaptive methods in the non-convex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z)
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the updating of the parameters and the learning rate as a joint process.
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
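A minimal sketch of a coordinate-wise zeroth-order gradient estimator with sampled coordinates is shown below; the sampling distribution, smoothing parameter, and unbiasedness reweighting are assumptions for illustration, not the cited paper's exact design.

```python
import jax.numpy as jnp
from jax import random

def zo_coordinate_gradient(f, x, key, probs, num_coords=8, mu=1e-4):
    """Zeroth-order gradient estimate via sampled coordinate-wise finite differences."""
    d = x.shape[0]
    idx = random.choice(key, d, shape=(num_coords,), p=probs)  # importance-sampled coordinates
    grad_est = jnp.zeros(d)
    for i in idx:
        e_i = jnp.zeros(d).at[i].set(1.0)
        # Two-point finite difference along coordinate i (function queries only, no gradients)
        diff = (f(x + mu * e_i) - f(x - mu * e_i)) / (2.0 * mu)
        # Reweight by the sampling probability to keep the estimate unbiased
        grad_est = grad_est.at[i].add(diff / (probs[i] * num_coords))
    return grad_est
```

Here `f` is the black-box objective and `probs` a probability vector over coordinates (e.g., uniform `jnp.ones(d) / d`); each estimate costs 2 * num_coords function evaluations.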
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Adaptive Sampling of Pareto Frontiers with Binary Constraints Using Regression and Classification [0.0]
We present a novel adaptive optimization algorithm for black-box multi-objective optimization problems with binary constraints.
Our method is based on probabilistic regression and classification models, which act as a surrogate for the optimization goals.
We also present a novel ellipsoid truncation method to speed up the expected hypervolume calculation.
arXiv Detail & Related papers (2020-08-27T09:15:02Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
- Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence rate for objectives that do not satisfy the Polyak-Łojasiewicz (PL) condition, while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)