Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
- URL: http://arxiv.org/abs/2002.05359v1
- Date: Thu, 13 Feb 2020 05:42:27 GMT
- Title: Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
- Authors: Samuel Horv\'ath, Lihua Lei, Peter Richt\'arik, Michael I. Jordan
- Abstract summary: Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence rate for non-PL objectives while outperforming existing algorithms for PL objectives.
- Score: 71.03797261151605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptivity is an important yet under-studied property in modern optimization
theory. The gap between the state-of-the-art theory and the current practice is
striking in that algorithms with desirable theoretical guarantees typically
involve drastically different settings of hyperparameters, such as step-size
schemes and batch sizes, in different regimes. Despite the appealing
theoretical results, such divisive strategies provide little, if any, insight
to practitioners to select algorithms that work broadly without tweaking the
hyperparameters. In this work, blending the "geometrization" technique
introduced by Lei & Jordan 2016 and the \texttt{SARAH} algorithm of Nguyen et
al., 2017, we propose the Geometrized \texttt{SARAH} algorithm for non-convex
finite-sum and stochastic optimization. Our algorithm is proved to achieve
adaptivity to both the magnitude of the target accuracy and the
Polyak-\L{}ojasiewicz (PL) constant if present. In addition, it achieves the
best-available convergence rate for non-PL objectives simultaneously while
outperforming existing algorithms for PL objectives.
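To make the two ingredients named in the abstract concrete, the following is a minimal Python/NumPy sketch of a SARAH-style outer loop whose inner-loop length is drawn from a geometric distribution (the "geometrization" idea). It is an illustration under stated assumptions, not the authors' exact Geom-SARAH method: the step size eta, the geometric parameter p, the toy quadratic finite-sum objective, and the names geom_sarah_epoch, grad_full, and grad_i are all placeholder choices.

    # Illustrative sketch only: SARAH recursive estimator + geometric inner-loop length.
    import numpy as np

    def geom_sarah_epoch(grad_full, grad_i, x, n, eta=0.05, p=0.05, rng=None):
        """One outer loop: exact gradient at the snapshot, then a geometric
        number of SARAH-style recursive updates (hypothetical parameters)."""
        rng = np.random.default_rng() if rng is None else rng
        v = grad_full(x)                              # full gradient at the snapshot point
        x_prev, x = x, x - eta * v
        T = rng.geometric(p)                          # "geometrized" inner-loop length
        for _ in range(T):
            i = rng.integers(n)                       # sample one component f_i
            v = grad_i(x, i) - grad_i(x_prev, i) + v  # SARAH recursion
            x_prev, x = x, x - eta * v
        return x

    # Toy finite-sum objective: f(x) = (1/n) * sum_i 0.5 * a_i * (x - b_i)^2
    rng = np.random.default_rng(0)
    n = 100
    a, b = rng.uniform(0.5, 2.0, n), rng.normal(size=n)
    grad_full = lambda x: np.mean(a * (x - b))
    grad_i = lambda x, i: a[i] * (x - b[i])

    x = 5.0
    for _ in range(20):                               # a few outer loops
        x = geom_sarah_epoch(grad_full, grad_i, x, n, rng=rng)
    print("approximate minimizer:", x)

Loosely, the randomly stopped inner loop is what the "geometrization" technique exploits; the paper itself specifies the actual algorithm, parameter schedules, and the convergence and adaptivity analysis.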
Related papers
- A novel algorithm for optimizing bundle adjustment in image sequence alignment [6.322876598831792]
This paper introduces a novel algorithm for optimizing the Bundle Adjustment (BA) model in the context of image sequence alignment for cryo-electron tomography.
Extensive experiments on both synthetic and real-world datasets were conducted to evaluate the algorithm's performance.
arXiv Detail & Related papers (2024-11-10T03:19:33Z) - Learning to optimize with convergence guarantees using nonlinear system theory [0.4143603294943439]
We propose an unconstrained parametrization of algorithms for smooth objective functions.
Notably, our framework is directly compatible with automatic differentiation tools.
arXiv Detail & Related papers (2024-03-14T13:40:26Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for conditional stochastic optimization in the distributed federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Optimistic Optimisation of Composite Objective with Exponentiated Update [2.1700203922407493]
The algorithms can be interpreted as the combination of the exponentiated gradient and $p$-norm algorithm.
They achieve a sequence-dependent regret upper bound, matching the best-known bounds for sparse target decision variables.
arXiv Detail & Related papers (2022-08-08T11:29:55Z) - A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization [0.0]
It is thought that adaptive optimization algorithms represent the key pillar behind the success of the Deep Learning field.
In this paper we introduce adaptive momentum techniques for different non-smooth objective problems.
arXiv Detail & Related papers (2021-10-16T09:47:57Z) - Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose \textit{Meta-Regularization}, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term on the learning rate, and casts the joint updating process of the parameters and the learning rate as a single problem.
arXiv Detail & Related papers (2021-04-12T13:13:34Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel sample-efficient hypergradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools for maximizing the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a strategy for the ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z) - Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.