Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
- URL: http://arxiv.org/abs/2002.05359v1
- Date: Thu, 13 Feb 2020 05:42:27 GMT
- Title: Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
- Authors: Samuel Horv\'ath, Lihua Lei, Peter Richt\'arik, Michael I. Jordan
- Abstract summary: Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence rate for non-PL objectives while outperforming existing algorithms for PL objectives.
- Score: 71.03797261151605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptivity is an important yet under-studied property in modern optimization
theory. The gap between the state-of-the-art theory and the current practice is
striking in that algorithms with desirable theoretical guarantees typically
involve drastically different settings of hyperparameters, such as step-size
schemes and batch sizes, in different regimes. Despite the appealing
theoretical results, such divisive strategies provide little, if any, insight
to practitioners to select algorithms that work broadly without tweaking the
hyperparameters. In this work, blending the "geometrization" technique
introduced by Lei & Jordan 2016 and the \texttt{SARAH} algorithm of Nguyen et
al., 2017, we propose the Geometrized \texttt{SARAH} algorithm for non-convex
finite-sum and stochastic optimization. Our algorithm is proved to achieve
adaptivity to both the magnitude of the target accuracy and the
Polyak-\L{}ojasiewicz (PL) constant if present. In addition, it achieves the
best-available convergence rate for non-PL objectives simultaneously while
outperforming existing algorithms for PL objectives.
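To make the two ingredients named in the abstract concrete, the following is a minimal Python/NumPy sketch of a SARAH-style outer loop whose inner-loop length is drawn from a geometric distribution (the "geometrization" idea). It is an illustration under stated assumptions, not the authors' exact Geom-SARAH method: the step size eta, the geometric parameter p, the toy quadratic finite-sum objective, and the names geom_sarah_epoch, grad_full, and grad_i are all placeholder choices.

    # Illustrative sketch only: SARAH recursive estimator + geometric inner-loop length.
    import numpy as np

    def geom_sarah_epoch(grad_full, grad_i, x, n, eta=0.05, p=0.05, rng=None):
        """One outer loop: exact gradient at the snapshot, then a geometric
        number of SARAH-style recursive updates (hypothetical parameters)."""
        rng = np.random.default_rng() if rng is None else rng
        v = grad_full(x)                              # full gradient at the snapshot point
        x_prev, x = x, x - eta * v
        T = rng.geometric(p)                          # "geometrized" inner-loop length
        for _ in range(T):
            i = rng.integers(n)                       # sample one component f_i
            v = grad_i(x, i) - grad_i(x_prev, i) + v  # SARAH recursion
            x_prev, x = x, x - eta * v
        return x

    # Toy finite-sum objective: f(x) = (1/n) * sum_i 0.5 * a_i * (x - b_i)^2
    rng = np.random.default_rng(0)
    n = 100
    a, b = rng.uniform(0.5, 2.0, n), rng.normal(size=n)
    grad_full = lambda x: np.mean(a * (x - b))
    grad_i = lambda x, i: a[i] * (x - b[i])

    x = 5.0
    for _ in range(20):                               # a few outer loops
        x = geom_sarah_epoch(grad_full, grad_i, x, n, rng=rng)
    print("approximate minimizer:", x)

Loosely, the randomly stopped inner loop is what the "geometrization" technique exploits; the paper itself specifies the actual algorithm, parameter schedules, and the convergence and adaptivity analysis.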
Related papers
- A novel algorithm for optimizing bundle adjustment in image sequence alignment [6.322876598831792]
This paper introduces a novel algorithm for optimizing the Bundle Adjustment (BA) model in the context of image sequence alignment for cryo-electron tomography.
Extensive experiments on both synthetic and real-world datasets were conducted to evaluate the algorithm's performance.
arXiv Detail & Related papers (2024-11-10T03:19:33Z) - Learning to optimize with convergence guarantees using nonlinear system theory [0.4143603294943439]
We propose an unconstrained parametrization of algorithms for smooth objective functions.
Notably, our framework is directly compatible with automatic differentiation tools.
arXiv Detail & Related papers (2024-03-14T13:40:26Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for conditional stochastic optimization in the distributed federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Optimistic Optimisation of Composite Objective with Exponentiated Update [2.1700203922407493]
The algorithms can be interpreted as the combination of the exponentiated gradient and $p$-norm algorithm.
They achieve a sequence-dependent regret upper bound, matching the best-known bounds for sparse target decision variables.
arXiv Detail & Related papers (2022-08-08T11:29:55Z) - A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization [0.0]
It is thought that adaptive optimization algorithms represent the key pillar behind the success of the Deep Learning field.
In this paper we introduce adaptive momentum techniques for different non-smooth objective problems.
arXiv Detail & Related papers (2021-10-16T09:47:57Z) - Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose \textit{Meta-Regularization}, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term on the learning rate, and casts the joint updating process of the parameters and the learning rate as a single problem.
arXiv Detail & Related papers (2021-04-12T13:13:34Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel sample-efficient hypergradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools for maximizing the use of Noisy Intermediate-Scale Quantum (NISQ) devices.
We propose a strategy for the ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z) - Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.