Big-Step-Little-Step: Efficient Gradient Methods for Objectives with
Multiple Scales
- URL: http://arxiv.org/abs/2111.03137v1
- Date: Thu, 4 Nov 2021 20:09:12 GMT
- Title: Big-Step-Little-Step: Efficient Gradient Methods for Objectives with
Multiple Scales
- Authors: Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory
Valiant, Honglin Yuan
- Abstract summary: We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions.
We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems.
- Score: 45.994415581007054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide new gradient-based methods for efficiently solving a broad class
of ill-conditioned optimization problems. We consider the problem of minimizing
a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly
decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex
functions and provide a method which solves this problem with a number of
gradient evaluations that scales (up to logarithmic factors) as the product of
the square-root of the condition numbers of the components. This complexity
bound (which we prove is nearly optimal) can improve almost exponentially on
that of accelerated gradient methods, which grow as the square root of the
condition number of $f$. Additionally, we provide efficient methods for solving
stochastic, quadratic variants of this multiscale optimization problem. Rather
than learn the decomposition of $f$ (which would be prohibitively expensive),
our methods apply a clean recursive "Big-Step-Little-Step" interleaving of
standard methods. The resulting algorithms use $\tilde{\mathcal{O}}(d m)$
space, are numerically stable, and open the door to a more fine-grained
understanding of the complexity of convex optimization beyond condition number.
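The abstract describes the algorithm only at a high level, so the snippet below is a toy illustration rather than the authors' recursive method: a minimal Python sketch, assuming a separable two-scale quadratic with hand-picked eigenvalues and a fixed schedule of 25 "little" steps per "big" step, of why interleaving gradient steps at two step sizes can dramatically outperform plain gradient descent run at the single safe step size $1/L$.
```python
"""
Illustrative toy of the "big-step / little-step" interleaving idea described in
the abstract above. This is NOT the paper's actual recursive algorithm -- just a
sketch on a separable two-scale quadratic, with all eigenvalues, step sizes, and
the 25-little-steps-per-big-step schedule chosen for illustration only.
"""
import numpy as np

# f(x) = 0.5 * sum_i lam[i] * x[i]^2, split into two hidden "components":
#   flat component:  eigenvalues {1, 2}       (condition number kappa_1 = 2)
#   steep component: eigenvalues {1e6, 2e6}   (condition number kappa_2 = 2)
# The overall condition number is kappa_f = 2e6, so plain gradient descent with
# the safe step size 1/L = 5e-7 barely moves the flat coordinates.
lam = np.array([1.0, 2.0, 1.0e6, 2.0e6])

def f(x):
    return 0.5 * np.sum(lam * x**2)

def grad(x):
    return lam * x  # gradient of the diagonal quadratic

def plain_gd(x, steps, eta):
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

def big_step_little_step(x, cycles, little_per_cycle):
    """Interleave many 'little' steps (safe for the steep scale) with one
    'big' step (sized for the flat scale). The little steps shrink the steep
    coordinates enough that the big step's blow-up on them stays controlled,
    while each big step roughly halves the flat coordinates."""
    eta_little = 1.0 / lam.max()   # 5e-7, stable for every coordinate
    eta_big = 1.0 / 2.0            # sized for the flat component (L_flat = 2)
    for _ in range(cycles):
        x = plain_gd(x, little_per_cycle, eta_little)
        x = x - eta_big * grad(x)
    return x

x0 = np.ones(4)
cycles, little_per_cycle = 40, 25
budget = cycles * (little_per_cycle + 1)   # same gradient-evaluation budget

x_bsls = big_step_little_step(x0.copy(), cycles, little_per_cycle)
x_gd = plain_gd(x0.copy(), budget, 1.0 / lam.max())

print(f"gradient evaluations: {budget}")
print(f"interleaved schedule: f(x) = {f(x_bsls):.3e}")
print(f"plain GD (step 1/L):  f(x) = {f(x_gd):.3e}")
```
On this toy instance each hidden component has condition number 2 while the combined problem has condition number $2 \times 10^6$; with the same budget of gradient evaluations, the interleaved schedule drives the objective down by many orders of magnitude, whereas plain gradient descent at step size $1/L$ has barely reduced the flat component.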
Related papers
- Improved Complexity for Smooth Nonconvex Optimization: A Two-Level Online Learning Approach with Quasi-Newton Methods [18.47705532817026]
We study the problem of finding an $\epsilon$-first-order stationary point (FOSP) of a smooth function.
We introduce a novel optimistic quasi-gradient approximation method to solve this online learning problem.
arXiv Detail & Related papers (2024-12-03T05:20:05Z)
- An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization [16.709026203727007]
We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem.
We measure the performance of our method in terms of suboptimality and infeasibility errors.
arXiv Detail & Related papers (2024-02-12T22:34:53Z)
- Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization [6.896308995955336]
Gradient-based minimax optimal algorithms have promoted the development of continuous optimization and machine learning.
In this paper, we open up a new way to design and analyze gradient-based algorithms with direct applications in machine learning.
arXiv Detail & Related papers (2023-12-06T01:16:10Z)
- Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold [16.099883128428054]
We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM).
ODCGM only requires computing a projection onto a vector space.
We show that ODCGM exhibits near-optimal oracle complexities.
arXiv Detail & Related papers (2023-03-16T12:25:53Z)
- Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method for solving such problems whose cost is measured in master-to-agents communication rounds and local calls.
The proposed method is much sharper than existing $\mathcal{O}(\sqrt{L_q/\mu})$ methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
- Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z)
- A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we focus on the gradient-free (zeroth-order) setting of minimax optimization.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.