Accelerated Gradient Algorithms with Adaptive Subspace Search for
Instance-Faster Optimization
- URL: http://arxiv.org/abs/2312.03218v1
- Date: Wed, 6 Dec 2023 01:16:10 GMT
- Title: Accelerated Gradient Algorithms with Adaptive Subspace Search for
Instance-Faster Optimization
- Authors: Yuanshi Liu, Hanzhen Zhao, Yang Xu, Pengyun Yue, Cong Fang
- Abstract summary: Gradient-based minimax optimal algorithms have promoted the development of continuous optimization and machine learning.
In this paper, we open up a new way to design and analyze gradient-based algorithms with direct applications in machine learning.
- Score: 6.896308995955336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient-based minimax optimal algorithms have greatly promoted the
development of continuous optimization and machine learning. One seminal work
due to Yurii Nesterov [Nes83a] established $\tilde{\mathcal{O}}(\sqrt{L/\mu})$
gradient complexity for minimizing an $L$-smooth $\mu$-strongly convex
objective. However, an ideal algorithm would adapt to the explicit complexity
of a particular objective function and incur faster rates for simpler problems,
prompting us to reconsider two shortcomings of existing optimization modeling
and analysis: (i) worst-case optimality is neither instance optimality nor the
optimality observed in practice, and (ii) the traditional $L$-smoothness condition
may not be the primary abstraction/characterization for modern practical problems.
In this paper, we open up a new way to design and analyze gradient-based
algorithms with direct applications in machine learning, including linear
regression and beyond. We introduce two factors $(\alpha, \tau_{\alpha})$ to
refine the description of how degenerate the optimization problem is, based on
the observation that the singular values of the Hessian often drop sharply. We
design adaptive algorithms that solve simpler problems with fewer gradient (or
analogous oracle) accesses and without prior knowledge of the problem's
degeneracy. The algorithms also improve the state-of-the-art complexities for
several problems in machine learning, thereby addressing the open problem of how
to design faster algorithms in light of the known complexity lower bounds.
Specifically, with an $\mathcal{O}(1)$-bounded nuclear norm, we achieve an
optimal $\tilde{\mathcal{O}}(\mu^{-1/3})$ (vs. $\tilde{\mathcal{O}}(\mu^{-1/2})$)
gradient complexity for linear regression. We hope this work prompts a
rethinking of how the difficulty of modern optimization problems should be understood.
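To ground the $\tilde{\mathcal{O}}(\sqrt{L/\mu})$ baseline the abstract starts from, here is a minimal sketch of classical Nesterov accelerated gradient on a toy linear-regression instance, together with the Hessian spectrum quantities (eigenvalues and their nuclear norm) that motivate the paper's $(\alpha, \tau_{\alpha})$ refinement. This is not the paper's adaptive subspace method; the problem sizes, seed, and iteration count are all illustrative assumptions.

```python
import numpy as np

# Toy linear-regression instance f(x) = 0.5 * ||A x - b||^2, which is
# L-smooth and mu-strongly convex with L = sigma_max(A)^2 and mu = sigma_min(A)^2.
# Dimensions and seed are illustrative.
rng = np.random.default_rng(0)
n, d = 200, 50
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad(x):
    return A.T @ (A @ x - b)

# Spectrum of the Hessian A^T A: the paper's motivation is that these
# eigenvalues often drop sharply, so their sum (the nuclear norm) can be far
# more informative than the worst-case smoothness constant L alone.
eigs = np.linalg.svd(A, compute_uv=False) ** 2
L, mu = eigs.max(), eigs.min()
nuclear_norm = eigs.sum()

# Classical Nesterov accelerated gradient for strongly convex objectives:
# the O~(sqrt(L/mu)) worst-case baseline, not the adaptive algorithm of the paper.
x = np.zeros(d)
y = np.zeros(d)
beta = (np.sqrt(L / mu) - 1) / (np.sqrt(L / mu) + 1)
for _ in range(500):
    x_next = y - grad(y) / L          # gradient step from the extrapolated point
    y = x_next + beta * (x_next - x)  # momentum extrapolation
    x = x_next

print(f"L = {L:.1f}, mu = {mu:.2f}, nuclear norm = {nuclear_norm:.1f}")
print(f"gradient norm after 500 iterations: {np.linalg.norm(grad(x)):.2e}")
```

Printing the spectrum alongside the run makes the paper's point concrete: when the eigenvalues decay quickly, the condition number $L/\mu$ wildly overstates how hard this particular instance is.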
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance reduced ZO estimators for complex gradient problems.
We improve the state-of-the-art function complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex
Minimax Machine Learning [12.069630105460766]
Alternating gradient descent-ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications.
In this paper, we develop a single-loop fast AltGDA-type algorithm to solve nonconvex minimax optimization problems.
arXiv Detail & Related papers (2021-12-22T04:33:27Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained bilevel optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - Distributed Stochastic Consensus Optimization with Momentum for
Nonconvex Nonsmooth Problems [45.88640334610308]
This paper presents a new distributed optimization algorithm for non-smooth problems.
We show that the proposed algorithm can achieve a favorable overall communication complexity.
Experiments are presented to illustrate the effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-10T13:12:21Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study the gradient-free (zeroth-order) minimax optimization problem in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze an optimistic variant of an adaptive gradient algorithm for non-concave minimax problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
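As a rough illustration of the "gradient updates using the history of gradients" mentioned in the last entry, here is a minimal Adagrad-style update sketch. It is a generic textbook variant, not the specific optimistic algorithm analyzed in that paper; the function name, learning rate, and test objective are illustrative assumptions.

```python
import numpy as np

def adagrad_step(x, g, hist, lr=0.1, eps=1e-8):
    """One Adagrad-style update: per-coordinate step sizes are scaled down by the
    accumulated history of squared gradients."""
    hist = hist + g ** 2
    return x - lr * g / (np.sqrt(hist) + eps), hist

# Example on a simple quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
x = np.ones(5)
hist = np.zeros(5)
for _ in range(200):
    x, hist = adagrad_step(x, x, hist)
print(np.linalg.norm(x))  # shrinks toward 0 as the iterates approach the minimizer
```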
This list is automatically generated from the titles and abstracts of the papers in this site.