Momentum with Variance Reduction for Nonconvex Composition Optimization
- URL: http://arxiv.org/abs/2005.07755v1
- Date: Fri, 15 May 2020 19:29:33 GMT
- Title: Momentum with Variance Reduction for Nonconvex Composition Optimization
- Authors: Ziyi Chen, Yi Zhou
- Abstract summary: Composition optimization is widely applied in nonconvex machine learning.
Various advanced algorithms that adopt momentum and variance reduction techniques have been developed.
Our algorithm converges significantly faster than existing algorithms.
- Score: 9.657813709239335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Composition optimization is widely applied in nonconvex machine learning.
Various advanced stochastic algorithms that adopt momentum and variance
reduction techniques have been developed for composition optimization. However,
these algorithms do not fully exploit both techniques to accelerate
convergence and lack convergence guarantees in nonconvex optimization.
This paper complements the existing literature by developing various momentum
schemes with SPIDER-based variance reduction for non-convex composition
optimization. In particular, our momentum design requires fewer proximal
mapping evaluations per iteration than the existing Katyusha momentum.
Furthermore, our algorithm achieves near-optimal sample
complexity results in both non-convex finite-sum and online composition
optimization and achieves a linear convergence rate under the gradient dominant
condition. Numerical experiments demonstrate that our algorithm converges
significantly faster than existing algorithms in nonconvex composition
optimization.
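To make the above concrete, the following minimal sketch (not the authors' exact algorithm) combines a SPIDER-style recursive estimator of the inner function value with a heavy-ball momentum step on a toy finite-sum composition problem F(x) = f((1/n) * sum_i g_i(x)). The quadratic outer map f, the linear inner maps g_i, and all hyperparameters (epoch length q, batch size, step size eta, momentum beta) are hypothetical choices made only to keep the example self-contained.

```python
# Minimal illustrative sketch (NOT the authors' exact algorithm): SPIDER-style
# recursive variance reduction combined with a heavy-ball momentum step for a
# toy finite-sum composition problem F(x) = f( (1/n) * sum_i g_i(x) ).
# The quadratic outer map f, the linear inner maps g_i(x) = A_i x, and all
# hyperparameters below are hypothetical choices for a self-contained demo.
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 200, 10, 5            # samples, dimension of x, dimension of g(x)
A = rng.normal(size=(n, p, d))  # data defining the inner maps g_i(x) = A_i x
b = rng.normal(size=p)          # target used by the outer map f(u) = 0.5*||u - b||^2

def g_batch(x, idx):            # mini-batch average of g_i(x)
    return np.mean(A[idx] @ x, axis=0)

def jac_g_batch(idx):           # mini-batch average Jacobian of g_i (constant A_i here)
    return np.mean(A[idx], axis=0)

def grad_f(u):                  # gradient of the outer map f
    return u - b

def spider_momentum(T=300, q=20, batch=16, eta=0.02, beta=0.7):
    x = np.zeros(d)
    x_prev = x.copy()
    direction = np.zeros(d)
    for k in range(T):
        if k % q == 0:
            # Periodic full-batch refresh of the inner-value and Jacobian estimators.
            u = g_batch(x, np.arange(n))
            J = jac_g_batch(np.arange(n))
        else:
            # SPIDER-style recursive correction with a small mini-batch:
            # estimate <- estimate + (batch value at x) - (batch value at x_prev).
            idx = rng.choice(n, size=batch, replace=False)
            u = u + g_batch(x, idx) - g_batch(x_prev, idx)
            # A general method would correct J the same way; the g_i here are
            # linear, so their Jacobians do not depend on x.
        v = J.T @ grad_f(u)                      # chain-rule gradient estimate of F at x
        direction = beta * direction - eta * v   # heavy-ball momentum update
        x_prev, x = x, x + direction
    return x

x_final = spider_momentum()
u_full = g_batch(x_final, np.arange(n))
print("final objective:", 0.5 * np.sum((u_full - b) ** 2))
```

Between full-batch refreshes every q iterations, the estimators are corrected with mini-batch differences evaluated at consecutive iterates; this recursive correction is what reduces the variance of the gradient estimate without re-querying all n samples.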
Related papers
- Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
Gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate.
We validate our algorithms over both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling the differentiable and non-differentiable parts separately, linearizing only the smooth components.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit the connection between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Adaptive Stochastic Optimisation of Nonconvex Composite Objectives [2.1700203922407493]
We propose and analyse a family of generalised composite mirror descent algorithms.
With adaptive step sizes, the proposed algorithms converge without requiring prior knowledge of the problem.
We exploit the low-dimensional structure of the decision sets for high-dimensional problems.
arXiv Detail & Related papers (2022-11-21T18:31:43Z) - A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and
Nonsmooth Bi-level Optimization [26.68351521813062]
Existing bi-level algorithms cannot handle nonsmooth or hyper-smooth regularizers.
In this paper, we show that an approximate implicit differentiation (AID) scheme can be used to speed up bi-level optimization in machine learning applications.
arXiv Detail & Related papers (2022-03-30T18:53:04Z) - Consistent Approximations in Composite Optimization [0.0]
We develop a framework for consistent approximations of optimization problems.
The framework is developed for a broad class of optimization problems.
The analysis is illustrated on extended nonlinear programming problems.
arXiv Detail & Related papers (2022-01-13T23:57:08Z) - Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasingly large momentum parameter for the first-order moment is sufficient for adaptive scaling.
We also give insights for increasing the momentum in a stagewise manner, in accordance with a stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel, computationally efficient stochastic gradient estimator named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Distributed Proximal Splitting Algorithms with Rates and Acceleration [7.691755449724637]
We derive sublinear and linear convergence results with new rates on the function value suboptimality or distance to the solution.
We propose distributed variants of these algorithms, which can be accelerated as well.
arXiv Detail & Related papers (2020-10-02T12:35:09Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex nonsmooth problems; a generic sketch of the momentum-with-restart idea appears after this list.
arXiv Detail & Related papers (2020-02-26T16:06:27Z)
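As a companion to the last entry, here is a minimal sketch (not the algorithm from that paper) of an accelerated proximal gradient method with Nesterov-style momentum and a simple function-value restart rule, applied to a hypothetical l1-regularized least-squares problem; the data, regularization weight, and restart test are illustrative assumptions only.

```python
# Minimal sketch (NOT the algorithm from the paper above): accelerated proximal
# gradient with Nesterov-style momentum and a function-value restart rule,
# applied to a hypothetical l1-regularized least-squares problem:
#   min_x 0.5*||Ax - b||^2 + lam*||x||_1.
import numpy as np

rng = np.random.default_rng(1)
m, d, lam = 50, 20, 0.1
A = rng.normal(size=(m, d))
b = rng.normal(size=m)
L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part's gradient

def objective(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def soft_threshold(z, tau):          # proximal mapping of tau * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def apg_restart(T=500):
    x = np.zeros(d)
    y = x.copy()                     # extrapolated (momentum) point
    t = 1.0                          # momentum parameter
    f_prev = objective(x)
    for _ in range(T):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)   # proximal gradient step at y
        f_new = objective(x_new)
        if f_new > f_prev:
            # Restart: the objective increased, so reset the momentum and take
            # a plain proximal gradient step from the last iterate instead.
            t = 1.0
            x_new = soft_threshold(x - (A.T @ (A @ x - b)) / L, lam / L)
            f_new = objective(x_new)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # Nesterov extrapolation
        x, t, f_prev = x_new, t_new, f_new
    return x

x_hat = apg_restart()
print("final objective:", objective(x_hat))
```

The restart rule resets the momentum whenever the objective increases, a common heuristic for keeping accelerated proximal methods stable on nonconvex or ill-conditioned problems.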