Faster Stochastic Alternating Direction Method of Multipliers for
Nonconvex Optimization
- URL: http://arxiv.org/abs/2008.01296v3
- Date: Mon, 10 Aug 2020 03:20:17 GMT
- Title: Faster Stochastic Alternating Direction Method of Multipliers for
Nonconvex Optimization
- Authors: Feihu Huang, Songcan Chen, Heng Huang
- Abstract summary: In this paper, we propose a faster stochastic alternating direction method of multipliers (ADMM) for nonconvex optimization by using a new stochastic path-integrated differential estimator (SPIDER), called SPIDER-ADMM.
We prove that SPIDER-ADMM achieves a record-breaking incremental first-order oracle (IFO) complexity for finding an $\epsilon$-approximate stationary point.
Our theoretical analysis shows that the online SPIDER-ADMM has an IFO complexity of $\mathcal{O}(\epsilon^{-3/2})$, improving the existing best results by a factor of $\mathcal{O}(\epsilon^{-1/2})$.
- Score: 110.52708815647613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a faster stochastic alternating direction method of
multipliers (ADMM) for nonconvex optimization by using a new stochastic
path-integrated differential estimator (SPIDER), called SPIDER-ADMM.
Moreover, we prove that the SPIDER-ADMM achieves a record-breaking incremental
first-order oracle (IFO) complexity of $\mathcal{O}(n+n^{1/2}\epsilon^{-1})$
for finding an $\epsilon$-approximate stationary point, which improves the
deterministic ADMM by a factor $\mathcal{O}(n^{1/2})$, where $n$ denotes the
sample size. As one of the major contributions of this paper, we provide a new
theoretical analysis framework for nonconvex stochastic ADMM methods that
yields the optimal IFO complexity. Based on this new analysis framework, we
study the previously open question of the optimal IFO complexity of the
existing nonconvex SVRG-ADMM and SAGA-ADMM methods, and prove that they achieve the optimal IFO complexity of
$\mathcal{O}(n+n^{2/3}\epsilon^{-1})$. Thus, the SPIDER-ADMM improves the
existing stochastic ADMM methods by a factor of $\mathcal{O}(n^{1/6})$.
Moreover, we extend SPIDER-ADMM to the online setting, and propose a faster
online SPIDER-ADMM. Our theoretical analysis shows that the online SPIDER-ADMM
has the IFO complexity of $\mathcal{O}(\epsilon^{-\frac{3}{2}})$, which
improves the existing best results by a factor of
$\mathcal{O}(\epsilon^{-\frac{1}{2}})$. Finally, the experimental results on
benchmark datasets validate that the proposed algorithms have a faster
convergence rate than the existing ADMM algorithms for nonconvex optimization.
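The core device in SPIDER-ADMM is the SPIDER estimator, which tracks the gradient recursively on small minibatches and refreshes it with a full pass every $q$ iterations, which is what lowers the IFO complexity from the $\mathcal{O}(n+n^{2/3}\epsilon^{-1})$ of SVRG/SAGA-style estimators to $\mathcal{O}(n+n^{1/2}\epsilon^{-1})$. Below is a minimal NumPy sketch of the estimator alone; the names (`spider_step`, `grad_fn`, `q`, `b`) and the plain gradient update in the toy loop are illustrative assumptions, since the paper applies the estimator inside the ADMM primal step rather than bare gradient descent.

```python
import numpy as np

def spider_step(grad_fn, n, x, x_prev, v_prev, t, q, b, rng):
    """One step of the SPIDER variance-reduced gradient estimator.

    grad_fn(x, idx) -> average gradient of the components f_i, i in idx,
    at x. Every q-th step recomputes the full gradient; in between, the
    estimator is updated recursively on a minibatch of size b.
    (Illustrative sketch only -- not the authors' implementation.)
    """
    if t % q == 0:
        return grad_fn(x, np.arange(n))      # periodic full-gradient refresh
    idx = rng.integers(0, n, size=b)         # small minibatch
    return grad_fn(x, idx) - grad_fn(x_prev, idx) + v_prev

# Toy usage: least squares, f_i(w) = 0.5 * (a_i @ w - y_i)^2
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
y = rng.normal(size=100)
grad_fn = lambda w, idx: A[idx].T @ (A[idx] @ w - y[idx]) / len(idx)

x = np.zeros(5)
x_prev = x.copy()
v = grad_fn(x, np.arange(100))               # t = 0: full gradient
for t in range(1, 50):
    v = spider_step(grad_fn, 100, x, x_prev, v, t, q=10, b=8, rng=rng)
    x_prev, x = x, x - 0.1 * v               # plain gradient step, for illustration
```

The periodic full-gradient refresh bounds the accumulated variance of the recursive updates, which is what the IFO analysis exploits.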
Related papers
- Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and converge to an $\epsilon$-saddle point within $\mathcal{O}(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - A Convergent ADMM Framework for Efficient Neural Network Training [17.764095204676973]
Alternating Direction Method of Multipliers (ADMM) has achieved tremendous success in many classification and regression applications.
We propose dlADMM, a novel framework that trains general neural networks via ADMM with a convergence guarantee.
Experiments on seven benchmark datasets demonstrate the convergence, efficiency, and effectiveness of our proposed dlADMM algorithm.
arXiv Detail & Related papers (2021-12-22T01:55:24Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful convergence analysis framework for both the constrained and unconstrained bilevel optimization settings.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving averages (SEMA) for non-convex optimization.
Using this SEMA-based analysis, we recover state-of-the-art results for a range of non-convex optimization problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Geom-SPIDER-EM: Faster Variance Reduced Stochastic Expectation
Maximization for Nonconvex Finite-Sum Optimization [21.81837334970773]
We propose an extension of the Stochastic Path-Integrated Differential EstimatoR (SPIDER) to the Expectation Maximization (EM) algorithm.
We show it supports the same state-of-the-art complexity bounds as SPIDER-EM, and our results further establish a geometric rate of convergence.
arXiv Detail & Related papers (2020-11-24T21:20:53Z) - Differentially Private ADMM Algorithms for Machine Learning [38.648113004535155]
We study efficient differentially private alternating direction methods of multipliers (ADMM) via gradient perturbation.
We propose the first differentially private ADMM (DP-ADMM) algorithm with a performance guarantee of $(\epsilon,\delta)$-differential privacy.
arXiv Detail & Related papers (2020-10-31T01:37:24Z) - Convergence of Meta-Learning with Task-Specific Adaptation over Partial
Parameters [152.03852111442114]
Although model-agnostic meta-learning (MAML) is a very successful algorithm in meta-learning practice, it can have high computational complexity.
Our paper shows that such complexity can significantly affect the overall convergence performance of ANIL (Almost No Inner Loop), a variant that adapts only a subset of the model parameters.
arXiv Detail & Related papers (2020-06-16T19:57:48Z) - Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query
Complexity [109.54166127479093]
Zeroth-order (a.k.a, derivative-free) methods are a class of effective optimization methods for solving machine learning problems.
In this paper, we propose a class of faster zeroth-order stochastic alternating direction method of multipliers (ADMM) methods to solve nonconvex finite-sum problems.
We show that these methods achieve a lower function query complexity than existing zeroth-order ADMM methods for finding an $\epsilon$-stationary point.
At the same time, we propose a class of faster zeroth-order online ADMM methods.
arXiv Detail & Related papers (2019-07-30T02:21:43Z)
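For context on the zeroth-order setting above: the gradient is replaced by a finite-difference estimate built purely from function evaluations. Below is a minimal NumPy sketch of the standard two-point Gaussian-smoothing estimator; the function `zo_gradient` and smoothing radius `mu` are illustrative assumptions, and the paper's ADMM updates add variance reduction on top of such estimates.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order gradient estimate of f at x.

    Uses a random Gaussian direction u and the finite difference
    (f(x + mu*u) - f(x - mu*u)) / (2*mu); this is the standard
    smoothing estimator, not necessarily the paper's exact one.
    """
    if rng is None:
        rng = np.random.default_rng()
    u = rng.normal(size=x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

# Example: f(x) = ||x||^2 has gradient 2x; averaging many estimates
# concentrates around 2x.
x = np.array([1.0, -2.0, 0.5])
samples = [zo_gradient(lambda z: z @ z, x) for _ in range(2000)]
est = np.mean(samples, axis=0)
```

Each estimate costs two function queries, which is why the complexity of zeroth-order methods is measured in function queries rather than IFO calls.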