On the Divergence of Decentralized Non-Convex Optimization
- URL: http://arxiv.org/abs/2006.11662v1
- Date: Sat, 20 Jun 2020 21:42:06 GMT
- Title: On the Divergence of Decentralized Non-Convex Optimization
- Authors: Mingyi Hong, Siliang Zeng, Junyu Zhang, Haoran Sun
- Abstract summary: We show that when certain local Lipschitz conditions (LLC) on the local function gradients $\nabla f_i$ are not satisfied, most of the existing decentralized algorithms diverge.
In particular, we show that the proposed algorithm converges sublinearly to an $\epsilon$-stationary solution.
- Score: 44.743563268496715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a generic class of decentralized algorithms in which $N$ agents
jointly optimize the non-convex objective $f(u):=1/N\sum_{i=1}^{N}f_i(u)$,
while only communicating with their neighbors. This class of problems has
become popular in modeling many signal processing and machine learning
applications, and many efficient algorithms have been proposed. However, by
constructing some counter-examples, we show that when certain local Lipschitz
conditions (LLC) on the local function gradient $\nabla f_i$'s are not
satisfied, most of the existing decentralized algorithms diverge, even if the
global Lipschitz condition (GLC) is satisfied, where the sum function $f$ has
Lipschitz gradient. This observation raises an important open question: How to
design decentralized algorithms when the LLC, or even the GLC, is not
satisfied?
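For intuition (a minimal illustrative instance, not taken from the paper), take $N=2$ with
$$f_1(u) = \tfrac{1}{2}u^4 + \tfrac{1}{2}u^2, \qquad f_2(u) = -\tfrac{1}{2}u^4, \qquad f(u) = \tfrac{1}{2}\big(f_1(u)+f_2(u)\big) = \tfrac{1}{4}u^2.$$
Here $\nabla f(u) = u/2$ is globally Lipschitz, so the GLC holds, yet $\nabla f_1(u) = 2u^3 + u$ is not Lipschitz on $\mathbb{R}$, so the LLC fails; decentralized iterations that step along the individual $\nabla f_i$'s can therefore blow up.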
To address the above question, we design a first-order algorithm called
Multi-stage gradient tracking algorithm (MAGENTA), which is capable of
computing stationary solutions with neither the LLC nor the GLC. In particular,
we show that the proposed algorithm converges sublinearly to a certain
$\epsilon$-stationary solution, where the precise rate depends on various
algorithmic and problem parameters. In particular, if the local function
$f_i$'s are $Q$th order polynomials, then the rate becomes
$\mathcal{O}(1/\epsilon^{Q-1})$. Such a rate is tight for the special case of
$Q=2$ where each $f_i$ satisfies LLC. To our knowledge, this is the first
attempt that studies decentralized non-convex optimization problems with
neither the LLC nor the GLC.
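For readers unfamiliar with the algorithm class under study, the following is a minimal Python sketch of a generic decentralized gradient-tracking iteration, the template whose behavior the paper analyzes; it is not the MAGENTA algorithm itself, and the ring-graph mixing matrix, quadratic local losses, and step size are illustrative assumptions rather than choices from the paper.

```python
import numpy as np

# Sketch of decentralized gradient tracking for f(u) = (1/N) * sum_i f_i(u),
# where agents exchange information through a doubly stochastic mixing matrix W.
# Illustrative local losses: f_i(u) = 0.5 * a_i * (u - b_i)^2 (LLC holds here).

N = 5
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 2.0, size=N)   # local curvatures
b = rng.uniform(-1.0, 1.0, size=N)  # local minimizers

def grad_f(i, u):
    """Gradient of the i-th local function at u."""
    return a[i] * (u - b[i])

# Metropolis-style mixing weights on a ring graph.
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = 0.25
    W[i, (i + 1) % N] = 0.25

alpha = 0.1                                         # step size (illustrative)
x = np.zeros(N)                                     # local copies of the variable
y = np.array([grad_f(i, x[i]) for i in range(N)])   # gradient trackers
g_prev = y.copy()

for _ in range(200):
    # Consensus step plus a step along the tracked average gradient.
    x = W @ x - alpha * y
    g = np.array([grad_f(i, x[i]) for i in range(N)])
    # Tracking update: each y_i estimates the network-wide average gradient.
    y = W @ y + g - g_prev
    g_prev = g

print("consensus estimate:", x.mean(), "global minimizer:", (a * b).sum() / a.sum())
```

With these quadratic (hence LLC-satisfying) local losses the iterates reach consensus near the global minimizer; the paper's counter-examples show that this same template can diverge once the $f_i$'s violate the LLC.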
Related papers
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and we demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic
Composite Optimization [10.762749887051546]
We propose two-time-scale algorithms: Prox-DASA and Prox-DASA-GT.
Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations, or stronger assumptions.
arXiv Detail & Related papers (2023-02-20T05:16:18Z) - Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z) - Randomized Block-Coordinate Optimistic Gradient Algorithms for
Root-Finding Problems [8.0153031008486]
We develop two new algorithms to approximate a solution of nonlinear equations in large-scale settings.
We apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning.
arXiv Detail & Related papers (2023-01-08T21:46:27Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(mn+\sqrt{mn}L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_W})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method to solve such problems that accounts for both master-to-agents communication and local gradient calls.
The proposed method is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ method.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Derivative-free Alternating Projection Algorithms for General
Nonconvex-Concave Minimax Problems [9.173866646584031]
In this paper, we propose an algorithm for nonsmooth zeroth-order minimax problems.
We show that it can be used to attack nonconcave minimax problems.
arXiv Detail & Related papers (2021-08-01T15:23:49Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that, for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.