Multi-consensus Decentralized Accelerated Gradient Descent
- URL: http://arxiv.org/abs/2005.00797v2
- Date: Tue, 10 Oct 2023 15:20:14 GMT
- Title: Multi-consensus Decentralized Accelerated Gradient Descent
- Authors: Haishan Ye, Luo Luo, Ziang Zhou, Tong Zhang
- Abstract summary: Decentralized convex optimization problem has wide range of applications in large-scale machine learning, sensor networks, and control theory.
We propose novel algorithms that achieve optimal computation complexity and near optimal communication complexity.
- Score: 31.76773130271547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the decentralized convex optimization problem, which has
a wide range of applications in large-scale machine learning, sensor networks,
and control theory. We propose novel algorithms that achieve optimal
computation complexity and near optimal communication complexity. Our
theoretical results give affirmative answers to the open problem on whether
there exists an algorithm that can achieve a communication complexity (nearly)
matching the lower bound depending on the global condition number instead of
the local one. Furthermore, the linear convergence of our algorithms only
depends on the strong convexity of global objective and it does \emph{not}
require the local functions to be convex. The design of our methods relies on a
novel integration of well-known techniques including Nesterov's acceleration,
multi-consensus and gradient-tracking. Empirical studies show the
outperformance of our methods for machine learning applications.
Related papers
- Nonconvex Federated Learning on Compact Smooth Submanifolds With Heterogeneous Data [23.661713049508375]
We propose an algorithm that learns over a submanifold in the setting of a client.
We show that our proposed algorithm converges sub-ly to a neighborhood of a first-order optimal solution by using a novel analysis.
arXiv Detail & Related papers (2024-06-12T17:53:28Z) - Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its application in a wide range machine learning.
This paper develops two novel decentralized minimax optimization algorithms for the non-strongly-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Approaching Globally Optimal Energy Efficiency in Interference Networks
via Machine Learning [22.926877147296594]
This work presents a machine learning approach to optimize the energy efficiency (EE) in a multi-cell wireless network.
Results show that the method achieves an EE close to the optimum by the branch-and- computation testing.
arXiv Detail & Related papers (2022-11-25T08:36:34Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds our momentum-based reduced and localSGD techniques, and it flexibly incorporate various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - A Penalty-Based Method for Communication-Efficient Decentralized Bilevel Programming [14.35928967799696]
This paper introduces a penalty function-based decentralized algorithm for solving bilevel programming problems over a decentralized network.
A key feature of the proposed algorithm is the estimation of the hyper-gradient of the penalty function.
Our theoretical framework ensures non-asymptotic convergence to the optimal solution of the original problem under various convexity conditions.
arXiv Detail & Related papers (2022-11-08T08:39:30Z) - On a class of geodesically convex optimization problems solved via
Euclidean MM methods [50.428784381385164]
We show how a difference of Euclidean convexization functions can be written as a difference of different types of problems in statistics and machine learning.
Ultimately, we helps the broader broader the broader the broader the broader the work.
arXiv Detail & Related papers (2022-06-22T23:57:40Z) - Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex
Minimax Machine Learning [12.069630105460766]
An Alternating Table-descentascent (AltGDA) is an computation optimization algorithm that has been widely used for training in various machine learning applications.
In this paper, we develop a single-loop fast computation-of-the-loop gradient-of-the-loop algorithm to solve non minimax optimization problems.
arXiv Detail & Related papers (2021-12-22T04:33:27Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Community detection using fast low-cardinality semidefinite programming [94.4878715085334]
We propose a new low-cardinality algorithm that generalizes the local update to maximize a semidefinite relaxation derived from Leiden-k-cut.
This proposed algorithm is scalable, outperforms state-of-the-art algorithms, and outperforms in real-world time with little additional cost.
arXiv Detail & Related papers (2020-12-04T15:46:30Z) - Communication-efficient Variance-reduced Stochastic Gradient Descent [0.0]
We consider the problem of communication efficient distributed optimization.
In particular, we focus on the variance-reduced gradient and propose a novel approach to make it communication-efficient.
Comprehensive theoretical and numerical analyses on real datasets reveal that our algorithm can significantly reduce the communication complexity, by as much as 95%, with almost no noticeable penalty.
arXiv Detail & Related papers (2020-03-10T13:22:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.