Communication-efficient Variance-reduced Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2003.04686v1
- Date: Tue, 10 Mar 2020 13:22:16 GMT
- Title: Communication-efficient Variance-reduced Stochastic Gradient Descent
- Authors: Hossein S. Ghadikolaei and Sindri Magnusson
- Abstract summary: We consider the problem of communication efficient distributed optimization.
In particular, we focus on the variance-reduced gradient and propose a novel approach to make it communication-efficient.
Comprehensive theoretical and numerical analyses on real datasets reveal that our algorithm can significantly reduce the communication complexity, by as much as 95%, with almost no noticeable penalty.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of communication efficient distributed optimization
where multiple nodes exchange important algorithm information in every
iteration to solve large problems. In particular, we focus on the stochastic
variance-reduced gradient and propose a novel approach to make it
communication-efficient. That is, we compress the communicated information to a
few bits while preserving the linear convergence rate of the original
uncompressed algorithm. Comprehensive theoretical and numerical analyses on
real datasets reveal that our algorithm can significantly reduce the
communication complexity, by as much as 95\%, with almost no noticeable
penalty. Moreover, it is much more robust to quantization (in terms of
maintaining the true minimizer and the convergence rate) than the
state-of-the-art algorithms for solving distributed optimization problems. Our
results have important implications for using machine learning over
internet-of-things and mobile networks.
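Below is a minimal, hedged sketch of the kind of scheme the abstract describes: each worker quantizes the variance-reduced stochastic gradient it communicates, so only a few bits per coordinate are transmitted while the compression error shrinks as the iterates approach the minimizer. It uses synthetic least squares, a star (server-worker) topology, and a simple unbiased uniform quantizer; all names (`quantize`, `num_bits`, the step size) are illustrative assumptions, and this is not the authors' exact algorithm.

```python
# Minimal sketch (not the authors' exact scheme): a server/worker SVRG loop in
# which each worker quantizes the variance-reduced gradient it sends upstream.
# All names here (quantize, num_bits, step sizes) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)


def quantize(v, num_bits=4):
    """Unbiased uniform stochastic quantization to num_bits per coordinate."""
    levels = 2 ** num_bits - 1
    scale = np.max(np.abs(v)) + 1e-12            # one extra float sent per vector
    u = np.abs(v) / scale * levels               # map magnitudes into [0, levels]
    low = np.floor(u)
    q = low + (rng.random(v.shape) < (u - low))  # randomized rounding (unbiased)
    return np.sign(v) * q / levels * scale


# Synthetic least squares split over workers: min_x sum_w (1/n) ||A_w x - b_w||^2
num_workers, n_per, d = 4, 50, 10
A = [rng.standard_normal((n_per, d)) for _ in range(num_workers)]
x_true = rng.standard_normal(d)
b = [Aw @ x_true for Aw in A]


def local_grad(w, x, idx=None):
    """Full (idx=None) or single-sample local gradient at worker w."""
    Aw, bw = (A[w], b[w]) if idx is None else (A[w][idx:idx + 1], b[w][idx:idx + 1])
    return 2 * Aw.T @ (Aw @ x - bw) / Aw.shape[0]


x, step = np.zeros(d), 0.01
for epoch in range(50):
    snapshot = x.copy()
    # Snapshot full gradients are exchanged uncompressed once per epoch.
    full = np.mean([local_grad(w, snapshot) for w in range(num_workers)], axis=0)
    for _ in range(n_per):
        i = int(rng.integers(n_per))
        # Each worker only transmits a few bits: the quantized difference
        # between its current and snapshot stochastic gradients.
        msgs = [quantize(local_grad(w, x, i) - local_grad(w, snapshot, i))
                for w in range(num_workers)]
        x -= step * (np.mean(msgs, axis=0) + full)

print("distance to true minimizer:", np.linalg.norm(x - x_true))
```

Lowering `num_bits` trades communication volume against per-step compression noise; because the quantizer here acts on gradient differences that vanish at the optimum, its (relative) error does not pull the iterates away from the true minimizer.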
Related papers
- Nonconvex Federated Learning on Compact Smooth Submanifolds With Heterogeneous Data [23.661713049508375]
We propose an algorithm for learning over a compact smooth submanifold in the federated setting where clients hold heterogeneous data.
We show that our proposed algorithm converges sub-linearly to a neighborhood of a first-order optimal solution by using a novel analysis.
arXiv Detail & Related papers (2024-06-12T17:53:28Z) - Optimizing the Optimal Weighted Average: Efficient Distributed Sparse Classification [50.406127962933915]
ACOWA allows an extra round of communication to achieve noticeably better approximation quality with minor runtime increases.
Results show that ACOWA obtains solutions that are more faithful to the empirical risk minimizer and attain substantially higher accuracy than other distributed algorithms.
arXiv Detail & Related papers (2024-06-03T19:43:06Z) - Lower Bounds and Optimal Algorithms for Non-Smooth Convex Decentralized Optimization over Time-Varying Networks [57.24087627267086]
We consider the task of minimizing the sum of convex functions stored in a decentralized manner across the nodes of a communication network.
Lower bounds on the number of decentralized communications and (sub)gradient computations required to solve the problem have been established.
We develop the first optimal algorithm that matches these lower bounds and offers substantially improved theoretical performance compared to the existing state of the art.
arXiv Detail & Related papers (2024-05-28T10:28:45Z) - Quantization Avoids Saddle Points in Distributed Optimization [1.579622195923387]
Distributed nonconvex optimization underpins key functionalities of numerous distributed systems.
The aim of this paper is to prove that quantization can effectively help the iterates escape saddle points and ensure convergence to a second-order stationary point.
With an easily adjustable quantization granularity, the approach allows a user to aggressively reduce the communication overhead.
arXiv Detail & Related papers (2024-03-15T15:58:20Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log\frac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time and up to poly-log factors, the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - DESTRESS: Computation-Optimal and Communication-Efficient Decentralized
Nonconvex Finite-Sum Optimization [43.31016937305845]
Internet-of-things, networked sensing, autonomous systems and federated learning call for decentralized algorithms for finite-sum optimization.
We develop DEcentralized STochastic REcurSive gradient methodS (DESTRESS) for nonconvex finite-sum optimization.
Detailed theoretical and numerical comparisons show that DESTRESS improves upon prior decentralized algorithms.
arXiv Detail & Related papers (2021-10-04T03:17:41Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve these problems without any increase in the number of communication rounds compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z) - Distributed Optimization, Averaging via ADMM, and Network Topology [0.0]
We study the connection between network topology and convergence rates for different algorithms on a real world problem of sensor localization.
We also show interesting connections between ADMM and lifted Markov chains, besides providing an explicit characterization of its convergence.
arXiv Detail & Related papers (2020-09-05T21:44:39Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
In theory, our algorithm requires a much smaller number of communication rounds.
Experiments on several datasets demonstrate the effectiveness of our algorithm and corroborate our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Multi-consensus Decentralized Accelerated Gradient Descent [31.76773130271547]
Decentralized convex optimization problems have a wide range of applications in large-scale machine learning, sensor networks, and control theory.
We propose novel algorithms that achieve optimal computation complexity and near optimal communication complexity.
arXiv Detail & Related papers (2020-05-02T11:10:32Z) - Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As the model size gets large, decentralized learning faces a major bottleneck that is the communication load due to each node transmitting messages (model updates) to its neighbors.
We propose the quantized decentralized learning algorithm over directed graphs that is based on the push-sum algorithm in decentralized consensus optimization.
arXiv Detail & Related papers (2020-02-23T18:25:39Z)
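The last entry above builds on the push-sum protocol for consensus over directed graphs. The sketch below shows the plain (unquantized) push-sum averaging primitive on a small directed ring, assuming a fixed column-stochastic mixing matrix; it illustrates only the consensus step, not the paper's quantized learning algorithm, and the variable names are illustrative.

```python
# Push-sum averaging over a directed graph: every node keeps a value x_i and a
# weight w_i, mixes both with a column-stochastic matrix P, and the ratio
# x_i / w_i converges to the average of the initial values.  Illustrative
# sketch only; the paper layers quantization and learning updates on top.
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Directed ring with self-loops: node i keeps half of its mass and pushes the
# other half to its single out-neighbor (i + 1) % n.
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[(i + 1) % n, i] = 0.5
assert np.allclose(P.sum(axis=0), 1.0)   # column-stochastic, as push-sum needs

x = rng.standard_normal(n)   # initial values held by the nodes
w = np.ones(n)               # push-sum weights
target = x.mean()            # the quantity all nodes should agree on

for _ in range(100):
    x = P @ x                # each node pushes shares of its value ...
    w = P @ w                # ... and of its weight to its out-neighbor

print("true average:       ", target)
print("push-sum estimates: ", x / w)    # every entry approaches the true average
```

The column-stochastic (rather than doubly-stochastic) mixing matrix is what makes the scheme work on directed graphs, where a node may not be able to receive from everyone it sends to; the auxiliary weights w correct the bias that one-directional pushing would otherwise introduce.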