Push–Pull with Device Sampling
- URL: http://arxiv.org/abs/2206.04113v1
- Date: Wed, 8 Jun 2022 18:18:18 GMT
- Title: Push–Pull with Device Sampling
- Authors: Yu-Guan Hsieh, Yassine Laguel, Franck Iutzeler, Jérôme Malick
- Abstract summary: We consider decentralized optimization problems in which a number of agents collaborate to minimize the average of their local functions by exchanging information over an underlying communication graph.
We propose an algorithm that combines gradient tracking and variance reduction over the entire network.
Our theoretical analysis shows that the algorithm converges linearly, when the local objective functions are strongly convex.
- Score: 8.344476599818826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider decentralized optimization problems in which a number of agents
collaborate to minimize the average of their local functions by exchanging
information over an underlying communication graph. Specifically, we place ourselves in an
asynchronous model where only a random portion of nodes perform computation at
each iteration, while the information exchange can be conducted between all the
nodes and in an asymmetric fashion. For this setting, we propose an algorithm
that combines gradient tracking and variance reduction over the entire network.
This enables each node to track the average of the gradients of the objective
functions. Our theoretical analysis shows that the algorithm converges
linearly, when the local objective functions are strongly convex, under mild
connectivity conditions on the expected mixing matrices. In particular, our
result does not require the mixing matrices to be doubly stochastic. In the
experiments, we investigate a broadcast mechanism that transmits information
from computing nodes to their neighbors, and confirm the linear convergence of
our method on both synthetic and real-world datasets.
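As a rough illustration only, the sketch below implements a push-pull style gradient-tracking loop with random node (device) sampling in NumPy. It is not the paper's algorithm: a row-stochastic matrix `R` mixes decision variables, a column-stochastic matrix `C` mixes gradient trackers, and the variance-reduction mechanism is replaced by simply reusing each inactive node's last stored gradient. All names (`push_pull_device_sampling`, `grads`, etc.) are illustrative.

```python
import numpy as np

def push_pull_device_sampling(grads, x0, R, C, alpha=0.1, p=0.5, iters=300, rng=None):
    """grads: list of callables, grads[i](x) -> local gradient of node i at x."""
    rng = np.random.default_rng() if rng is None else rng
    n, _ = x0.shape
    x = x0.copy()
    g_last = np.stack([grads[i](x[i]) for i in range(n)])  # last computed local gradients
    y = g_last.copy()                                       # gradient trackers
    for _ in range(iters):
        x = R @ x - alpha * y                               # row-stochastic mixing of decisions
        g_new = g_last.copy()
        for i in np.flatnonzero(rng.random(n) < p):         # only sampled (computing) nodes
            g_new[i] = grads[i](x[i])                       # refresh their local gradient
        y = C @ y + (g_new - g_last)                        # column-stochastic mixing of trackers
        g_last = g_new
    return x.mean(axis=0)

if __name__ == "__main__":
    # Toy strongly convex problem: node i holds f_i(x) = 0.5 * ||x - b_i||^2,
    # so the network-wide optimum is the mean of the b_i.
    rng = np.random.default_rng(0)
    n, d = 5, 3
    b = rng.normal(size=(n, d))
    grads = [lambda x, bi=b[i]: x - bi for i in range(n)]
    # Directed ring plus self-loops; R row-stochastic, C column-stochastic.
    A = np.eye(n) + np.roll(np.eye(n), 1, axis=1)
    R = A / A.sum(axis=1, keepdims=True)
    C = A / A.sum(axis=0, keepdims=True)
    x_hat = push_pull_device_sampling(grads, np.zeros((n, d)), R, C, rng=rng)
    print("distance to optimum:", np.linalg.norm(x_hat - b.mean(axis=0)))
```

On this toy quadratic, the averaged iterate should approach the mean of the $b_i$; note that neither `R` nor `C` is doubly stochastic, mirroring the relaxed assumption in the abstract.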
Related papers
- Graphon Particle Systems, Part II: Dynamics of Distributed Stochastic Continuum Optimization [5.685037987395183]
We study the distributed optimization problem over a graphon with a continuum of nodes.
We propose gradient descent and gradient tracking algorithms over the graphon.
We show that by choosing the time-varying algorithm gains properly, all nodes' states achieve $\mathcal{L}^{\infty}$-consensus for a connected graphon.
arXiv Detail & Related papers (2024-07-03T02:47:39Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
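For background only, the generic consensus-ADMM updates that such distributed schemes typically build on are shown below (this is the standard deterministic scheme, not the paper's sampling variant); here $f_i$ is node $i$'s local objective, $z$ the shared variable, $u_i$ a scaled dual variable, and $\rho > 0$ a penalty parameter.

```latex
\begin{aligned}
x_i^{k+1} &= \operatorname*{arg\,min}_{x}\; f_i(x) + \tfrac{\rho}{2}\,\bigl\|x - z^{k} + u_i^{k}\bigr\|^2,\\
z^{k+1}   &= \frac{1}{n}\sum_{i=1}^{n}\bigl(x_i^{k+1} + u_i^{k}\bigr),\\
u_i^{k+1} &= u_i^{k} + x_i^{k+1} - z^{k+1}.
\end{aligned}
```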
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
- Decentralized Multi-Task Stochastic Optimization With Compressed Communications [22.31884634659446]
The paper develops algorithms and obtains performance bounds for two different models of local information availability at the nodes.
We show that the deviation from the global minimum value and the violations of the constraints are upper-bounded by $\mathcal{O}(T^{-\frac{1}{2}})$ and $\mathcal{O}(T^{-\frac{1}{4}})$, respectively.
arXiv Detail & Related papers (2021-12-23T05:54:42Z)
- Distributed stochastic proximal algorithm with random reshuffling for non-smooth finite-sum optimization [28.862321453597918]
Non-smooth finite-sum minimization is a fundamental problem in machine learning.
This paper develops a distributed proximal-gradient algorithm with random reshuffling to solve the problem.
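As a minimal, self-contained illustration of the two ingredients named in this summary (a proximal step for the non-smooth term and random reshuffling of local samples), and not the paper's distributed algorithm, the sketch below runs one node's local epoch with an assumed $\ell_1$ regularizer; `grad_sample` is a hypothetical callable returning the gradient of one local sample's loss.

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1 (handles the non-smooth l1 term).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def local_reshuffled_epoch(x, grad_sample, num_samples, step, lam, rng):
    # Visit the node's samples in a fresh random order (random reshuffling),
    # taking a proximal-gradient step after each sampled gradient.
    for i in rng.permutation(num_samples):
        x = soft_threshold(x - step * grad_sample(x, i), step * lam)
    return x

if __name__ == "__main__":
    # Toy local problem: least squares on 20 samples plus an l1 penalty.
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
    grad_sample = lambda x, i: (A[i] @ x - b[i]) * A[i]
    x = np.zeros(5)
    for _ in range(50):                        # 50 reshuffled epochs
        x = local_reshuffled_epoch(x, grad_sample, 20, step=0.05, lam=0.1, rng=rng)
    print(x)
```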
arXiv Detail & Related papers (2021-11-06T07:29:55Z)
- Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\tfrac{\beta/\mu}{1-\rho}}\,\log(1/\varepsilon)\big)$ communication steps.
This rate matches, for the first time and up to poly-log factors, the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z)
- Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation [71.27989298860481]
We address the non-convex optimisation problem of finding a matrix on the Stiefel manifold that maximises a quadratic objective function.
We propose a simple yet effective sparsity-promoting algorithm for finding the dominant eigenspace matrix.
arXiv Detail & Related papers (2021-09-30T19:17:35Z)
- On Accelerating Distributed Convex Optimizations [0.0]
This paper studies a distributed multi-agent convex optimization problem.
We show that the proposed algorithm converges linearly, with a better rate of convergence than traditional and adaptive gradient-descent methods.
We demonstrate our algorithm's superior performance compared to prominent distributed algorithms on real-world logistic regression problems.
arXiv Detail & Related papers (2021-08-19T13:19:54Z)
- Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method [3.860616339202303]
This paper investigates the distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions.
Each agent approximates its local gradient with a zeroth-order (ZO) coordinate method to solve the problem.
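For intuition about this building block only (not the paper's full distributed method), the sketch below computes a two-point zeroth-order coordinate estimate of a gradient, approximating each partial derivative from function values alone.

```python
import numpy as np

def zo_coordinate_grad(f, x, mu=1e-4):
    # Two-point (symmetric difference) zeroth-order estimate of grad f(x),
    # one coordinate at a time, using only evaluations of f.
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = 1.0
        g[j] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    return g

if __name__ == "__main__":
    f = lambda x: 0.5 * float(np.sum(x ** 2))   # true gradient is x itself
    print(zo_coordinate_grad(f, np.array([1.0, -2.0, 3.0])))
```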
arXiv Detail & Related papers (2021-03-24T03:07:46Z)
- Optimizing Mode Connectivity via Neuron Alignment [84.26606622400423]
Empirically, the local minima of loss functions can be connected by a learned curve in model space along which the loss remains nearly constant.
We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected.
arXiv Detail & Related papers (2020-09-05T02:25:23Z)
- Model Fusion with Kullback–Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)