Optimal Complexity in Decentralized Training
- URL: http://arxiv.org/abs/2006.08085v4
- Date: Fri, 28 Jan 2022 02:36:56 GMT
- Title: Optimal Complexity in Decentralized Training
- Authors: Yucheng Lu, Christopher De Sa
- Abstract summary: We present a gossip-style decentralized algorithm that achieves the lower bound with only a logarithmic gap.
We show DeTAG enjoys faster convergence compared to baselines, especially on unshuffled data and in sparse networks.
- Score: 45.468216452357375
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Decentralization is a promising method of scaling up parallel machine
learning systems. In this paper, we provide a tight lower bound on the
iteration complexity for such methods in a stochastic non-convex setting. Our
lower bound reveals a theoretical gap in known convergence rates of many
existing decentralized training algorithms, such as D-PSGD. We prove by
construction this lower bound is tight and achievable. Motivated by our
insights, we further propose DeTAG, a practical gossip-style decentralized
algorithm that achieves the lower bound with only a logarithmic gap. Empirically,
we compare DeTAG with other decentralized algorithms on image classification
tasks, and we show DeTAG enjoys faster convergence compared to baselines,
especially on unshuffled data and in sparse networks.
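As a concrete illustration of the gossip-style, gradient-tracking flavor of algorithm studied here, below is a minimal decentralized SGD sketch over a ring of workers. It is not the DeTAG algorithm itself: the quadratic local losses, ring mixing matrix, step size, and iteration count are all illustrative assumptions.

```python
# Minimal sketch: decentralized SGD with gradient tracking over a ring of
# workers (illustrative of gossip-style methods, not the exact DeTAG
# algorithm). Local losses, mixing weights, and step size are assumptions.
import numpy as np

n, d, alpha, T = 8, 5, 0.1, 200
rng = np.random.default_rng(0)

# Hypothetical local objectives: f_i(x) = 0.5 * ||x - b_i||^2
b = rng.normal(size=(n, d))
grad = lambda i, x: x - b[i]

# Doubly-stochastic mixing matrix for a ring topology (self + two neighbors).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

x = np.zeros((n, d))                                # local models
y = np.array([grad(i, x[i]) for i in range(n)])     # gradient trackers
g_prev = y.copy()

for t in range(T):
    x = W @ x - alpha * y                           # gossip average, then local step
    g = np.array([grad(i, x[i]) for i in range(n)])
    y = W @ y + g - g_prev                          # track the average gradient
    g_prev = g

print("consensus error:", np.linalg.norm(x - x.mean(0)))
print("distance to optimum:", np.linalg.norm(x.mean(0) - b.mean(0)))
```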
Related papers
- From promise to practice: realizing high-performance decentralized training [8.955918346078935]
Decentralized training of deep neural networks has attracted significant attention for its theoretically superior scalability over synchronous data-parallel methods like All-Reduce.
This paper identifies three key factors that can lead to speedups over All-Reduce training and constructs a runtime model to determine when, how, and to what degree decentralization can yield shorter per-iteration runtimes.
arXiv Detail & Related papers (2024-10-15T19:04:56Z)
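To give a feel for the kind of per-iteration runtime comparison discussed above, the sketch below contrasts a textbook ring All-Reduce cost model with a single gossip exchange over a sparse neighborhood. The formulas and the latency, bandwidth, and model-size values are generic assumptions, not the runtime model constructed in that paper.

```python
# Illustrative per-iteration communication-time model: ring All-Reduce vs.
# one gossip exchange with a fixed number of neighbors. The formulas are
# common back-of-the-envelope approximations; latency/bandwidth values
# are assumptions, not the paper's runtime model.
def allreduce_time(model_bytes, n_workers, latency=1e-4, bandwidth=1e9):
    # Ring all-reduce moves ~2*(n-1)/n of the model and pays 2*(n-1) latency hops.
    return (2 * (n_workers - 1) * latency
            + 2 * (n_workers - 1) / n_workers * model_bytes / bandwidth)

def gossip_time(model_bytes, degree, latency=1e-4, bandwidth=1e9):
    # Each worker exchanges the full model with `degree` neighbors over parallel links.
    return latency + degree * model_bytes / bandwidth

model_bytes = 100e6          # 100 MB of parameters (assumption)
for n in (8, 64, 256):
    print(n, round(allreduce_time(model_bytes, n), 4), round(gossip_time(model_bytes, 2), 4))
```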
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
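Algorithm unrolling, the idea SURF extends to the federated setting, treats a fixed number of optimizer iterations as layers with their own (potentially learnable) parameters. The toy below only illustrates that unrolling structure on a least-squares problem; it is not the SURF method, and the per-layer step sizes are fixed assumptions rather than trained parameters.

```python
# Toy illustration of algorithm unrolling: a fixed number of gradient steps is
# viewed as a stack of "layers", each with its own step size that could be
# trained. Not the SURF algorithm; problem data and step sizes are assumptions.
import numpy as np

rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 10)), rng.normal(size=50)

def unrolled_descent(step_sizes, x0=None):
    """Apply one 'layer' per entry of step_sizes to the least-squares loss."""
    x = np.zeros(A.shape[1]) if x0 is None else x0
    for alpha in step_sizes:               # each iteration is one unrolled layer
        x = x - alpha * A.T @ (A @ x - b)  # gradient step on 0.5*||Ax - b||^2
    return x

# In unrolled learning these per-layer step sizes would be learnable parameters;
# here they are just fixed illustrative values.
layers = [0.01, 0.01, 0.01, 0.008, 0.005]
x_hat = unrolled_descent(layers)
print("loss after unrolled layers:", 0.5 * np.linalg.norm(A @ x_hat - b) ** 2)
```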
- Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning tasks.
This paper develops two novel decentralized minimax optimization algorithms for finite-sum nonconvex-nonconcave problems.
arXiv Detail & Related papers (2023-04-24T02:19:39Z)
- Cross-Gradient Aggregation for Decentralized Learning from Non-IID data [34.23789472226752]
Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server.
We propose Cross-Gradient Aggregation (CGA), a novel decentralized learning algorithm.
We show superior learning performance of CGA over existing state-of-the-art decentralized learning algorithms.
arXiv Detail & Related papers (2021-03-02T21:58:12Z)
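The sketch below illustrates the cross-gradient idea in a simplified form: each agent differentiates its own model on its neighbors' local data and averages those cross-gradients together with a consensus step. The actual CGA algorithm combines cross-gradients via a quadratic-programming projection, which is omitted here; the data, ring topology, and step size are assumptions.

```python
# Simplified sketch of the cross-gradient idea behind CGA: each agent computes
# the gradient of ITS OWN model on each neighbor's local data and averages
# those cross-gradients (the real CGA combines them with a QP projection).
# Data, topology, and losses here are illustrative assumptions.
import numpy as np

n, d = 4, 3
rng = np.random.default_rng(2)
data = rng.normal(size=(n, 20, d))              # non-IID local datasets (toy)
targets = [D @ rng.normal(size=d) for D in data]
neighbors = {i: [(i - 1) % n, i, (i + 1) % n] for i in range(n)}  # ring + self

def local_grad(j, x):
    """Gradient of agent j's least-squares loss at model x."""
    D, t = data[j], targets[j]
    return D.T @ (D @ x - t) / len(t)

x = np.zeros((n, d))
for step in range(100):
    new_x = x.copy()
    for i in range(n):
        # Cross-gradients: agent i's model differentiated on neighbors' data.
        cross = [local_grad(j, x[i]) for j in neighbors[i]]
        new_x[i] = (np.mean([x[j] for j in neighbors[i]], axis=0)
                    - 0.05 * np.mean(cross, axis=0))
    x = new_x

print("model disagreement across agents:", np.linalg.norm(x - x.mean(0)))
```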
- A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network on distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
- Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As the model size gets large, decentralized learning faces a major bottleneck: the communication load of each node transmitting messages (model updates) to its neighbors.
We propose a quantized decentralized learning algorithm over directed graphs, based on the push-sum algorithm from decentralized consensus optimization.
arXiv Detail & Related papers (2020-02-23T18:25:39Z)
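Below is a minimal push-sum averaging sketch over a directed graph, the consensus primitive that the quantized algorithm above builds on. The message quantization that is the paper's focus is omitted, and the graph and initial values are illustrative assumptions.

```python
# Minimal push-sum averaging sketch over a directed graph, the consensus
# building block used by push-sum-based decentralized learning (the paper
# above additionally quantizes the transmitted messages; omitted here).
# The graph and initial values are illustrative assumptions.
import numpy as np

# Directed edges i -> j (every node also "sends" to itself); strongly connected.
out_neighbors = {0: [0, 1], 1: [1, 2], 2: [2, 0, 3], 3: [3, 1]}
values = np.array([1.0, 5.0, 3.0, 7.0])   # local quantities to average
weights = np.ones(len(values))

for t in range(60):
    new_vals = np.zeros_like(values)
    new_wts = np.zeros_like(weights)
    for i, outs in out_neighbors.items():
        share_v = values[i] / len(outs)    # split value among out-neighbors
        share_w = weights[i] / len(outs)
        for j in outs:
            new_vals[j] += share_v
            new_wts[j] += share_w
    values, weights = new_vals, new_wts

print("push-sum ratios:", values / weights)   # each approaches the true average
print("true average:   ", np.mean([1.0, 5.0, 3.0, 7.0]))
```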
- Gradient tracking and variance reduction for decentralized optimization and machine learning [19.54092620537586]
Decentralized methods to solve finite-sum problems are important in many signal processing and machine learning tasks.
We provide a unified algorithmic framework that combines variance-reduction with gradient tracking to achieve robust performance.
arXiv Detail & Related papers (2020-02-13T07:17:07Z)
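As a pointer to the variance-reduction ingredient that the framework above combines with gradient tracking, the sketch below shows an SVRG-style variance-reduced gradient estimator on a single node's finite-sum problem; the decentralized gradient-tracking part is omitted, and the data and step size are assumptions.

```python
# Sketch of the variance-reduction ingredient (an SVRG-style estimator) that
# such frameworks combine with gradient tracking. Shown on a single node's
# finite-sum least-squares problem; data and step size are illustrative
# assumptions, and the decentralized tracking part is omitted.
import numpy as np

rng = np.random.default_rng(3)
A, b = rng.normal(size=(200, 10)), rng.normal(size=200)
full_grad = lambda x: A.T @ (A @ x - b) / len(b)      # gradient of the average loss
comp_grad = lambda i, x: A[i] * (A[i] @ x - b[i])     # gradient of one component

x, alpha = np.zeros(10), 0.02
for epoch in range(20):
    x_ref, g_ref = x.copy(), full_grad(x)             # snapshot for variance reduction
    for _ in range(len(b)):
        i = rng.integers(len(b))
        # Variance-reduced stochastic gradient: unbiased, with shrinking variance.
        v = comp_grad(i, x) - comp_grad(i, x_ref) + g_ref
        x = x - alpha * v

print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```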