RelaySum for Decentralized Deep Learning on Heterogeneous Data
- URL: http://arxiv.org/abs/2110.04175v1
- Date: Fri, 8 Oct 2021 14:55:32 GMT
- Title: RelaySum for Decentralized Deep Learning on Heterogeneous Data
- Authors: Thijs Vogels and Lie He and Anastasia Koloskova and Tao Lin and Sai
Praneeth Karimireddy and Sebastian U. Stich and Martin Jaggi
- Abstract summary: In decentralized machine learning, workers compute model updates on their local data.
Because the workers only communicate with few neighbors without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
- Score: 71.36228931225362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In decentralized machine learning, workers compute model updates on their
local data. Because the workers only communicate with few neighbors without
central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all
connectivity, helping to protect data privacy as well as to reduce the
communication cost of distributed training in data centers. A key challenge,
primarily in decentralized deep learning, remains the handling of differences
between the workers' local data distributions. To tackle this challenge, we
introduce the RelaySum mechanism for information propagation in decentralized
learning. RelaySum uses spanning trees to distribute information exactly
uniformly across all workers with finite delays depending on the distance
between nodes. In contrast, the typical gossip averaging mechanism only
distributes data uniformly asymptotically while using the same communication
volume per step as RelaySum. We prove that RelaySGD, based on this mechanism,
is independent of data heterogeneity and scales to many workers, enabling
highly accurate decentralized deep learning on heterogeneous data. Our code is
available at http://github.com/epfml/relaysgd.
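To make the relay mechanism concrete, here is a minimal, self-contained Python sketch of the core RelaySum idea on a chain of workers. It only illustrates how relayed partial sums (with counts) over a spanning tree spread a fixed local quantity exactly uniformly within a number of steps bounded by the tree diameter; it is not the paper's full RelaySGD update rule, and the topology and variable names are illustrative assumptions.

```python
# Minimal sketch of the RelaySum idea on a chain of workers (hypothetical
# variable names; not the authors' exact RelaySGD update rule).
# Each worker relays, to each neighbor, the sum of its own value and the
# messages it previously received from its *other* neighbors, plus a count.
# After `diameter` steps every worker holds the exact sum (and count) of all
# workers' values: information is spread exactly uniformly, with a finite
# delay that depends on the distance between nodes.

n = 5
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}  # chain topology
values = [float(i) for i in range(n)]  # each worker's local quantity (e.g. a model update)

# messages[(i, j)] = (partial_sum, count) last sent from worker i to neighbor j
messages = {(i, j): (0.0, 0) for i in range(n) for j in neighbors[i]}

diameter = n - 1
for _ in range(diameter):
    new_messages = {}
    for i in range(n):
        for j in neighbors[i]:
            # Own value plus everything relayed to i from neighbors other
            # than j; on a tree this never double-counts any worker.
            s = values[i] + sum(messages[(k, i)][0] for k in neighbors[i] if k != j)
            c = 1 + sum(messages[(k, i)][1] for k in neighbors[i] if k != j)
            new_messages[(i, j)] = (s, c)
    messages = new_messages

for i in range(n):
    total = values[i] + sum(messages[(j, i)][0] for j in neighbors[i])
    count = 1 + sum(messages[(j, i)][1] for j in neighbors[i])
    print(f"worker {i}: exact average = {total / count:.3f}")  # 2.000 for every worker
```

In RelaySGD the relayed quantities are model updates that change every step, so relayed contributions arrive with a delay proportional to the distance between nodes, but, unlike gossip averaging, each worker's contribution is counted exactly once.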
Related papers
- Communication Efficient Distributed Learning over Wireless Channels [35.90632878033643]
Vertical distributed learning exploits the local features collected by multiple learning workers to form a better global model.
We propose a novel hierarchical distributed learning framework, where each worker separately learns a low-dimensional embedding of their local observed data.
We show that the proposed learning framework is able to achieve almost the same model accuracy as the learning model using the concatenation of all the raw outputs from the learning workers.
arXiv Detail & Related papers (2022-09-04T19:41:21Z)
- Homogeneous Learning: Self-Attention Decentralized Deep Learning [0.6091702876917281]
We propose a decentralized learning model called Homogeneous Learning (HL) for tackling non-IID data with a self-attention mechanism.
HL achieves better performance than standalone learning while greatly reducing the total training rounds by 50.8% and the communication cost by 74.6%.
arXiv Detail & Related papers (2021-10-11T14:05:29Z)
- Decentralized federated learning of deep neural networks on non-iid data [0.6335848702857039]
We tackle the problem of learning a personalized deep learning model in a decentralized setting with non-IID data.
We propose a method named Performance-Based Neighbor Selection (PENS) where clients with similar data detect each other and cooperate.
PENS is able to achieve higher accuracies as compared to strong baselines.
arXiv Detail & Related papers (2021-07-18T19:05:44Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) whose problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- OCTOPUS: Overcoming Performance and Privatization Bottlenecks in Distributed Learning [16.98452728773235]
Federated learning enables distributed participants to collaboratively learn a commonly-shared model while holding data locally.
We introduce a new distributed learning scheme to address communication overhead via latent compression.
We show that downstream tasks on the compact latent representations can achieve comparable accuracy to centralized learning.
arXiv Detail & Related papers (2021-05-03T02:24:53Z)
- Learning Connectivity for Data Distribution in Robot Teams [96.39864514115136]
We propose a task-agnostic, decentralized, low-latency method for data distribution in ad-hoc networks using Graph Neural Networks (GNNs).
Our approach enables multi-agent algorithms based on global state information to function by ensuring it is available at each robot.
We train the distributed GNN communication policies via reinforcement learning using the average Age of Information as the reward function and show that it improves training stability compared to task-specific reward functions.
arXiv Detail & Related papers (2021-03-08T21:48:55Z)
- BayGo: Joint Bayesian Learning and Information-Aware Graph Optimization [48.30183416069897]
BayGo is a novel fully decentralized joint Bayesian learning and graph optimization framework.
We show that our framework achieves faster convergence and higher accuracy compared to fully-connected and star topology graphs.
arXiv Detail & Related papers (2020-11-09T11:16:55Z)
- A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
- Decentralised Learning from Independent Multi-Domain Labels for Person Re-Identification [69.29602103582782]
Deep learning has been successful for many computer vision tasks due to the availability of shared and centralised large-scale training data.
However, increasing awareness of privacy concerns poses new challenges to deep learning, especially for person re-identification (Re-ID).
We propose a novel paradigm called Federated Person Re-Identification (FedReID) to construct a generalisable global model (a central server) by simultaneously learning with multiple privacy-preserved local models (local clients).
This client-server collaborative learning process is iteratively performed under privacy control, enabling FedReID to realise decentralised learning without sharing distributed data nor collecting any centralised data.
arXiv Detail & Related papers (2020-06-07T13:32:33Z)
- Consensus Driven Learning [0.0]
We propose a new method of distributed, decentralized learning that allows a network of nodes to coordinate their training using asynchronous updates over an unreliable network.
This is achieved by taking inspiration from Distributed Averaging Consensus algorithms to coordinate the various nodes.
We show that our coordination method allows models to be learned on highly biased datasets, and in the presence of intermittent communication failure.
arXiv Detail & Related papers (2020-05-20T18:24:19Z)
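The "Consensus Driven Learning" entry above, like several other papers in this list, builds on distributed averaging consensus (gossip averaging), the mechanism the main abstract contrasts RelaySum with. The sketch below shows plain synchronous gossip averaging on the same chain topology as the earlier sketch; it is illustrative only (not that paper's asynchronous method, and the mixing weight is our assumption), and it converges to the global average only asymptotically rather than exactly after finitely many steps.

```python
# Minimal sketch of distributed averaging consensus (gossip averaging),
# shown for contrast with RelaySum's exact, finite-delay propagation.
# Each node repeatedly mixes its value with its neighbors' values using a
# fixed mixing weight; values approach the global average asymptotically.

n = 5
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}  # chain topology
x = [float(i) for i in range(n)]  # each node's local value

alpha = 1.0 / 3.0  # mixing weight; keeps the implied mixing matrix doubly stochastic on this chain
for _ in range(100):
    x = [
        x[i] + alpha * sum(x[j] - x[i] for j in neighbors[i])
        for i in range(n)
    ]

print([round(v, 3) for v in x])  # all values approach the true average 2.0, but only approximately
```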