Distributed Deep Learning in Open Collaborations
- URL: http://arxiv.org/abs/2106.10207v1
- Date: Fri, 18 Jun 2021 16:23:13 GMT
- Title: Distributed Deep Learning in Open Collaborations
- Authors: Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier,
Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry Pyrkin, Maxim Kashirin,
Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev,
Yacine Jernite, Thomas Wolf, Gennady Pekhimenko
- Abstract summary: We propose a novel algorithmic framework designed specifically for collaborative training.
We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost.
- Score: 49.240611132653456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning applications require increasingly more compute to train
state-of-the-art models. To address this demand, large corporations and
institutions use dedicated High-Performance Computing clusters, whose
construction and maintenance are both environmentally costly and well beyond
the budget of most organizations. As a result, some research directions become
the exclusive domain of a few large industrial and even fewer academic actors.
To alleviate this disparity, smaller groups may pool their computational
resources and run collaborative experiments that benefit all participants. This
paradigm, known as grid- or volunteer computing, has seen successful
applications in numerous scientific areas. However, using this approach for
machine learning is difficult due to high latency, asymmetric bandwidth, and
several challenges unique to volunteer computing. In this work, we carefully
analyze these constraints and propose a novel algorithmic framework designed
specifically for collaborative training. We demonstrate the effectiveness of
our approach for SwAV and ALBERT pretraining in realistic conditions and
achieve performance comparable to traditional setups at a fraction of the cost.
Finally, we provide a detailed report of successful collaborative language
model pretraining with 40 participants.
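To make the setting concrete, below is a minimal sketch of fault-tolerant collaborative training with volunteer peers that may drop out between rounds. It only illustrates the general idea (averaging whatever subset of peers is currently reachable); it is not the paper's actual framework, and the task, peer count, and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_gradient(params, batch):
    """Hypothetical local objective: least-squares loss on one peer's private batch."""
    X, y = batch
    return 2 * X.T @ (X @ params - y) / len(y)

def collaborative_step(params, peers, lr=0.01, dropout_prob=0.3):
    """Average gradients over whichever volunteers stayed online this round."""
    grads = [local_gradient(params, batch)
             for batch in peers if rng.random() > dropout_prob]
    if not grads:                      # every volunteer dropped out this round
        return params
    return params - lr * np.mean(grads, axis=0)

# Toy setup: 8 volunteer peers, each holding a private batch of a shared task.
true_w = np.array([1.0, -2.0, 0.5])
peers = []
for _ in range(8):
    X = rng.normal(size=(32, 3))
    peers.append((X, X @ true_w + 0.01 * rng.normal(size=32)))

w = np.zeros(3)
for _ in range(500):
    w = collaborative_step(w, peers)
print("recovered weights:", np.round(w, 2))
```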
Related papers
- TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition [61.91764883512776]
We introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts.
By doing so, TeamLoRA connects the experts as a "Team" with internal collaboration and competition, enabling a faster and more accurate PEFT paradigm for multi-task learning.
arXiv Detail & Related papers (2024-08-19T09:58:53Z)
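A rough sketch of the multi-expert LoRA idea follows, assuming a frozen base weight, several low-rank expert pairs, and a per-task mixing vector. These are illustrative assumptions, not the TeamLoRA collaboration and competition modules.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 16, 16, 4, 3

W_frozen = rng.normal(size=(d_out, d_in))                  # pretrained weight, kept frozen
A = rng.normal(scale=0.01, size=(n_experts, rank, d_in))   # LoRA "down" projections
B = np.zeros((n_experts, d_out, rank))                     # LoRA "up" projections (zero init)

def forward(x, task_mix):
    """task_mix: per-task weights deciding how much each expert contributes."""
    out = W_frozen @ x
    for e in range(n_experts):
        out += task_mix[e] * (B[e] @ (A[e] @ x))
    return out

x = rng.normal(size=d_in)
task_mix = np.array([0.7, 0.2, 0.1])    # e.g. produced by a small task router
print(forward(x, task_mix).shape)
```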
- CollaFuse: Collaborative Diffusion Models [5.331052581441263]
We introduce a novel approach for distributed collaborative diffusion models inspired by split learning.
Our approach facilitates collaborative training of diffusion models while alleviating client computational burdens during image synthesis.
arXiv Detail & Related papers (2024-06-20T15:54:21Z)
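The split-learning idea can be sketched as follows, assuming the network is cut at a single intermediate activation. This is a generic illustration, not CollaFuse's protocol; the layer sizes and cut point are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

class ClientHalf:
    """Cheap early layers that stay on the client, next to the raw data."""
    def __init__(self, d_in, d_hidden):
        self.W = rng.normal(scale=0.1, size=(d_hidden, d_in))
    def forward(self, x):
        return np.maximum(0.0, self.W @ x)

class ServerHalf:
    """Expensive remaining layers that run on the server."""
    def __init__(self, d_hidden, d_out):
        self.W = rng.normal(scale=0.1, size=(d_out, d_hidden))
    def forward(self, h):
        return self.W @ h

client, server = ClientHalf(32, 64), ServerHalf(64, 10)
x_private = rng.normal(size=32)           # raw data never leaves the client
activation = client.forward(x_private)    # only this intermediate is transmitted
prediction = server.forward(activation)
print(prediction.shape)
```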
- CollaFuse: Navigating Limited Resources and Privacy in Collaborative Generative AI [5.331052581441263]
CollaFuse is a novel framework inspired by split learning.
It enables shared server training and inference, alleviating client computational burdens.
It has the potential to impact various application areas, such as the design of edge computing solutions, healthcare research, or autonomous driving.
arXiv Detail & Related papers (2024-02-29T12:36:10Z)
- Scalable Federated Unlearning via Isolated and Coded Sharding [76.12847512410767]
Federated unlearning has emerged as a promising paradigm to erase the client-level data effect.
This paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing.
arXiv Detail & Related papers (2024-01-29T08:41:45Z)
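A simplified illustration of isolated sharding for unlearning, assuming a least-squares sub-model per shard and simple averaging for aggregation; the coded-computing part is omitted, and none of this is the paper's actual scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_shards, dim = 12, 3, 5

client_data = [(rng.normal(size=(20, dim)), rng.normal(size=20))
               for _ in range(n_clients)]
shards = [list(range(s, n_clients, n_shards)) for s in range(n_shards)]

def train_shard(client_ids):
    """Least-squares fit on the pooled data of one shard's clients."""
    X = np.vstack([client_data[c][0] for c in client_ids])
    y = np.concatenate([client_data[c][1] for c in client_ids])
    return np.linalg.lstsq(X, y, rcond=None)[0]

shard_models = [train_shard(ids) for ids in shards]

def unlearn(client_id):
    """Erase one client: only the shard that contained it is retrained."""
    for s, ids in enumerate(shards):
        if client_id in ids:
            ids.remove(client_id)
            shard_models[s] = train_shard(ids)

unlearn(7)
global_model = np.mean(shard_models, axis=0)   # simple aggregation by averaging
print(global_model.shape)
```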
- An effective and efficient green federated learning method for one-layer neural networks [0.22499166814992436]
Federated learning (FL) is one of the most active research lines in machine learning.
We present an FL method, based on a neural network without hidden layers, capable of generating a global collaborative model in a single training round.
We show that the method performs equally well in both identically and non-identically distributed scenarios.
arXiv Detail & Related papers (2023-12-22T08:52:08Z)
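A hedged sketch of single-round federated fitting for a model with no hidden layers, assuming each client shares only sufficient statistics and the server solves the global model once. The paper's exact formulation may differ; the data and dimensions here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=4)

def client_statistics(X, y):
    """Each client shares only aggregate statistics, never raw samples."""
    return X.T @ X, X.T @ y

clients = []
for _ in range(5):
    X = rng.normal(size=(50, 4))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    clients.append(client_statistics(X, y))

# Single aggregation round: sum the statistics and solve the normal equations once.
G = sum(g for g, _ in clients)
h = sum(v for _, v in clients)
w_global = np.linalg.solve(G, h)
print(np.round(w_global - true_w, 3))
```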
- Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
arXiv Detail & Related papers (2023-02-24T08:41:19Z)
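An idealized sketch of analog over-the-air aggregation, assuming unit channel gains and a simple additive-noise channel; this is not the paper's exact scheme, only the core superposition idea.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, noise_std = 10, 8, 0.05

client_updates = [rng.normal(size=dim) for _ in range(n_clients)]

# Superposition: the received signal is the sum of all simultaneously
# transmitted updates, so aggregation happens "in the air".
received = np.sum(client_updates, axis=0) + rng.normal(scale=noise_std, size=dim)
aggregate = received / n_clients            # server rescales to a noisy average

print(np.linalg.norm(aggregate - np.mean(client_updates, axis=0)))
```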
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
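A toy model of the parallelisation trade-off, with entirely assumed arrival-rate and communication-overhead numbers. It only illustrates why maximum parallelisation can hurt throughput and raise the blocking rate; it is not the paper's GNN/RL scheduler.

```python
cluster_devices = 16      # devices available in the cluster
job_arrival_rate = 1.0    # jobs per time unit (assumed)
base_job_time = 8.0       # time to finish a job on a single device (assumed)

def steady_state_metrics(devices_per_job):
    # More devices shorten compute time but add communication overhead,
    # and each running job occupies more of the cluster.
    service_time = base_job_time / devices_per_job + 0.2 * devices_per_job
    concurrent_jobs = cluster_devices // devices_per_job
    throughput = min(job_arrival_rate, concurrent_jobs / service_time)
    blocking_rate = max(0.0, 1.0 - throughput / job_arrival_rate)
    return round(throughput, 2), round(blocking_rate, 2)

for d in (1, 2, 4, 8, 16):
    print(d, steady_state_metrics(d))
```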
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
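A minimal sketch of sharing weights between a dense path and a cheaper gated path, assuming a top-k channel gate; the gating mechanism and sizes are illustrative assumptions, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, k = 32, 64, 16

W = rng.normal(scale=0.1, size=(d_out, d_in))   # weights shared by both paths
gate_scores = rng.normal(size=d_out)            # e.g. learned channel saliency

def dense_forward(x):
    return np.maximum(0.0, W @ x)

def gated_forward(x):
    """Cheaper path: same weights, but only the k most salient channels are kept."""
    h = dense_forward(x)
    mask = np.zeros(d_out)
    mask[np.argsort(gate_scores)[-k:]] = 1.0
    return h * mask

x = rng.normal(size=d_in)
print(np.count_nonzero(dense_forward(x)), np.count_nonzero(gated_forward(x)))
```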
- Hybrid Learning for Orchestrating Deep Learning Inference in Multi-user Edge-cloud Networks [3.7630209350186807]
Collaborative end-edge-cloud computing for deep learning provides a range of performance and efficiency trade-offs.
The deep learning inference orchestration strategy employs reinforcement learning to find the optimal orchestration policy.
We demonstrate the efficacy of our hybrid learning (HL) strategy through experimental comparison with state-of-the-art RL-based inference orchestration.
arXiv Detail & Related papers (2022-02-21T21:50:50Z)
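A heavily simplified sketch of learned inference placement, framed as a bandit over device/edge/cloud options with assumed latencies; the formulation and numbers are hypothetical, not the paper's hybrid learning strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
options = ["device", "edge", "cloud"]
true_latency = {"device": 120.0, "edge": 45.0, "cloud": 80.0}   # ms, hypothetical

estimates, counts = np.zeros(3), np.zeros(3)
for _ in range(2000):
    # Epsilon-greedy: mostly pick the option with the lowest estimated latency.
    a = rng.integers(3) if rng.random() < 0.1 else int(np.argmin(estimates))
    observed = true_latency[options[a]] + rng.normal(scale=10.0)
    counts[a] += 1
    estimates[a] += (observed - estimates[a]) / counts[a]       # running mean

print(dict(zip(options, np.round(estimates, 1))))
```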
- Privacy-Preserving Serverless Edge Learning with Decentralized Small Data [13.254530176359182]
Distributed training strategies have recently become a promising approach to ensure data privacy when training deep models.
This paper extends conventional serverless platforms with serverless edge learning architectures and provides an efficient distributed training framework from the networking perspective.
arXiv Detail & Related papers (2021-11-29T21:04:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.