Distributed Deep Learning in Open Collaborations
- URL: http://arxiv.org/abs/2106.10207v1
- Date: Fri, 18 Jun 2021 16:23:13 GMT
- Title: Distributed Deep Learning in Open Collaborations
- Authors: Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier,
Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry Pyrkin, Maxim Kashirin,
Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev,
Yacine Jernite, Thomas Wolf, Gennady Pekhimenko
- Abstract summary: We propose a novel algorithmic framework designed specifically for collaborative training.
We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost.
- Score: 49.240611132653456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning applications require increasingly more compute to train
state-of-the-art models. To address this demand, large corporations and
institutions use dedicated High-Performance Computing clusters, whose
construction and maintenance are both environmentally costly and well beyond
the budget of most organizations. As a result, some research directions become
the exclusive domain of a few large industrial and even fewer academic actors.
To alleviate this disparity, smaller groups may pool their computational
resources and run collaborative experiments that benefit all participants. This
paradigm, known as grid- or volunteer computing, has seen successful
applications in numerous scientific areas. However, using this approach for
machine learning is difficult due to high latency, asymmetric bandwidth, and
several challenges unique to volunteer computing. In this work, we carefully
analyze these constraints and propose a novel algorithmic framework designed
specifically for collaborative training. We demonstrate the effectiveness of
our approach for SwAV and ALBERT pretraining in realistic conditions and
achieve performance comparable to traditional setups at a fraction of the cost.
Finally, we provide a detailed report of successful collaborative language
model pretraining with 40 participants.
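To make the setting concrete, below is a minimal sketch of fault-tolerant collaborative training with volunteer peers that may drop out between rounds. It only illustrates the general idea (averaging whatever subset of peers is currently reachable); it is not the paper's actual framework, and the task, peer count, and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_gradient(params, batch):
    """Hypothetical local objective: least-squares loss on one peer's private batch."""
    X, y = batch
    return 2 * X.T @ (X @ params - y) / len(y)

def collaborative_step(params, peers, lr=0.01, dropout_prob=0.3):
    """Average gradients over whichever volunteers stayed online this round."""
    grads = [local_gradient(params, batch)
             for batch in peers if rng.random() > dropout_prob]
    if not grads:                      # every volunteer dropped out this round
        return params
    return params - lr * np.mean(grads, axis=0)

# Toy setup: 8 volunteer peers, each holding a private batch of a shared task.
true_w = np.array([1.0, -2.0, 0.5])
peers = []
for _ in range(8):
    X = rng.normal(size=(32, 3))
    peers.append((X, X @ true_w + 0.01 * rng.normal(size=32)))

w = np.zeros(3)
for _ in range(500):
    w = collaborative_step(w, peers)
print("recovered weights:", np.round(w, 2))
```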
Related papers
- TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition [61.91764883512776]
We introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts.
By doing so, TeamLoRA connects the experts as a "Team" with internal collaboration and competition, enabling a faster and more accurate PEFT paradigm for multi-task learning.
arXiv Detail & Related papers (2024-08-19T09:58:53Z)
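A rough sketch of the multi-expert LoRA idea follows, assuming a frozen base weight, several low-rank expert pairs, and a per-task mixing vector. These are illustrative assumptions, not the TeamLoRA collaboration and competition modules.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_experts = 16, 16, 4, 3

W_frozen = rng.normal(size=(d_out, d_in))                  # pretrained weight, kept frozen
A = rng.normal(scale=0.01, size=(n_experts, rank, d_in))   # LoRA "down" projections
B = np.zeros((n_experts, d_out, rank))                     # LoRA "up" projections (zero init)

def forward(x, task_mix):
    """task_mix: per-task weights deciding how much each expert contributes."""
    out = W_frozen @ x
    for e in range(n_experts):
        out += task_mix[e] * (B[e] @ (A[e] @ x))
    return out

x = rng.normal(size=d_in)
task_mix = np.array([0.7, 0.2, 0.1])    # e.g. produced by a small task router
print(forward(x, task_mix).shape)
```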
- CollaFuse: Collaborative Diffusion Models [5.331052581441263]
We introduce a novel approach for distributed collaborative diffusion models inspired by split learning.
Our approach facilitates collaborative training of diffusion models while alleviating client computational burdens during image synthesis.
arXiv Detail & Related papers (2024-06-20T15:54:21Z)
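The split-learning idea can be sketched as follows, assuming the network is cut at a single intermediate activation. This is a generic illustration, not CollaFuse's protocol; the layer sizes and cut point are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

class ClientHalf:
    """Cheap early layers that stay on the client, next to the raw data."""
    def __init__(self, d_in, d_hidden):
        self.W = rng.normal(scale=0.1, size=(d_hidden, d_in))
    def forward(self, x):
        return np.maximum(0.0, self.W @ x)

class ServerHalf:
    """Expensive remaining layers that run on the server."""
    def __init__(self, d_hidden, d_out):
        self.W = rng.normal(scale=0.1, size=(d_out, d_hidden))
    def forward(self, h):
        return self.W @ h

client, server = ClientHalf(32, 64), ServerHalf(64, 10)
x_private = rng.normal(size=32)           # raw data never leaves the client
activation = client.forward(x_private)    # only this intermediate is transmitted
prediction = server.forward(activation)
print(prediction.shape)
```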
- CollaFuse: Navigating Limited Resources and Privacy in Collaborative Generative AI [5.331052581441263]
CollaFuse is a novel framework inspired by split learning.
It enables shared server training and inference, alleviating client computational burdens.
It has the potential to impact various application areas, such as the design of edge computing solutions, healthcare research, or autonomous driving.
arXiv Detail & Related papers (2024-02-29T12:36:10Z)
- Scalable Federated Unlearning via Isolated and Coded Sharding [76.12847512410767]
Federated unlearning has emerged as a promising paradigm to erase the client-level data effect.
This paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing.
arXiv Detail & Related papers (2024-01-29T08:41:45Z)
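A simplified illustration of isolated sharding for unlearning, assuming a least-squares sub-model per shard and simple averaging for aggregation; the coded-computing part is omitted, and none of this is the paper's actual scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_shards, dim = 12, 3, 5

client_data = [(rng.normal(size=(20, dim)), rng.normal(size=20))
               for _ in range(n_clients)]
shards = [list(range(s, n_clients, n_shards)) for s in range(n_shards)]

def train_shard(client_ids):
    """Least-squares fit on the pooled data of one shard's clients."""
    X = np.vstack([client_data[c][0] for c in client_ids])
    y = np.concatenate([client_data[c][1] for c in client_ids])
    return np.linalg.lstsq(X, y, rcond=None)[0]

shard_models = [train_shard(ids) for ids in shards]

def unlearn(client_id):
    """Erase one client: only the shard that contained it is retrained."""
    for s, ids in enumerate(shards):
        if client_id in ids:
            ids.remove(client_id)
            shard_models[s] = train_shard(ids)

unlearn(7)
global_model = np.mean(shard_models, axis=0)   # simple aggregation by averaging
print(global_model.shape)
```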
- An effective and efficient green federated learning method for one-layer neural networks [0.22499166814992436]
Federated learning (FL) is one of the most active research lines in machine learning.
We present an FL method, based on a neural network without hidden layers, capable of generating a global collaborative model in a single training round.
We show that the method performs equally well in both identically and non-identically distributed scenarios.
arXiv Detail & Related papers (2023-12-22T08:52:08Z)
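A hedged sketch of single-round federated fitting for a model with no hidden layers, assuming each client shares only sufficient statistics and the server solves the global model once. The paper's exact formulation may differ; the data and dimensions here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=4)

def client_statistics(X, y):
    """Each client shares only aggregate statistics, never raw samples."""
    return X.T @ X, X.T @ y

clients = []
for _ in range(5):
    X = rng.normal(size=(50, 4))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    clients.append(client_statistics(X, y))

# Single aggregation round: sum the statistics and solve the normal equations once.
G = sum(g for g, _ in clients)
h = sum(v for _, v in clients)
w_global = np.linalg.solve(G, h)
print(np.round(w_global - true_w, 3))
```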
- Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
arXiv Detail & Related papers (2023-02-24T08:41:19Z)
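An idealized sketch of analog over-the-air aggregation, assuming unit channel gains and a simple additive-noise channel; this is not the paper's exact scheme, only the core superposition idea.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, noise_std = 10, 8, 0.05

client_updates = [rng.normal(size=dim) for _ in range(n_clients)]

# Superposition: the received signal is the sum of all simultaneously
# transmitted updates, so aggregation happens "in the air".
received = np.sum(client_updates, axis=0) + rng.normal(scale=noise_std, size=dim)
aggregate = received / n_clients            # server rescales to a noisy average

print(np.linalg.norm(aggregate - np.mean(client_updates, axis=0)))
```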
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
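A toy model of the parallelisation trade-off, with entirely assumed arrival-rate and communication-overhead numbers. It only illustrates why maximum parallelisation can hurt throughput and raise the blocking rate; it is not the paper's GNN/RL scheduler.

```python
cluster_devices = 16      # devices available in the cluster
job_arrival_rate = 1.0    # jobs per time unit (assumed)
base_job_time = 8.0       # time to finish a job on a single device (assumed)

def steady_state_metrics(devices_per_job):
    # More devices shorten compute time but add communication overhead,
    # and each running job occupies more of the cluster.
    service_time = base_job_time / devices_per_job + 0.2 * devices_per_job
    concurrent_jobs = cluster_devices // devices_per_job
    throughput = min(job_arrival_rate, concurrent_jobs / service_time)
    blocking_rate = max(0.0, 1.0 - throughput / job_arrival_rate)
    return round(throughput, 2), round(blocking_rate, 2)

for d in (1, 2, 4, 8, 16):
    print(d, steady_state_metrics(d))
```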
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
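A minimal sketch of sharing weights between a dense path and a cheaper gated path, assuming a top-k channel gate; the gating mechanism and sizes are illustrative assumptions, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, k = 32, 64, 16

W = rng.normal(scale=0.1, size=(d_out, d_in))   # weights shared by both paths
gate_scores = rng.normal(size=d_out)            # e.g. learned channel saliency

def dense_forward(x):
    return np.maximum(0.0, W @ x)

def gated_forward(x):
    """Cheaper path: same weights, but only the k most salient channels are kept."""
    h = dense_forward(x)
    mask = np.zeros(d_out)
    mask[np.argsort(gate_scores)[-k:]] = 1.0
    return h * mask

x = rng.normal(size=d_in)
print(np.count_nonzero(dense_forward(x)), np.count_nonzero(gated_forward(x)))
```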
- Hybrid Learning for Orchestrating Deep Learning Inference in Multi-user Edge-cloud Networks [3.7630209350186807]
Collaborative end-edge-cloud computing for deep learning provides a range of performance and efficiency trade-offs.
The deep learning inference orchestration strategy employs reinforcement learning to find the optimal orchestration policy.
We demonstrate the efficacy of our hybrid learning (HL) strategy through experimental comparison with state-of-the-art RL-based inference orchestration.
arXiv Detail & Related papers (2022-02-21T21:50:50Z)
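A heavily simplified sketch of learned inference placement, framed as a bandit over device/edge/cloud options with assumed latencies; the formulation and numbers are hypothetical, not the paper's hybrid learning strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
options = ["device", "edge", "cloud"]
true_latency = {"device": 120.0, "edge": 45.0, "cloud": 80.0}   # ms, hypothetical

estimates, counts = np.zeros(3), np.zeros(3)
for _ in range(2000):
    # Epsilon-greedy: mostly pick the option with the lowest estimated latency.
    a = rng.integers(3) if rng.random() < 0.1 else int(np.argmin(estimates))
    observed = true_latency[options[a]] + rng.normal(scale=10.0)
    counts[a] += 1
    estimates[a] += (observed - estimates[a]) / counts[a]       # running mean

print(dict(zip(options, np.round(estimates, 1))))
```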
- Privacy-Preserving Serverless Edge Learning with Decentralized Small Data [13.254530176359182]
Distributed training strategies have recently become a promising approach to ensure data privacy when training deep models.
This paper extends conventional serverless platforms with serverless edge learning architectures and provides an efficient distributed training framework from the networking perspective.
arXiv Detail & Related papers (2021-11-29T21:04:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.