ProgFed: Effective, Communication, and Computation Efficient Federated
Learning by Progressive Training
- URL: http://arxiv.org/abs/2110.05323v1
- Date: Mon, 11 Oct 2021 14:45:00 GMT
- Title: ProgFed: Effective, Communication, and Computation Efficient Federated
Learning by Progressive Training
- Authors: Hui-Po Wang, Sebastian U. Stich, Yang He, Mario Fritz
- Abstract summary: We propose ProgFed, a progressive training framework for efficient and effective federated learning.
It inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
- Score: 78.44473677588887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning is a powerful distributed learning scheme that allows
numerous edge devices to collaboratively train a model without sharing their
data. However, training is resource-intensive for edge devices, and limited
network bandwidth is often the main bottleneck. Prior work often overcomes the
constraints by condensing the models or messages into compact formats, e.g., by
gradient compression or distillation. In contrast, we propose ProgFed, the
first progressive training framework for efficient and effective federated
learning. It inherently reduces computation and two-way communication costs
while maintaining the strong performance of the final models. We theoretically
prove that ProgFed converges at the same asymptotic rate as standard training
on full models. Extensive results on a broad range of architectures, including
CNNs (VGG, ResNet, ConvNets) and U-nets, and diverse tasks from simple
classification to medical image segmentation show that our highly effective
training approach saves up to $20\%$ computation and up to $63\%$ communication
costs for converged models. As our approach is also complementary to prior work
on compression, we can achieve a wide range of trade-offs, showing reduced
communication of up to $50\times$ at only $0.1\%$ loss in utility.
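To make the progressive idea concrete, the following is a minimal sketch of how a staged schedule could be wired into a standard federated loop. It assumes a fixed three-stage split, plain FedAvg aggregation, synthetic client data, and a hypothetical lightweight head for supervising partial models; the helper names (`active_model`, `local_update`, `fedavg`) and the schedule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of progressive federated training (illustrative only).
import copy
import torch
import torch.nn as nn

# Model split into sequential stages; stages are activated one by one.
stages = nn.ModuleList([
    nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 10)),
])

def active_model(num_stages):
    """Currently trained prefix plus a temporary head for supervision."""
    prefix = nn.Sequential(*stages[:num_stages])
    head = nn.Linear(64, 10) if num_stages < len(stages) else nn.Identity()
    return nn.Sequential(prefix, head)

def local_update(model, data, targets, lr=0.1, steps=5):
    """One client's local SGD on a copy of the (partial) global model."""
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(data), targets).backward()
        opt.step()
    return model.state_dict()

def fedavg(client_states):
    """Average each parameter across client updates (plain FedAvg)."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in client_states]).mean(0)
    return avg

# Synthetic local datasets for four clients.
clients = [(torch.randn(16, 32), torch.randint(0, 10, (16,))) for _ in range(4)]

# Progressive schedule: only the active prefix (plus a small head) is trained
# and exchanged, which is where the computation and two-way communication
# savings come from; later stages are added once earlier ones are trained.
for num_stages in range(1, len(stages) + 1):
    global_model = active_model(num_stages)
    # The prefix modules are shared by reference with `stages`, so trained
    # weights carry over automatically when the next stage is activated.
    for _ in range(3):  # a few federated rounds per stage (illustrative)
        states = [local_update(global_model, x, y) for x, y in clients]
        global_model.load_state_dict(fedavg(states))
```

In this sketch the savings come simply from training and exchanging only the active prefix in early stages; the paper's actual stage schedule, local heads, and convergence analysis are more involved.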
Related papers
- FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models [35.40065954148091]
FINE is a method based on the Learngene framework for initializing downstream networks by leveraging pre-trained models.
It decomposes pre-trained knowledge into the product of matrices (i.e., $U$, $\Sigma$, and $V$), where $U$ and $V$ are shared across network blocks as "learngenes" (a minimal sketch of this factorization appears after this list).
It consistently outperforms direct pre-training, particularly for smaller models, achieving state-of-the-art results across variable model sizes.
arXiv Detail & Related papers (2024-09-28T08:57:17Z)
- DεpS: Delayed ε-Shrinking for Faster Once-For-All Training [8.199430861588919]
CNNs are increasingly deployed across different hardware, dynamic environments, and low-power embedded devices.
Once-for-all training has emerged as a scalable approach that jointly co-trains many models (subnets) at once with a constant training cost.
We propose Delayed $\epsilon$-Shrinking (D$\epsilon$pS), which starts shrinking the full model once it is partially trained.
arXiv Detail & Related papers (2024-07-08T17:45:40Z)
- Federated Hyperdimensional Computing [14.844383542052169]
Federated learning (FL) enables a loose set of participating clients to collaboratively learn a global model via coordination by a central server.
Existing FL approaches rely on complex algorithms with massive models, such as deep neural networks (DNNs).
We first propose FedHDC, a federated learning framework based on hyperdimensional computing (HDC).
arXiv Detail & Related papers (2023-12-26T09:24:19Z)
- Towards Federated Learning Under Resource Constraints via Layer-wise Training and Depth Dropout [33.308067180286045]
Federated learning can be difficult to scale to large models when clients have limited resources.
We introduce Federated Layer-wise Learning to simultaneously reduce per-client memory, computation, and communication costs.
We also introduce Federated Depth Dropout, a complementary technique that randomly drops frozen layers during training, to further reduce resource usage.
arXiv Detail & Related papers (2023-09-11T03:17:45Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to improve model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Federated Progressive Sparsification (Purge, Merge, Tune)+ [15.08232397899507]
FedSparsify is a sparsification strategy based on progressive weight magnitude pruning.
We show experimentally that FedSparsify learns a subnetwork of both high sparsity and learning performance.
arXiv Detail & Related papers (2022-04-26T16:45:53Z)
- Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model that outperforms the same low-rank model trained alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
- Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism [56.78673028601739]
We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training.
DCT reduces communication by at least $100\times$ and $20\times$ during data parallelism (DP) and model parallelism (MP), respectively.
It improves end-to-end training time for a state-of-the-art industrial recommender model by 37%, without any loss in performance.
arXiv Detail & Related papers (2020-10-18T01:44:42Z)
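As noted in the FINE entry above, the shared-factor idea can be sketched briefly: a pre-trained weight matrix is decomposed as $U \Sigma V^\top$, the factors $U$ and $V$ are frozen and shared as "learngenes", and only the singular values are re-trained for a downstream block. All names, shapes, and the use of a plain SVD below are illustrative assumptions, not the FINE implementation.

```python
# Illustrative sketch only: SVD-factorized initialization with shared U, Vh
# ("learngenes") and a per-block trainable sigma. Not the FINE implementation.
import torch
import torch.nn as nn

# Stand-in for a pre-trained weight matrix from a donor network.
W_pretrained = torch.randn(128, 128)
U, S, Vh = torch.linalg.svd(W_pretrained)

class FactorizedLinear(nn.Module):
    """Weight = U[:, :r] @ diag(sigma) @ Vh[:r, :]; U and Vh are frozen and
    shared across blocks, only sigma is trained for the downstream task."""
    def __init__(self, rank):
        super().__init__()
        self.register_buffer("U", U[:, :rank].clone())
        self.register_buffer("Vh", Vh[:rank, :].clone())
        self.sigma = nn.Parameter(S[:rank].clone())  # trained per block

    def forward(self, x):
        weight = self.U @ torch.diag(self.sigma) @ self.Vh
        return x @ weight.T

layer = FactorizedLinear(rank=32)   # a smaller, variable-sized block
out = layer(torch.randn(4, 128))    # -> shape (4, 128)
```

Varying `rank` yields variable-sized downstream layers that reuse the same shared factors, which is the gist of the variable-sized initialization described in that entry.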