Toward Communication Efficient Adaptive Gradient Method
- URL: http://arxiv.org/abs/2109.05109v1
- Date: Fri, 10 Sep 2021 21:14:36 GMT
- Title: Toward Communication Efficient Adaptive Gradient Method
- Authors: Xiangyi Chen, Xiaoyun Li, Ping Li
- Abstract summary: In recent years, distributed optimization has proven to be an effective approach for accelerating the training of large-scale machine learning models such as deep neural networks.
In the hope of training machine learning models on mobile devices, a new distributed training paradigm called ``federated learning'' has become popular.
We propose an adaptive gradient method that guarantees both convergence and communication efficiency for federated learning.
- Score: 29.02154169980269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, distributed optimization has proven to be an
effective approach for accelerating the training of large-scale machine
learning models such as deep neural networks. With the increasing computation
power of GPUs, the bottleneck of training speed in distributed training is
gradually shifting from computation to communication. Meanwhile, in the hope
of training machine learning models on mobile devices, a new distributed
training paradigm called ``federated learning'' has become popular.
Communication time is especially important in federated learning due to the
low bandwidth of mobile devices. While various approaches to improve
communication efficiency have been proposed for federated learning, most of
them are designed with SGD as the prototype training algorithm. Although
adaptive gradient methods have proven effective for training neural nets, the
study of adaptive gradient methods in federated learning is scarce. In this
paper, we propose an adaptive gradient method that guarantees both convergence
and communication efficiency for federated learning.
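To make the high-level idea concrete, below is a minimal sketch of one communication-efficient adaptive federated scheme: each client runs several local AMSGrad-style adaptive steps and the server averages the models once per round. This is an illustrative assumption, not necessarily the exact algorithm proposed in the paper; the function names (`local_adaptive_round`, `federated_adaptive_training`) and all hyperparameters are made up for the sketch.

```python
# Illustrative sketch (assumed, not the paper's exact algorithm): clients run
# local AMSGrad-style adaptive steps and the server averages models once per
# round, so communication happens per round rather than per gradient step.
import numpy as np

def local_adaptive_round(w, grad_fn, local_steps=10, lr=1e-3,
                         beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `local_steps` AMSGrad-style updates on one client, starting from w."""
    m = np.zeros_like(w)          # first-moment estimate
    v = np.zeros_like(w)          # second-moment estimate
    v_hat = np.zeros_like(w)      # running max of v (AMSGrad correction)
    for _ in range(local_steps):
        g = grad_fn(w)                            # local stochastic gradient
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)
        w = w - lr * m / (np.sqrt(v_hat) + eps)   # adaptive update
    return w

def federated_adaptive_training(w0, client_grad_fns, rounds=100, local_steps=10):
    """Server loop: broadcast w, let each client adapt locally, then average."""
    w = np.array(w0, dtype=float)
    for _ in range(rounds):
        client_models = [local_adaptive_round(w.copy(), grad_fn, local_steps)
                         for grad_fn in client_grad_fns]
        w = np.mean(client_models, axis=0)        # one aggregation per round
    return w
```

The communication saving comes from exchanging parameters once per round instead of once per gradient step; how the adaptive moment estimates are handled across clients (kept local, reset each round, or synchronized) is exactly the kind of design choice a convergence analysis for such a method must address.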
Related papers
- Local Methods with Adaptivity via Scaling [38.99428012275441]
This paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods.
We consider the classical Local SGD method and enhance it with a scaling feature.
In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network.
arXiv Detail & Related papers (2024-06-02T19:50:05Z)
- Adaptive Compression-Aware Split Learning and Inference for Enhanced Network Efficiency [8.863196307297692]
We develop an adaptive compression-aware split learning method ('deprune') to improve and train deep learning models.
We show that the 'deprune' method can reduce network usage by 4x when compared with a split-learning approach.
We also show that a related 'prune' method can reduce the training time for certain models by up to 6x without affecting accuracy.
arXiv Detail & Related papers (2023-11-09T20:52:36Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance reduction technique in the cross-silo FL setting.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Efficient and Effective Augmentation Strategy for Adversarial Training [48.735220353660324]
Adversarial training of Deep Neural Networks is known to be significantly more data-hungry than standard training.
We propose Diverse Augmentation-based Joint Adversarial Training (DAJAT) to use data augmentations effectively in adversarial training.
arXiv Detail & Related papers (2022-10-27T10:59:55Z)
- Friendly Training: Neural Networks Can Adapt Data To Make Learning Easier [23.886422706697882]
We propose a novel training procedure named Friendly Training.
We show that Friendly Training yields improvements over both informed data sub-selection and random selection.
Results suggest that adapting the input data is a feasible way to stabilize learning and improve the generalization skills of the network.
arXiv Detail & Related papers (2021-06-21T10:50:34Z)
- CosSGD: Nonlinear Quantization for Communication-efficient Federated Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning (see the illustrative sketch after this list).
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
- A Hybrid Method for Training Convolutional Neural Networks [3.172761915061083]
We propose a hybrid method that uses both backpropagation and evolutionary strategies to train Convolutional Neural Networks.
We show that the proposed hybrid method is capable of improving upon regular training in the task of image classification.
arXiv Detail & Related papers (2020-04-15T17:52:48Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
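The nonlinear quantization idea mentioned in the CosSGD entry above can be illustrated with a small sketch. The square-root compander and stochastic rounding below are assumptions chosen for illustration, not CosSGD's actual scheme; `quantize` and `dequantize` are hypothetical helper names.

```python
# Illustrative sketch (assumed, not CosSGD's actual scheme): compress gradient
# magnitudes with a square-root compander, quantize to a few bits with
# stochastic rounding, and invert the mapping on the receiving side.
import numpy as np

def quantize(g, bits=4, rng=None):
    """Nonlinearly quantize a gradient vector to `bits` bits per coordinate."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1
    scale = float(np.max(np.abs(g))) + 1e-12        # per-vector scaling factor
    x = np.sqrt(np.abs(g) / scale)                  # nonlinear (sqrt) compander
    q = np.floor(x * levels + rng.random(g.shape))  # stochastic rounding
    return q.astype(np.uint8), np.sign(g).astype(np.int8), scale

def dequantize(q, sign, scale, bits=4):
    """Undo the compander to recover an approximate gradient."""
    levels = 2 ** bits - 1
    x = q.astype(np.float64) / levels
    return sign * (x ** 2) * scale

# Usage: a client sends (q, sign, scale) -- a few bits per coordinate instead
# of 32-bit floats -- and the server dequantizes before aggregating.
g = np.random.default_rng(0).standard_normal(5)
q, s, scale = quantize(g)
g_hat = dequantize(q, s, scale)
```

A nonlinear mapping spends more quantization levels on small gradient magnitudes, where most coordinates typically lie, so fewer bits are needed for a comparable approximation quality.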