Local Methods with Adaptivity via Scaling
- URL: http://arxiv.org/abs/2406.00846v3
- Date: Mon, 16 Sep 2024 16:30:09 GMT
- Title: Local Methods with Adaptivity via Scaling
- Authors: Savelii Chezhegov, Sergey Skorik, Nikolas Khachaturov, Danil Shalagin, Aram Avetisyan, Martin Takáč, Yaroslav Kholodov, Aleksandr Beznosikov
- Abstract summary: This paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods.
We consider the classical Local SGD method and enhance it with a scaling feature.
In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network.
- Score: 38.99428012275441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of machine learning and deep learning has introduced increasingly complex optimization challenges that must be addressed. Indeed, training modern, advanced models has become difficult to implement without leveraging multiple computing nodes in a distributed environment. Distributed optimization is also fundamental to emerging fields such as federated learning. Specifically, there is a need to organize the training process to minimize the time lost due to communication. A widely used and extensively researched technique to mitigate the communication bottleneck involves performing local training before communication. This approach is the focus of our paper. Concurrently, adaptive methods that incorporate scaling, notably led by Adam, have gained significant popularity in recent years. Therefore, this paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods. We consider the classical Local SGD method and enhance it with a scaling feature. A crucial aspect is that the scaling is described generically, allowing us to analyze various approaches, including Adam, RMSProp, and OASIS, in a unified manner. In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network.
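To make the scheme described in the abstract concrete, here is a minimal Python sketch of Local SGD in which each local step is preconditioned by a generic diagonal scaling, instantiated with an RMSProp-style accumulator. The toy quadratic objectives and all names (e.g. `local_sgd_with_scaling`) are illustrative assumptions, not the paper's exact algorithm or notation.
```python
# Sketch of Local SGD with a generic scaling (preconditioning) step.
# Illustrative reconstruction only: the quadratic objectives, the
# RMSProp-style scaling choice, and all names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
M, d = 4, 10                                  # workers, dimension
A = [rng.standard_normal((d, d)) for _ in range(M)]
b = [rng.standard_normal(d) for _ in range(M)]

def grad(m, x):
    """Gradient of worker m's local quadratic f_m(x) = 0.5*||A_m x - b_m||^2."""
    return A[m].T @ (A[m] @ x - b[m])

def local_sgd_with_scaling(rounds=50, local_steps=10, lr=1e-2,
                           beta=0.99, eps=1e-8):
    x = np.zeros(d)                           # global model
    v = [np.zeros(d) for _ in range(M)]       # per-worker scaling state,
    for _ in range(rounds):                   # persisted across rounds
        locals_ = []
        for m in range(M):
            xm = x.copy()
            for _ in range(local_steps):
                g = grad(m, xm)
                # Generic scaling: here an RMSProp-style diagonal preconditioner.
                v[m] = beta * v[m] + (1 - beta) * g * g
                xm -= lr * g / (np.sqrt(v[m]) + eps)
            locals_.append(xm)
        x = np.mean(locals_, axis=0)          # communication: average models
    return x

x_star = local_sgd_with_scaling()
print("avg gradient norm at output:",
      np.linalg.norm(np.mean([grad(m, x_star) for m in range(M)], axis=0)))
```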
Related papers
- Robust Decentralized Learning with Local Updates and Gradient Tracking [16.46727164965154]
We consider decentralized learning as a network of communicating clients or nodes.
We propose a decentralized minimax optimization method that employs two key techniques: local updates and gradient tracking.
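The sketch below illustrates the gradient-tracking component on a ring of nodes, with one step per communication round; the ring mixing weights, toy quadratic losses, and names are assumptions for illustration, not the paper's exact method.
```python
# Hedged sketch of decentralized gradient tracking on a ring of nodes.
# Illustrative only: mixing matrix, losses, and names are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 8
targets = [rng.standard_normal(d) for _ in range(n)]  # f_i(x) = 0.5||x - t_i||^2

def grad(i, x):
    return x - targets[i]

def mix(values):
    # Doubly stochastic mixing for a ring: average self and two neighbors.
    return [(values[i] + values[(i - 1) % n] + values[(i + 1) % n]) / 3.0
            for i in range(n)]

x = [np.zeros(d) for _ in range(n)]
y = [grad(i, x[i]) for i in range(n)]   # trackers start at local gradients
lr = 0.3
for _ in range(200):
    x_new = [xm - lr * ym for xm, ym in zip(mix(x), y)]
    # Tracking update: mix trackers, then add the change in local gradients,
    # so each y_i estimates the network-wide average gradient.
    y = [ym + grad(i, x_new[i]) - grad(i, x[i])
         for i, ym in enumerate(mix(y))]
    x = x_new

print("consensus error:", max(np.linalg.norm(x[i] - x[0]) for i in range(n)))
```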
arXiv Detail & Related papers (2024-05-02T03:03:34Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
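Below is a hedged sketch of the idea of client-specific adaptive learning rates: each client runs AMSGrad-style local steps with its own moment statistics before the server averages. The toy losses and names are assumptions, not FedLALR's exact specification.
```python
# Sketch of per-client adaptive learning rates in federated training.
# Illustrative assumptions throughout; not FedLALR's exact algorithm.
import numpy as np

rng = np.random.default_rng(2)
C, d = 3, 6
opt = [rng.standard_normal(d) for _ in range(C)]   # heterogeneous optima

def grad(c, x):
    return x - opt[c]

x = np.zeros(d)
m = [np.zeros(d) for _ in range(C)]
v = [np.zeros(d) for _ in range(C)]
v_hat = [np.zeros(d) for _ in range(C)]            # AMSGrad max-accumulator
for _ in range(100):
    updates = []
    for c in range(C):
        xc = x.copy()
        for _ in range(5):                         # local adaptive steps
            g = grad(c, xc)
            m[c] = 0.9 * m[c] + 0.1 * g
            v[c] = 0.999 * v[c] + 0.001 * g * g
            v_hat[c] = np.maximum(v_hat[c], v[c])  # non-decreasing scaling
            xc -= 0.1 * m[c] / (np.sqrt(v_hat[c]) + 1e-8)  # client-specific rate
        updates.append(xc)
    x = np.mean(updates, axis=0)                   # server averaging
print("distance to mean optimum:", np.linalg.norm(x - np.mean(opt, axis=0)))
```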
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
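For intuition, here is a minimal sketch of the IST pattern: the model's parameters are partitioned into disjoint blocks, each worker trains only its own block, and the blocks are reassembled. A separable toy loss is used so the blocks are truly independent; real networks couple blocks, and this is an assumption of the sketch, not the paper's analysis.
```python
# Sketch of Independent Subnetwork Training (IST) style updates on a
# separable toy loss; illustrative assumptions throughout.
import numpy as np

rng = np.random.default_rng(3)
d, W = 12, 3
target = rng.standard_normal(d)              # optimum of the separable toy loss

x = np.zeros(d)
blocks = np.array_split(np.arange(d), W)     # disjoint parameter partition
for _ in range(20):
    for w, idx in enumerate(blocks):
        xb = x[idx].copy()
        for _ in range(5):                   # worker w trains only its block
            xb -= 0.3 * (xb - target[idx])   # local grad of 0.5||x - target||^2
        x[idx] = xb                          # reassemble the full model
print("error:", np.linalg.norm(x - target))
```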
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Federated Learning with a Sampling Algorithm under Isoperimetry [9.990687944474738]
Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices.
We propose a communication-efficient variant of Langevin's sampling a posteriori.
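The following is a hedged sketch of one way such a scheme can look: clients run several local SGLD steps (gradient step plus Gaussian noise) on their own likelihood term before the server averages, trading a little bias for fewer communication rounds. The Gaussian toy posterior and every name here are illustrative assumptions, not the paper's exact algorithm.
```python
# Sketch of communication-efficient federated Langevin (SGLD) sampling.
# Illustrative assumptions throughout; local steps introduce bias that
# the paper's actual method controls more carefully.
import numpy as np

rng = np.random.default_rng(4)
C, d, step, local_k = 4, 2, 1e-2, 5
mu = [rng.standard_normal(d) for _ in range(C)]   # client-specific means

def grad_log_lik(c, x):
    # Client c's piece of the log-posterior gradient (Gaussian toy model).
    return -(x - mu[c])

x, samples = np.zeros(d), []
for rnd in range(400):
    locals_ = []
    for c in range(C):
        xc = x.copy()
        for _ in range(local_k):            # local SGLD steps, no communication
            xc += step * C * grad_log_lik(c, xc) \
                  + np.sqrt(2 * step) * rng.standard_normal(d)
        locals_.append(xc)
    x = np.mean(locals_, axis=0)            # one communication per round
    if rnd > 100:
        samples.append(x.copy())
print("sample mean:", np.mean(samples, axis=0), "target:", np.mean(mu, axis=0))
```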
arXiv Detail & Related papers (2022-06-02T08:19:03Z)
- Toward Communication Efficient Adaptive Gradient Method [29.02154169980269]
In recent years, distributed optimization has proven to be an effective approach to accelerating the training of large-scale machine learning models such as deep neural networks.
In the hope of training machine learning models on mobile devices, a new distributed training paradigm called "federated learning" has become popular.
We propose an adaptive gradient method that can guarantee both the convergence and the communication efficiency for federated learning.
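One common recipe for combining adaptivity with communication efficiency is sketched below: clients run plain local SGD, and the server applies an Adam-style adaptive update to the averaged model delta (a "pseudo-gradient"). This illustrates the general idea only and is not necessarily the method proposed in the paper above.
```python
# Sketch of server-side adaptive federated optimization; names, losses,
# and the pseudo-gradient recipe are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
C, d = 4, 6
opt = [rng.standard_normal(d) for _ in range(C)]

def grad(c, x):
    return x - opt[c]

x, m, v = np.zeros(d), np.zeros(d), np.zeros(d)
for _ in range(200):
    deltas = []
    for c in range(C):
        xc = x.copy()
        for _ in range(5):
            xc -= 0.1 * grad(c, xc)       # cheap local SGD steps
        deltas.append(x - xc)             # model delta acts as pseudo-gradient
    g = np.mean(deltas, axis=0)
    m = 0.9 * m + 0.1 * g                 # server-side Adam-style statistics
    v = 0.99 * v + 0.01 * g * g
    x -= 0.1 * m / (np.sqrt(v) + 1e-8)
print("distance to mean optimum:", np.linalg.norm(x - np.mean(opt, axis=0)))
```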
arXiv Detail & Related papers (2021-09-10T21:14:36Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
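A hedged sketch of the underlying idea follows: personalized per-node models are trained while penalizing differences across a graph, in the spirit of GTV minimization. A smooth squared-difference penalty is used here for simplicity (the actual GTV penalty is non-smooth); the graph, losses, and names are illustrative assumptions.
```python
# Sketch of graph-coupled personalized training (GTV-style smoothing).
# Illustrative assumptions throughout.
import numpy as np

rng = np.random.default_rng(6)
n, d, lam, lr = 6, 4, 0.5, 0.1
# Two clusters of nodes with distinct optima, joined in a chain graph.
opt = [np.ones(d) * (0 if i < 3 else 5) + 0.1 * rng.standard_normal(d)
       for i in range(n)]
edges = [(i, i + 1) for i in range(n - 1)]

X = [np.zeros(d) for _ in range(n)]
for _ in range(300):
    G = [x - opt[i] for i, x in enumerate(X)]   # local loss gradients
    for i, j in edges:                          # coupling-penalty gradients
        diff = X[i] - X[j]
        G[i] += lam * diff
        G[j] -= lam * diff
    X = [x - lr * g for x, g in zip(X, G)]
for i in range(n):
    print(f"node {i}: mean weight {X[i].mean():.2f}")   # two clusters emerge
```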
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
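As rough intuition for computing the learning rate from data, the sketch below combines gossip averaging over a ring with per-node AdaGrad-style step sizes accumulated from observed gradients; everything here is an illustrative assumption, not the paper's algorithm.
```python
# Sketch of adaptive decentralized training: gossip plus per-node
# AdaGrad-style step sizes. Illustrative assumptions throughout.
import numpy as np

rng = np.random.default_rng(7)
n, d = 5, 8
opt = [rng.standard_normal(d) for _ in range(n)]

x = [np.zeros(d) for _ in range(n)]
s = [np.zeros(d) for _ in range(n)]          # AdaGrad accumulators
for _ in range(300):
    mixed = [(x[i] + x[(i - 1) % n] + x[(i + 1) % n]) / 3 for i in range(n)]
    for i in range(n):
        g = mixed[i] - opt[i]
        s[i] += g * g                        # step size adapts to the data
        x[i] = mixed[i] - 0.5 * g / (np.sqrt(s[i]) + 1e-8)
print("spread:", max(np.linalg.norm(x[i] - x[0]) for i in range(n)))
```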
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
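For orientation, here is a hedged sketch of the layer-wise adaptive rate scaling family (LARS/CLARS): each layer's step is rescaled by the ratio of its weight norm to its gradient norm, the so-called trust ratio. The layer shapes, the toy loss, and the decaying schedule are illustrative assumptions, not the CLARS algorithm itself.
```python
# Sketch of layer-wise adaptive rate scaling (LARS/CLARS family).
# Illustrative assumptions throughout.
import numpy as np

rng = np.random.default_rng(8)
layers = [rng.standard_normal(s) for s in [(4, 4), (4, 2)]]
targets = [np.ones_like(w) for w in layers]

for t in range(200):
    eta = 0.5 / (1 + t)                       # simple decaying base rate
    for k, w in enumerate(layers):
        g = w - targets[k]                    # grad of 0.5*||W - T||_F^2
        trust = np.linalg.norm(w) / (np.linalg.norm(g) + 1e-8)
        layers[k] = w - eta * trust * g       # layer-wise scaled step
print("layer errors:", [float(np.linalg.norm(w - t_))
                        for w, t_ in zip(layers, targets)])
```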
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
- Combining Federated and Active Learning for Communication-efficient Distributed Failure Prediction in Aeronautics [0.0]
We present a new centralized distributed learning algorithm that relies on the learning paradigms of Active Learning and Federated Learning.
We evaluate this method on a public benchmark and show that its precision is very close to the state-of-the-art performance of non-distributed learning.
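The sketch below shows one way to combine the two paradigms: each client trains a local logistic model only on the points it is least certain about (smallest margin), and the server averages the results. The synthetic data, the margin criterion, and all names are illustrative assumptions, not the paper's algorithm.
```python
# Sketch of active learning inside federated averaging.
# Illustrative assumptions throughout.
import numpy as np

rng = np.random.default_rng(9)
C, d, n_local, budget = 3, 5, 200, 20
w_true = rng.standard_normal(d)
data = []
for _ in range(C):
    X = rng.standard_normal((n_local, d))
    y = (X @ w_true > 0).astype(float)
    data.append((X, y))

w = np.zeros(d)
for _ in range(30):
    locals_ = []
    for X, y in data:
        margin = np.abs(X @ w)                 # uncertainty proxy
        pick = np.argsort(margin)[:budget]     # query most uncertain points
        wc = w.copy()
        for _ in range(10):                    # local logistic regression steps
            p = 1 / (1 + np.exp(-X[pick] @ wc))
            wc -= 0.5 * X[pick].T @ (p - y[pick]) / budget
        locals_.append(wc)
    w = np.mean(locals_, axis=0)               # federated averaging
acc = np.mean([(X @ w > 0) == y for X, y in data])
print("train accuracy:", acc)
```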
arXiv Detail & Related papers (2020-01-21T13:17:00Z)