Implicit Gradient Alignment in Distributed and Federated Learning
- URL: http://arxiv.org/abs/2106.13897v1
- Date: Fri, 25 Jun 2021 22:01:35 GMT
- Title: Implicit Gradient Alignment in Distributed and Federated Learning
- Authors: Yatin Dandi, Luis Barba, Martin Jaggi
- Abstract summary: A major obstacle to achieving global convergence in distributed and federated learning is the misalignment of gradients across clients.
We propose a novel GradAlign algorithm that induces the same implicit regularization while allowing the use of arbitrarily large batches in each update.
- Score: 39.61762498388211
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major obstacle to achieving global convergence in distributed and federated
learning is the misalignment of gradients across clients, or mini-batches due
to heterogeneity and stochasticity of the distributed data. One way to
alleviate this problem is to encourage the alignment of gradients across
different clients throughout training. Our analysis reveals that this goal can
be accomplished by utilizing the right optimization method that replicates the
implicit regularization effect of SGD, leading to gradient alignment as well as
improvements in test accuracies. Since the existence of this regularization in
SGD completely relies on the sequential use of different mini-batches during
training, it is inherently absent when training with large mini-batches. To
obtain the generalization benefits of this regularization while increasing
parallelism, we propose a novel GradAlign algorithm that induces the same
implicit regularization while allowing the use of arbitrarily large batches in
each update. We experimentally validate the benefit of our algorithm in
different distributed and federated learning settings.
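The abstract does not spell out the GradAlign update itself, but the core idea it describes, encouraging per-client gradients to agree during training, can be illustrated with a minimal sketch. The penalty form, the coefficient `align_coef`, and the helper name below are illustrative assumptions, not the authors' algorithm.

```python
# Minimal sketch (not the authors' GradAlign update): encourage cross-client
# gradient alignment by penalizing the squared distance between gradients
# computed on two different clients' batches. `align_coef` is a hypothetical
# hyperparameter chosen for illustration.
import torch

def aligned_loss(model, loss_fn, batch_a, batch_b, align_coef=0.1):
    params = [p for p in model.parameters() if p.requires_grad]

    loss_a = loss_fn(model(batch_a["x"]), batch_a["y"])
    loss_b = loss_fn(model(batch_b["x"]), batch_b["y"])

    # Per-client gradients, kept in the graph so the penalty is differentiable.
    grads_a = torch.autograd.grad(loss_a, params, create_graph=True)
    grads_b = torch.autograd.grad(loss_b, params, create_graph=True)

    # ||g_a - g_b||^2 summed over all parameters: small when gradients align.
    misalignment = sum(((ga - gb) ** 2).sum() for ga, gb in zip(grads_a, grads_b))

    return loss_a + loss_b + align_coef * misalignment
```

Minimizing the penalty (which requires a second backward pass) drives the two clients' gradients toward each other even when each update uses a large batch, which is the alignment effect the abstract attributes to SGD's implicit regularization.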
Related papers
- Generalizable Person Re-identification via Balancing Alignment and Uniformity [22.328800139066914]
Domain generalizable person re-identification (DG re-ID) aims to learn discriminative representations that are robust to distributional shifts.
Certain augmentations exhibit a polarized effect in this task, enhancing in-distribution performance while deteriorating out-of-distribution performance.
We propose a novel framework, Balancing Alignment and Uniformity (BAU), which effectively mitigates this effect by maintaining a balance between alignment and uniformity.
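The blurb does not define the two terms; as a hedged reference point, the widely used alignment and uniformity losses of Wang & Isola (2020) are sketched below as the kind of quantities such a framework could balance. The BAU paper's exact losses may differ.

```python
# Sketch of the standard alignment/uniformity metrics (Wang & Isola, 2020),
# shown only as the kind of objectives a balance like BAU could trade off;
# the paper's exact formulation may differ.
import torch
import torch.nn.functional as F

def alignment_loss(z1, z2, alpha=2):
    # Mean distance between L2-normalized embeddings of positive pairs.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    return (z1 - z2).norm(p=2, dim=1).pow(alpha).mean()

def uniformity_loss(z, t=2):
    # Log of the average Gaussian potential over all embedding pairs.
    z = F.normalize(z, dim=1)
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()
```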
arXiv Detail & Related papers (2024-11-18T11:13:30Z) - Aiding Global Convergence in Federated Learning via Local Perturbation and Mutual Similarity Information [6.767885381740953]
Federated learning has emerged as a distributed optimization paradigm.
We propose a novel modified framework wherein each client locally performs a perturbed gradient step.
We show that our algorithm speeds up convergence by up to 30 global rounds compared with FedAvg.
arXiv Detail & Related papers (2024-10-07T23:14:05Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{1 - \frac{1}{\alpha}}\big)$.
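As a rough illustration of the server-side update described above, the sketch below applies a plain AdaGrad step to the aggregated client updates, with over-the-air aggregation crudely modeled as additive noise. The noise model, the variable names, and the hyperparameters are assumptions for illustration, not the paper's specification.

```python
# Illustrative server-side federated AdaGrad step. `client_deltas` are assumed
# to be the clients' local model updates (local minus global weights); the
# over-the-air channel is crudely modeled as additive Gaussian noise on the
# aggregate. All names and hyperparameters here are assumptions.
import numpy as np

def fed_adagrad_step(global_w, client_deltas, accum, lr=0.1, eps=1e-8, channel_noise=0.01):
    """One aggregation round: average the client updates, then apply AdaGrad."""
    avg_delta = np.mean(client_deltas, axis=0)
    # Over-the-air aggregation delivers a noisy superposition of the updates.
    avg_delta = avg_delta + channel_noise * np.random.randn(*avg_delta.shape)

    accum += avg_delta ** 2                          # AdaGrad accumulator
    global_w += lr * avg_delta / (np.sqrt(accum) + eps)
    return global_w, accum
```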
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Depersonalized Federated Learning: Tackling Statistical Heterogeneity by Alternating Stochastic Gradient Descent [6.394263208820851]
Federated learning (FL) enables devices to train a common machine learning (ML) model for intelligent inference without data sharing.
Raw data held by the various cooperating participants are typically non-identically distributed.
We propose a new depersonalized FL scheme that mitigates this statistical heterogeneity via alternating stochastic gradient descent.
arXiv Detail & Related papers (2022-10-07T10:30:39Z) - Improving Generalization in Reinforcement Learning with Mixture Regularization [113.12412071717078]
We introduce a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments.
Mixreg increases the data diversity more effectively and helps learn smoother policies.
Results show mixreg outperforms the well-established baselines on unseen testing environments by a large margin.
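A minimal mixup-style sketch of the observation mixture described above follows; the Beta-distributed coefficient follows standard mixup, and which supervision targets get mixed is an assumption, since the blurb does not say.

```python
# Hedged sketch of a mixup-style observation mixture in the spirit of mixreg:
# convexly combine observations (and their supervision targets) drawn from
# different training environments. The exact targets mixreg mixes may differ.
import numpy as np

def mixreg_batch(obs_a, target_a, obs_b, target_b, beta=0.2):
    """Return a mixed batch; lam ~ Beta(beta, beta) as in standard mixup."""
    lam = np.random.beta(beta, beta)
    obs_mix = lam * obs_a + (1.0 - lam) * obs_b
    target_mix = lam * target_a + (1.0 - lam) * target_b
    return obs_mix, target_mix
```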
arXiv Detail & Related papers (2020-10-21T08:12:03Z) - Variance Regularization for Accelerating Stochastic Optimization [14.545770519120898]
We propose a universal principle which reduces the random error accumulation by exploiting statistic information hidden in mini-batch gradients.
This is achieved by regularizing the learning-rate according to mini-batch variances.
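A minimal sketch of a learning rate scaled by mini-batch gradient variance follows; the specific 1/(1 + variance) scaling and the names are illustrative assumptions, not the paper's exact rule.

```python
# Illustrative sketch: shrink the step size when the per-sample gradients in a
# mini-batch disagree (high variance). The 1/(1 + var) form and `base_lr` are
# assumptions for illustration only.
import numpy as np

def variance_scaled_lr(per_sample_grads, base_lr=0.1):
    """per_sample_grads: array of shape (batch_size, num_params)."""
    grad_var = per_sample_grads.var(axis=0).mean()   # average coordinate-wise variance
    return base_lr / (1.0 + grad_var)

def sgd_step(w, per_sample_grads, base_lr=0.1):
    lr = variance_scaled_lr(per_sample_grads, base_lr)
    return w - lr * per_sample_grads.mean(axis=0)
```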
arXiv Detail & Related papers (2020-08-13T15:34:01Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Robust Sampling in Deep Learning [62.997667081978825]
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization.
We address this problem with a new regularization method based on distributionally robust optimization.
During training, samples are selected according to their accuracy so that the worst-performing samples contribute the most to the optimization.
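One common way to make the worst-performing samples dominate the update is a softmax reweighting of the per-sample losses (a KL-regularized DRO surrogate); the sketch below uses that form as an assumption, since the blurb does not give the paper's exact selection rule.

```python
# Hedged sketch of a distributionally robust training objective: per-sample
# losses are reweighted so the worst-performing samples dominate the update.
# The softmax weighting and `temperature` are assumptions; the paper's exact
# selection rule may differ.
import torch

def robust_loss(per_sample_losses, temperature=1.0):
    """Upweight high-loss samples via a softmax over the per-sample losses."""
    weights = torch.softmax(per_sample_losses.detach() / temperature, dim=0)
    return (weights * per_sample_losses).sum()
```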
arXiv Detail & Related papers (2020-06-04T09:46:52Z) - LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning [47.93365664380274]
This paper targets solving distributed machine learning problems such as federated learning in a communication-efficient fashion.
A class of new stochastic gradient descent (SGD) approaches has been developed, which can be viewed as a generalization of the recently developed lazily aggregated gradient (LAG) method.
The key components of LASG are a set of new rules tailored for gradients that can be implemented either to save download, upload, or both.
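A hedged sketch of a lazy-upload rule in this spirit follows: the client re-sends its stochastic gradient only when it has changed enough since the last communication, and the server reuses the stale gradient otherwise. The specific threshold condition below is an assumption; LASG defines several tailored rules.

```python
# Illustrative client-side "lazy upload" rule in the spirit of LAG/LASG:
# skip communication when the new stochastic gradient barely differs from the
# one last sent. The relative-threshold condition is an assumption, not the
# paper's exact rule.
import numpy as np

def maybe_upload(new_grad, last_sent_grad, threshold=1e-3):
    """Return (gradient_used_at_server, uploaded_flag)."""
    change = np.linalg.norm(new_grad - last_sent_grad) ** 2
    if change >= threshold * np.linalg.norm(new_grad) ** 2:
        return new_grad, True          # change is large enough: upload fresh gradient
    return last_sent_grad, False       # reuse the stale gradient, save communication
```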
arXiv Detail & Related papers (2020-02-26T08:58:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.