GRAWA: Gradient-based Weighted Averaging for Distributed Training of
Deep Learning Models
- URL: http://arxiv.org/abs/2403.04206v1
- Date: Thu, 7 Mar 2024 04:22:34 GMT
- Title: GRAWA: Gradient-based Weighted Averaging for Distributed Training of
Deep Learning Models
- Authors: Tolga Dimlioglu, Anna Choromanska
- Abstract summary: We study distributed training of deep models in time-constrained environments.
We propose a new algorithm that periodically pulls workers towards a center variable computed as a weighted average of the workers.
- Score: 9.377424534371727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study distributed training of deep learning models in time-constrained
environments. We propose a new algorithm that periodically pulls workers
towards the center variable computed as a weighted average of workers, where
the weights are inversely proportional to the gradient norms of the workers
such that recovering the flat regions in the optimization landscape is
prioritized. We develop two asynchronous variants of the proposed algorithm
that we call Model-level and Layer-level Gradient-based Weighted Averaging
(resp. MGRAWA and LGRAWA), which differ in terms of the weighting scheme that
is either done with respect to the entire model or is applied layer-wise. On
the theoretical front, we prove the convergence guarantee for the proposed
approach in both convex and non-convex settings. We then experimentally
demonstrate that our algorithms outperform the competitor methods by achieving
faster convergence and recovering better quality and flatter local optima. We
also carry out an ablation study to analyze the scalability of the proposed
algorithms in more crowded distributed training environments. Finally, we
report that our approach requires less frequent communication and fewer
distributed updates compared to the state-of-the-art baselines.
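To make the weighting rule described in the abstract concrete, the following is a minimal NumPy sketch of the two variants: a model-level (MGRAWA-style) center with one weight per worker, and a layer-level (LGRAWA-style) center with per-layer weights, in both cases inversely proportional to gradient norms. This is an illustration of the idea only, not the authors' implementation; the function names, the epsilon stabilizer, and the `pull_strength` hyperparameter are our own assumptions.

```python
import numpy as np

def mgrawa_center(worker_params, worker_grads, eps=1e-12):
    """Model-level (MGRAWA-style) center: one weight per worker, inversely
    proportional to the norm of that worker's full-model gradient.
    Each worker's model is given as a list of per-layer NumPy arrays."""
    norms = np.array([np.sqrt(sum(np.sum(g ** 2) for g in grads))
                      for grads in worker_grads])   # full-model gradient norms
    weights = 1.0 / (norms + eps)
    weights /= weights.sum()                        # normalize to sum to 1
    num_layers = len(worker_params[0])
    return [sum(w * p[l] for w, p in zip(weights, worker_params))
            for l in range(num_layers)]

def lgrawa_center(worker_params, worker_grads, eps=1e-12):
    """Layer-level (LGRAWA-style) center: weights are recomputed for every
    layer from that layer's gradient norms, so flat layers are favored
    independently of the rest of the model."""
    num_layers = len(worker_params[0])
    center = []
    for l in range(num_layers):
        norms = np.array([np.linalg.norm(grads[l]) for grads in worker_grads])
        weights = 1.0 / (norms + eps)
        weights /= weights.sum()
        center.append(sum(w * p[l] for w, p in zip(weights, worker_params)))
    return center

def pull_towards_center(params, center, pull_strength=0.1):
    """Periodic proximal step that pulls one worker's parameters towards the
    center variable; `pull_strength` is an illustrative hyperparameter name."""
    return [p - pull_strength * (p - c) for p, c in zip(params, center)]
```

In a usage sketch, each worker would periodically call `pull_towards_center(its_params, mgrawa_center(all_params, all_grads))`; in the paper these pulls are applied asynchronously and only periodically, which is consistent with the reduced communication and fewer distributed updates reported in the abstract.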
Related papers
- Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices [18.77845142335398]
We investigate the problem of mitigating the straggler effect in hierarchical distributed learning systems with an additional layer composed of edge nodes.
We propose a hierarchical gradient coding framework, which provides better straggler mitigation.
We develop an efficient algorithm that solves the underlying optimization problem and outputs the optimal strategy.
arXiv Detail & Related papers (2024-06-16T07:52:12Z) - Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning [5.325297567945828]
We propose a new method for two-time-scale optimization that achieves significantly faster convergence than the prior art.
We characterize the proposed algorithm under various conditions and show how it specializes to online sample-based methods.
arXiv Detail & Related papers (2024-05-15T19:03:08Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\!\left( \ln(T) / T^{1 - \frac{1}{\alpha}} \right)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Distributional Bellman Operators over Mean Embeddings [37.5480897544168]
We propose a novel framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework.
arXiv Detail & Related papers (2023-12-09T11:36:14Z) - Aggregation Weighting of Federated Learning via Generalization Bound
Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions.
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
arXiv Detail & Related papers (2023-11-10T08:50:28Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance reduction technique in the cross-silo FL setting.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and thus cannot handle distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds than existing distributed methods while retaining theoretical guarantees.
Experiments on several benchmark datasets demonstrate the effectiveness of the algorithm and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Weighted Aggregating Stochastic Gradient Descent for Parallel Deep
Learning [8.366415386275557]
Our solution involves a reformulation of the objective function used to optimize neural network models.
We introduce a decentralized weighted aggregating scheme based on the performance of local workers; a sketch of the standard weighted-aggregation baseline that such schemes modify is given after this list.
To validate the new method, we benchmark our schemes against several popular algorithms.
arXiv Detail & Related papers (2020-04-07T23:38:29Z)
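Two of the entries above ("Aggregation Weighting of Federated Learning via Generalization Bound Estimation" and "Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning") modify the standard weighted aggregation used in federated or parallel training. The sketch below shows only the common sample-proportion (FedAvg-style) baseline; the replacement weights proposed in those papers (generalization-bound-based or worker-performance-based) are not reproduced here, and the function name is illustrative.

```python
import numpy as np

def sample_proportion_aggregate(client_params, client_sample_counts):
    """FedAvg-style baseline: each client's weight is its share of the total
    number of training samples. `client_params` is a list of client models,
    each given as a list of per-layer NumPy arrays."""
    counts = np.asarray(client_sample_counts, dtype=float)
    weights = counts / counts.sum()  # sample-proportion weights
    # The papers above replace `weights` with quantities derived from
    # generalization bounds or local-worker performance (not shown here).
    num_layers = len(client_params[0])
    return [sum(w * p[l] for w, p in zip(weights, client_params))
            for l in range(num_layers)]
```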