Hierarchical Federated Learning with Momentum Acceleration in Multi-Tier
Networks
- URL: http://arxiv.org/abs/2210.14560v1
- Date: Wed, 26 Oct 2022 08:35:37 GMT
- Title: Hierarchical Federated Learning with Momentum Acceleration in Multi-Tier
Networks
- Authors: Zhengjie Yang, Sen Fu, Wei Bao, Dong Yuan, and Albert Y. Zomaya
- Abstract summary: We propose Hierarchical Federated Learning with Momentum Acceleration (HierMo)
HierMo is a three-tier worker-edge-cloud federated learning algorithm that applies momentum for training acceleration.
- Score: 38.04641907268331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Hierarchical Federated Learning with Momentum
Acceleration (HierMo), a three-tier worker-edge-cloud federated learning
algorithm that applies momentum for training acceleration. Momentum is
calculated and aggregated in the three tiers. We provide convergence analysis
for HierMo, showing a convergence rate of O(1/T). In the analysis, we develop a
new approach to characterize model aggregation, momentum aggregation, and their
interactions. Based on this result, we prove that HierMo achieves a tighter
convergence upper bound than HierFAVG, which does not use momentum. We also
propose HierOPT, which optimizes the aggregation periods (worker-edge and
edge-cloud aggregation periods) to minimize the loss given a limited training
time.
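To make the three-tier scheme concrete, the following is a minimal sketch of one worker-edge-cloud round in which both model parameters and momentum buffers are averaged, consistent with the abstract's statement that momentum is calculated and aggregated in all three tiers. It is an illustration only, not the authors' implementation: the function names, the plain heavy-ball momentum, the equal-weight averaging, and the parameters tau_w and tau_e (worker-edge and edge-cloud aggregation periods) are all assumptions.

    import numpy as np

    def local_sgd_momentum(w, v, grad_fn, steps, lr=0.1, beta=0.9):
        # Run `steps` of SGD with heavy-ball momentum from state (w, v).
        for _ in range(steps):
            g = grad_fn(w)
            v = beta * v + g      # momentum buffer update
            w = w - lr * v        # parameter update
        return w, v

    def average(items):
        return np.mean(np.stack(items), axis=0)

    def hiermo_style_round(w_cloud, v_cloud, worker_grads, tau_w=5, tau_e=3):
        # One cloud round: every edge runs tau_e worker-edge rounds, each of
        # which consists of tau_w local steps per worker followed by an
        # edge-level average of models and momentum buffers.
        edge_ws, edge_vs = [], []
        for grads_of_edge in worker_grads:           # one entry per edge
            w_e, v_e = w_cloud.copy(), v_cloud.copy()
            for _ in range(tau_e):
                ws, vs = [], []
                for grad_fn in grads_of_edge:        # one entry per worker
                    w_k, v_k = local_sgd_momentum(w_e.copy(), v_e.copy(),
                                                  grad_fn, steps=tau_w)
                    ws.append(w_k); vs.append(v_k)
                w_e, v_e = average(ws), average(vs)   # worker-edge aggregation
            edge_ws.append(w_e); edge_vs.append(v_e)
        return average(edge_ws), average(edge_vs)     # edge-cloud aggregation

    # Toy usage: each worker holds a quadratic loss 0.5 * ||w - c||^2.
    targets = [[np.array([1.0, 2.0]), np.array([0.0, 1.0])],   # edge 1, two workers
               [np.array([-1.0, 0.5])]]                        # edge 2, one worker
    grads = [[(lambda w, c=c: w - c) for c in edge] for edge in targets]
    w, v = np.zeros(2), np.zeros(2)
    for _ in range(20):
        w, v = hiermo_style_round(w, v, grads)
    print(w)   # converges near the mean of the per-edge means

Under this equal-weight averaging, the fixed point is the mean of the edge-level means rather than the global worker mean; the paper's HierOPT component additionally tunes the two aggregation periods to minimize loss within a limited training time, which this sketch does not attempt.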
Related papers
- Ordered Momentum for Asynchronous SGD [12.810976838406193]
Asynchronous SGD (ASGD) and its variants are commonly used distributed learning methods.
Momentum has been acknowledged for its benefits in both optimization and generalization in deep model training.
In this paper, we propose a novel method, called ordered momentum (OrMo), for ASGD.
arXiv Detail & Related papers (2024-07-27T11:35:19Z)
- Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach [49.97755400231656]
We show that a novel accelerated DDPM sampler achieves accelerated performance for three broad distribution classes not considered before.
Our results show an improved dependency on the data dimension $d$ among accelerated DDPM-type samplers.
arXiv Detail & Related papers (2024-02-21T16:11:47Z)
- Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise [16.12834917344859]
It is widely conjectured that the heavy-ball momentum method can provide accelerated convergence and should work well in large-batch settings.
We show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving a near-optimal convergence rate.
This means SGD with heavy-ball momentum is useful in large-batch settings such as distributed machine learning or federated learning (the standard heavy-ball update is sketched below).
arXiv Detail & Related papers (2023-12-22T09:58:39Z)
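For reference, the heavy-ball (momentum) update analyzed in this line of work takes the standard textbook form (the notation below is generic, not taken from the paper itself):

    $v_{t+1} = \beta v_t + \nabla f(w_t; \xi_t)$
    $w_{t+1} = w_t - \eta \, v_{t+1}$

where $\eta$ is the step size, $\beta \in [0, 1)$ the momentum coefficient, $\xi_t$ the stochastic sample, and $\kappa$ above denotes the condition number of the problem.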
- Provable Accelerated Convergence of Nesterov's Momentum for Deep ReLU Neural Networks [12.763567932588591]
Current state-of-the-art analyses on the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape.
We consider a new class of objective functions, where only a subset of the parameters satisfies strong convexity, and show that Nesterov's momentum achieves acceleration in theory.
We provide two realizations of the problem class, one of which is deep ReLU networks; to the best of our knowledge, this makes this work the first to prove an accelerated convergence rate for a non-trivial neural network architecture.
arXiv Detail & Related papers (2023-06-13T19:55:46Z)
- Intensity Profile Projection: A Framework for Continuous-Time Representation Learning for Dynamic Networks [50.2033914945157]
We present a representation learning framework, Intensity Profile Projection, for continuous-time dynamic network data.
The framework consists of three stages, including estimating pairwise intensity functions and learning a projection which minimises a notion of intensity reconstruction error.
Moreover, we develop estimation theory providing tight control on the error of any estimated trajectory, indicating that the representations could even be used in quite noise-sensitive follow-on analyses.
arXiv Detail & Related papers (2023-06-09T15:38:25Z)
- Federated TD Learning over Finite-Rate Erasure Channels: Linear Speedup under Markovian Sampling [17.870440210358847]
We study a federated policy evaluation problem where agents communicate via a central aggregator to expedite the evaluation of a common policy.
To capture typical communication constraints in FL, we consider finite-capacity up-link channels that can drop packets based on a Bernoulli erasure model.
Our work is the first to provide a non-asymptotic analysis of the effects of such channels in multi-agent and federated reinforcement learning.
arXiv Detail & Related papers (2023-05-14T08:48:02Z)
- AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks [76.90477930208982]
Sharpness-aware minimization (SAM) has been extensively explored as it can improve generalization when training deep neural networks.
Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored.
We conduct experiments on several NLP tasks, which show that AdaSAM achieves superior performance compared with the SGD, AMSGrad, and SAM optimizers (the core SAM perturbation step is sketched below).
arXiv Detail & Related papers (2023-03-01T15:12:42Z)
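For context, the sharpness-aware minimization step that AdaSAM builds on first perturbs the weights toward higher loss and then descends at the perturbed point (this is the standard SAM formulation; the notation is generic, not taken from the AdaSAM paper):

    $\epsilon_t = \rho \, \nabla L(w_t) / \lVert \nabla L(w_t) \rVert$
    $w_{t+1} = w_t - \eta \, \nabla L(w_t + \epsilon_t)$

where $\rho$ is the perturbation radius and $\eta$ the learning rate; per the summary above, AdaSAM augments this step with an adaptive learning rate and momentum acceleration.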
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers (the antisymmetry idea is sketched below).
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
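To make the antisymmetry idea above concrete, here is a minimal, hypothetical sketch (not the paper's continuous-convolution architecture): if every learned pairwise contribution flips sign when the two particles are swapped, the internal contributions cancel in the sum, so total linear momentum is preserved by construction.

    import numpy as np

    def pairwise_update(positions, velocities, kernel, dt=0.01):
        # Toy particle update whose internal forces conserve linear momentum.
        # `kernel` maps a relative displacement to a raw (learned) force; it is
        # antisymmetrized below so that f_ij = -f_ji for every pair.
        n = len(positions)
        forces = np.zeros_like(positions)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                r = positions[j] - positions[i]
                f_ij = 0.5 * (kernel(r) - kernel(-r))   # antisymmetric by construction
                forces[i] += f_ij
        # The internal forces sum to zero (up to floating point), so the total
        # momentum of the system is unchanged by this update.
        return velocities + dt * forces

    # Toy usage with an arbitrary, deliberately non-antisymmetric kernel.
    rng = np.random.default_rng(0)
    pos = rng.normal(size=(5, 2))
    vel = rng.normal(size=(5, 2))
    kernel = lambda r: np.tanh(r) + 0.3 * r ** 2
    new_vel = pairwise_update(pos, vel, kernel)
    print(np.allclose(vel.sum(axis=0), new_vel.sum(axis=0)))   # True: momentum conserved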
- Accelerate Distributed Stochastic Descent for Nonconvex Optimization with Momentum [12.324457683544132]
We propose a momentum method for model averaging approaches to distributed training.
We analyze the convergence and scaling properties of such momentum methods.
Our experimental results show that block momentum not only accelerates training, but also achieves better results.
arXiv Detail & Related papers (2021-10-01T19:23:18Z)
- A Unified Linear Speedup Analysis of Federated Averaging and Nesterov FedAvg [49.76940694847521]
Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data.
In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today.
We show that FedAvg enjoys linear speedup in each case, although with different convergence rates and communication efficiencies.
arXiv Detail & Related papers (2020-07-11T05:59:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.