Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification
in the Presence of Data Heterogeneity
- URL: http://arxiv.org/abs/2302.09634v1
- Date: Sun, 19 Feb 2023 17:42:35 GMT
- Title: Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification
in the Presence of Data Heterogeneity
- Authors: Richeng Jin, Xiaofan He, Caijun Zhong, Zhaoyang Zhang, Tony Quek,
Huaiyu Dai
- Abstract summary: Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
- Score: 60.791736094073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication overhead has become one of the major bottlenecks in the
distributed training of deep neural networks. To alleviate the concern, various
gradient compression methods have been proposed, and sign-based algorithms are
of surging interest. However, SIGNSGD fails to converge in the presence of data
heterogeneity, which is commonly observed in the emerging federated learning
(FL) paradigm. Error feedback has been proposed to address the non-convergence
issue. Nonetheless, it requires the workers to locally keep track of the
compression errors, which renders it not suitable for FL since the workers may
not participate in the training throughout the learning process. In this paper,
we propose a magnitude-driven sparsification scheme, which addresses the
non-convergence issue of SIGNSGD while further improving communication
efficiency. Moreover, a local update scheme is incorporated to further improve
the learning performance, and the convergence of the proposed method is
established. The effectiveness of the proposed scheme is validated through
experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
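The abstract does not spell out the exact sparsification rule, so the following is only a minimal Python sketch of the general idea: each worker transmits the sign of a coordinate with probability proportional to its magnitude, and the server aggregates the received sign vectors by majority vote as in SIGNSGD. The probability rule, the `budget` parameter, and the plain majority vote are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def magnitude_aware_sparsify(grad, budget, rng=None):
    """Transmit only the sign of a coordinate, and only with probability
    proportional to its magnitude, so that roughly `budget` coordinates are
    sent per round. Output is in {-1, 0, +1}; 0 means 'not transmitted'."""
    rng = rng or np.random.default_rng()
    mag = np.abs(grad)
    probs = np.minimum(1.0, budget * mag / (mag.sum() + 1e-12))
    mask = rng.random(grad.shape) < probs
    return np.sign(grad) * mask

def majority_vote(sparse_signs):
    """Server-side aggregation as in SIGNSGD: element-wise majority vote over
    the sparse sign vectors reported by the participating workers."""
    return np.sign(np.sum(sparse_signs, axis=0))

def fl_round(params, worker_grads, lr=0.01, budget=1000):
    """One illustrative synchronous round with a fixed step size."""
    votes = [magnitude_aware_sparsify(g, budget) for g in worker_grads]
    return params - lr * majority_vote(votes)
```

Under this assumed rule, each worker sends roughly `budget` signs per round in expectation, which is where additional communication savings over dense SIGNSGD would come from.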
Related papers
- Smart Information Exchange for Unsupervised Federated Learning via
Reinforcement Learning [11.819765040106185]
We propose an approach to create an optimal graph for data transfer using Reinforcement Learning.
The goal is to form links that will provide the most benefit considering the environment's constraints.
Numerical analysis shows the advantages in terms of convergence speed and straggler resilience of the proposed method.
arXiv Detail & Related papers (2024-02-15T00:14:41Z)
- Sparse Training for Federated Learning with Regularized Error Correction [9.852567834643292]
Federated Learning (FL) has attracted much interest due to the significant advantages it brings to training deep neural network (DNN) models.
FLARE presents a novel sparse training approach via accumulated pulling of the updated models with regularization on the embeddings in the FL process.
The performance of FLARE is validated through extensive experiments on diverse and complex models, achieving a remarkable sparsity level (10 times and more beyond the current state-of-the-art) along with significantly improved accuracy.
arXiv Detail & Related papers (2023-12-21T12:36:53Z)
- Communication Efficient and Privacy-Preserving Federated Learning Based on Evolution Strategies [0.0]
Federated learning (FL) is an emerging paradigm for training deep neural networks (DNNs) in a distributed manner.
In this work, we present a federated learning algorithm based on evolution strategies (FedES), a zeroth-order training method.
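The summary only states that FedES is a zeroth-order method based on evolution strategies, so the sketch below shows a generic antithetic ES gradient estimator rather than FedES itself; `loss_fn`, `sigma`, and `num_pairs` are illustrative names. With a shared random seed, a client would only need to report scalar loss values, which hints at why such zeroth-order schemes can be communication-efficient.

```python
import numpy as np

def es_gradient(loss_fn, theta, sigma=0.1, num_pairs=16, seed=0):
    """Antithetic evolution-strategies gradient estimate: perturb the
    parameters with Gaussian noise and weight each perturbation by the
    difference in loss it causes. Only loss evaluations are needed."""
    rng = np.random.default_rng(seed)
    grad_est = np.zeros_like(theta)
    for _ in range(num_pairs):
        eps = rng.standard_normal(theta.shape)
        delta = loss_fn(theta + sigma * eps) - loss_fn(theta - sigma * eps)
        grad_est += (delta / (2.0 * sigma)) * eps
    return grad_est / num_pairs
```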
arXiv Detail & Related papers (2023-11-05T21:40:46Z)
- Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We characterize the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting them in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
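As a rough illustration of the setting (not the paper's system model), over-the-air aggregation can be idealized as the clients' analog transmissions superposing into a sum that the server receives with noise; the Gaussian noise term below is a stand-in for the aggregation error discussed above and ignores fading, power control, and the signal processing schemes the paper actually analyzes.

```python
import numpy as np

def aircomp_aggregate(local_updates, noise_std=0.05, rng=None):
    """Idealized over-the-air aggregation: simultaneous analog transmissions
    superpose in the channel, so the server observes the sum of the local
    updates plus receiver noise, and rescales it into a noisy average."""
    rng = rng or np.random.default_rng()
    stacked = np.stack(local_updates)                      # (num_clients, dim)
    noise = noise_std * rng.standard_normal(stacked.shape[1])
    return (stacked.sum(axis=0) + noise) / len(local_updates)
```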
arXiv Detail & Related papers (2023-10-16T05:49:28Z)
- FedAgg: Adaptive Federated Learning with Aggregated Gradients [1.5653612447564105]
We propose an adaptive FEDerated learning algorithm called FedAgg to alleviate the divergence between the local and average model parameters and obtain a fast model convergence rate.
We show that our framework is superior to existing state-of-the-art FL strategies for enhancing model performance and accelerating convergence rate under IID and Non-IID datasets.
arXiv Detail & Related papers (2023-03-28T08:07:28Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Communication-Efficient Hierarchical Federated Learning for IoT Heterogeneous Systems with Imbalanced Data [42.26599494940002]
Federated learning (FL) is a distributed learning methodology that allows multiple nodes to cooperatively train a deep learning model.
This paper studies the potential of hierarchical FL in IoT heterogeneous systems.
It proposes an optimized solution for user assignment and resource allocation on multiple edge nodes.
arXiv Detail & Related papers (2021-07-14T08:32:39Z)
- CosSGD: Nonlinear Quantization for Communication-efficient Federated Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization scheme for compressed gradient descent, which can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
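The summary does not describe CosSGD's specific nonlinear mapping, so the sketch below uses a generic logarithmic quantizer purely to illustrate why a nonlinear codebook helps: exponentially spaced levels spend more resolution on the many small gradient entries than a uniform grid with the same bit budget.

```python
import numpy as np

def log_quantize(x, num_bits=4):
    """Generic nonlinear (logarithmic) quantizer: magnitudes are snapped to
    exponentially spaced levels (plus a sign), so small gradient entries get
    proportionally finer resolution than with uniform quantization."""
    levels = 2 ** (num_bits - 1) - 1                # nonzero magnitude levels
    max_mag = np.abs(x).max() + 1e-12
    # exponent index in [0, levels]; 0 encodes "round to zero"
    idx = np.round(levels + np.log2(np.abs(x) / max_mag + 1e-12)).clip(0, levels)
    return np.sign(x) * np.where(idx > 0, max_mag * 2.0 ** (idx - levels), 0.0)
```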
arXiv Detail & Related papers (2020-12-15T12:20:28Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
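For contrast with the magnitude-aware scheme of the main paper, here is a minimal sketch of generic error-feedback compression (top-k sparsification is an assumed compressor; this is not DEF's detached variant): each worker keeps a local residual of what was discarded and adds it back before the next compression, which is exactly the per-worker state that the main abstract argues is problematic when clients participate intermittently.

```python
import numpy as np

def top_k(x, k):
    """Keep the k largest-magnitude entries of a 1-D array, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

class ErrorFeedbackWorker:
    """Generic error feedback around a biased compressor (top-k here):
    the worker remembers what the compressor threw away and adds it back
    before compressing the next gradient."""
    def __init__(self, dim, k):
        self.residual = np.zeros(dim)  # per-worker state kept across rounds
        self.k = k

    def compress(self, grad):
        corrected = grad + self.residual        # re-inject past compression error
        update = top_k(corrected, self.k)       # what is actually transmitted
        self.residual = corrected - update      # remember the new error
        return update
```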
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
- Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees [49.91477656517431]
Quantization-based solvers have been widely adopted in Federated Learning (FL), yet no existing method enjoys all of the desired properties.
We propose an intuitively-simple yet theoretically-sound method based on SIGNSGD to bridge the gap.
arXiv Detail & Related papers (2020-02-25T15:12:15Z)
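As a rough sketch of the stochastic sign quantization such methods build on (the clipping bound `B` and the exact probability rule are assumptions here, not necessarily the paper's construction): each coordinate is quantized to plus or minus one at random so that the result is unbiased up to a known scaling.

```python
import numpy as np

def stochastic_sign(x, B, rng=None):
    """Map each coordinate to +1 with probability (B + x) / (2B) and to -1
    otherwise, after clipping x to [-B, B]; then B * E[output] = x, so the
    quantizer is unbiased up to the known scaling factor B."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    p = (np.clip(x, -B, B) + B) / (2.0 * B)
    return np.where(rng.random(x.shape) < p, 1.0, -1.0)
```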