Recycling Model Updates in Federated Learning: Are Gradient Subspaces
Low-Rank?
- URL: http://arxiv.org/abs/2202.00280v1
- Date: Tue, 1 Feb 2022 09:05:32 GMT
- Title: Recycling Model Updates in Federated Learning: Are Gradient Subspaces
Low-Rank?
- Authors: Sheikh Shams Azam, Seyyedali Hosseinalipour, Qiang Qiu, Christopher
Brinton
- Abstract summary: We propose the "Look-back Gradient Multiplier" (LBGM) algorithm, which exploits the low-rank structure of the gradient space to enable gradient recycling.
We analytically characterize the convergence behavior of LBGM, revealing the nature of the trade-off between communication savings and model performance.
We show that LBGM is a general plug-and-play algorithm that can be used standalone or stacked on top of existing sparsification techniques for distributed model training.
- Score: 26.055358499719027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we question the rationale behind propagating large numbers of
parameters through a distributed system during federated learning. We start by
examining the rank characteristics of the subspace spanned by gradients across
epochs (i.e., the gradient-space) in centralized model training, and observe
that this gradient-space often consists of a few leading principal components
accounting for an overwhelming majority (95-99%) of the explained variance.
Motivated by this, we propose the "Look-back Gradient Multiplier" (LBGM)
algorithm, which exploits this low-rank property to enable gradient recycling
between model update rounds of federated learning, reducing transmissions of
large parameters to single scalars for aggregation. We analytically
characterize the convergence behavior of LBGM, revealing the nature of the
trade-off between communication savings and model performance. Our subsequent
experimental results demonstrate the improvement LBGM obtains in communication
overhead compared to conventional federated learning on several datasets and
deep learning models. Additionally, we show that LBGM is a general
plug-and-play algorithm that can be used standalone or stacked on top of
existing sparsification techniques for distributed model training.
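As a concrete illustration of the two ideas in the abstract, the sketch below (a minimal NumPy example, not the authors' implementation) first measures how much explained variance the top-k principal components of a stack of flattened per-epoch gradients capture, and then shows a gradient-recycling step in which a client transmits a single projection scalar whenever its new gradient stays well aligned with a stored look-back gradient. The function names, the alignment threshold, and the squared-cosine alignment test are illustrative assumptions rather than the exact LBGM formulation.

```python
import numpy as np

def topk_explained_variance(grad_matrix, k=1):
    """Fraction of variance captured by the top-k principal components of a
    (num_epochs x num_params) matrix of flattened per-epoch gradients."""
    centered = grad_matrix - grad_matrix.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

def lbgm_client_step(new_grad, lookback_grad, align_threshold=0.9):
    """Illustrative client-side gradient recycling: send one scalar when the
    new gradient is well aligned with the stored look-back gradient,
    otherwise send the full gradient and refresh the look-back."""
    # Squared cosine similarity between the new and look-back gradients.
    align = (new_grad @ lookback_grad) ** 2 / (
        (new_grad @ new_grad) * (lookback_grad @ lookback_grad) + 1e-12
    )
    if align >= align_threshold:
        # Single scalar such that coeff * lookback_grad approximates new_grad.
        coeff = (new_grad @ lookback_grad) / (lookback_grad @ lookback_grad)
        return {"type": "scalar", "payload": float(coeff)}, lookback_grad
    # Poor alignment: fall back to a full transmission and refresh the look-back.
    return {"type": "full", "payload": new_grad}, new_grad
```

In the scalar case, a server that also stores the client's look-back gradient can reconstruct coeff * lookback_grad for aggregation, which is where the communication savings over transmitting the full parameter update come from.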
Related papers
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive.
Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones.
We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- Federated Learning over Hierarchical Wireless Networks: Training Latency Minimization via Submodel Partitioning [15.311309249848739]
Hierarchical independent submodel training (HIST) is a new FL methodology that aims to reduce training latency in hierarchical cloud-edge-client networks.
We demonstrate how HIST can be augmented with over-the-air computation (AirComp) to further enhance the efficiency of the model aggregation over the edge cells.
arXiv Detail & Related papers (2023-10-27T04:42:59Z)
- GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data from an FL system by leveraging pre-trained generative adversarial networks (GANs) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training [1.3999481573773072]
We study the stability of these layers across runs and model sizes.
We propose that group normalization may be used without disrupting their formation.
We apply these findings to Federated Learning in order to improve the training procedure.
arXiv Detail & Related papers (2021-10-08T17:25:19Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)