Recycling Model Updates in Federated Learning: Are Gradient Subspaces
Low-Rank?
- URL: http://arxiv.org/abs/2202.00280v1
- Date: Tue, 1 Feb 2022 09:05:32 GMT
- Title: Recycling Model Updates in Federated Learning: Are Gradient Subspaces
Low-Rank?
- Authors: Sheikh Shams Azam, Seyyedali Hosseinalipour, Qiang Qiu, Christopher
Brinton
- Abstract summary: We propose the "Look-back Gradient Multiplier" (LBGM) algorithm, which exploits this low-rank property to enable gradient recycling.
We analytically characterize the convergence behavior of LBGM, revealing the nature of the trade-off between communication savings and model performance.
We show that LBGM is a general plug-and-play algorithm that can be used standalone or stacked on top of existing sparsification techniques for distributed model training.
- Score: 26.055358499719027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we question the rationale behind propagating large numbers of
parameters through a distributed system during federated learning. We start by
examining the rank characteristics of the subspace spanned by gradients across
epochs (i.e., the gradient-space) in centralized model training, and observe
that this gradient-space often consists of a few leading principal components
accounting for an overwhelming majority (95-99%) of the explained variance.
Motivated by this, we propose the "Look-back Gradient Multiplier" (LBGM)
algorithm, which exploits this low-rank property to enable gradient recycling
between model update rounds of federated learning, reducing transmissions of
large parameters to single scalars for aggregation. We analytically
characterize the convergence behavior of LBGM, revealing the nature of the
trade-off between communication savings and model performance. Our subsequent
experimental results demonstrate the improvement LBGM obtains in communication
overhead compared to conventional federated learning on several datasets and
deep learning models. Additionally, we show that LBGM is a general
plug-and-play algorithm that can be used standalone or stacked on top of
existing sparsification techniques for distributed model training.
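To make the two core ideas above concrete, here is a minimal sketch (not the authors' reference implementation) of (i) checking how much of the gradient-space variance a few principal components explain and (ii) the LBGM-style step of replacing a full gradient upload with a single look-back projection scalar. The function names, the toy data, and the choice of a single anchor gradient are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def explained_variance_of_gradients(grad_history, k=3):
    """grad_history: (T, d) array of flattened per-epoch gradients.
    Returns the fraction of variance captured by the top-k principal components."""
    G = grad_history - grad_history.mean(axis=0, keepdims=True)
    s = np.linalg.svd(G, compute_uv=False)  # singular values give the PCA spectrum
    var = s ** 2
    return var[:k].sum() / var.sum()

def look_back_coefficient(current_grad, anchor_grad):
    """Scalar projection of the current gradient onto a previously transmitted
    ("look-back") gradient direction; under an LBGM-style scheme only this
    scalar would be uploaded instead of the full gradient vector."""
    anchor_unit = anchor_grad / (np.linalg.norm(anchor_grad) + 1e-12)
    return float(current_grad @ anchor_unit)

# Toy usage: a nearly rank-1 gradient history, mimicking the low-rank behavior
# the paper reports empirically (95-99% explained variance in a few components).
rng = np.random.default_rng(0)
direction = rng.normal(size=1000)            # dominant gradient direction
coeffs = rng.normal(size=50)                 # per-epoch magnitudes along it
grads = np.outer(coeffs, direction) + 0.01 * rng.normal(size=(50, 1000))
print(explained_variance_of_gradients(grads, k=1))   # close to 1.0
print(look_back_coefficient(grads[-1], grads[0]))    # the single scalar a client would send
```

In this toy setup the server could approximately reconstruct a client's update as the transmitted scalar times a stored anchor direction, which is the sense in which earlier gradients are "recycled"; the trade-off the paper analyzes is how much model performance such an approximation gives up in exchange for the communication savings.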
Related papers
- Language Models as Zero-shot Lossless Gradient Compressors: Towards
General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- Submodel Partitioning in Hierarchical Federated Learning: Algorithm
Design and Convergence Analysis [15.311309249848739]
Hierarchical federated learning (HFL) has demonstrated promising scalability advantages over the traditional "star-topology" architecture-based federated learning (FL).
In this paper, we propose hierarchical independent submodel training (HIST) over constrained Internet of Things (IoT) networks.
The key idea behind HIST is to partition the global model into disjoint submodels in each round and distribute them across different cells.
arXiv Detail & Related papers (2023-10-27T04:42:59Z)
- Diffusion Generative Flow Samplers: Improving learning signals through
partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
- GIFD: A Generative Gradient Inversion Method with Feature Domain
Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data from an FL system by leveraging pre-trained generative adversarial networks (GANs) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Exploring Heterogeneous Characteristics of Layers in ASR Models for More
Efficient Training [1.3999481573773072]
We study the stability of layers in ASR models across runs and model sizes.
We propose that group normalization may be used without disrupting their formation.
We apply these findings to Federated Learning in order to improve the training procedure.
arXiv Detail & Related papers (2021-10-08T17:25:19Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)