Related papers: CG-FedLLM: How to Compress Gradients in Federated Fune-tuning for Large Language Models


URL: http://arxiv.org/abs/2405.13746v2
Date: Fri, 24 May 2024 03:17:41 GMT
Title: CG-FedLLM: How to Compress Gradients in Federated Fune-tuning for Large Language Models
Authors: Huiwen Wu, Xiaohan Li, Deyi Zhang, Xiaogang Xu, Jiafei Wu, Puning Zhao, Zhe Liu
Abstract summary: This study introduces an innovative approach to compress gradients to improve communication efficiency during federated learning (FL) of Large Language Models (LLMs). We also present a series of experimental analyses focusing on the signal-to-noise ratio, compression rate, and robustness within this privacy-centric framework.
Score: 21.919883617413358
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The success of current Large Language Models (LLMs) hinges on extensive training data that is collected and stored centrally, an approach called Centralized Learning (CL). However, this collection approach poses a privacy threat, and one potential solution is Federated Learning (FL), which transfers gradients, not raw data, among clients. Unlike traditional networks, LLMs incur significant communication costs in FL because of their enormous number of parameters. This study introduces an innovative approach that compresses gradients to improve communication efficiency during LLM FL, yielding a new FL pipeline named CG-FedLLM. This approach integrates an encoder on the client side to extract compressed gradient features and a decoder on the server side to reconstruct the gradients. We also developed a novel training strategy that comprises Temporal-ensemble Gradient-Aware Pre-training (TGAP), which identifies characteristic gradients of the target model, and Federated AutoEncoder-Involved Fine-tuning (FAF), which compresses gradients adaptively. Extensive experiments confirm that our approach reduces communication costs and improves performance (e.g., an average improvement of 3 points over traditional CL- and FL-based fine-tuning of LLaMA on the well-recognized C-Eval benchmark). This improvement arises because our encoder-decoder, trained via TGAP and FAF, can filter gradients while selectively preserving critical features. Furthermore, we present a series of experimental analyses of the signal-to-noise ratio, compression rate, and robustness within this privacy-centric framework, providing insight into developing more efficient and secure LLMs.
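The abstract describes a client-side encoder that compresses gradients and a server-side decoder that reconstructs them. The sketch below illustrates only that data flow, under loud assumptions: it replaces the paper's learned autoencoder (trained via TGAP and FAF) with a linear encoder/decoder fitted by SVD on a pool of sample gradients, the synthetic gradients are idealized to lie in a low-dimensional subspace, and all function names are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(grad_pool, k):
    """Fit a rank-k linear encoder/decoder pair to sample gradients.

    ASSUMPTION: stands in for the paper's learned autoencoder (TGAP/FAF);
    here we just keep the top-k principal directions found by SVD.
    grad_pool has shape (n_samples, d).
    """
    mean = grad_pool.mean(axis=0)
    # Rows of vt are orthonormal directions in gradient space, strongest first.
    _, _, vt = np.linalg.svd(grad_pool - mean, full_matrices=False)
    return mean, vt[:k]  # basis has shape (k, d); mean and basis are shared in advance


def encode(grad, mean, basis):
    """Client side: compress a d-dim gradient to k coefficients (k << d)."""
    return basis @ (grad - mean)


def decode(code, mean, basis):
    """Server side: reconstruct a d-dim gradient estimate from k coefficients."""
    return basis.T @ code + mean


def server_round(client_grads, mean, basis):
    """Clients upload only their codes; the server decodes and averages them."""
    codes = [encode(g, mean, basis) for g in client_grads]
    return np.mean([decode(c, mean, basis) for c in codes], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 1000, 8
    # Synthetic gradients lying in a hidden k-dim subspace (an idealization).
    hidden = rng.standard_normal((k, d))
    pool = rng.standard_normal((200, k)) @ hidden
    mean, basis = fit_linear_autoencoder(pool, k)
    clients = [rng.standard_normal(k) @ hidden for _ in range(4)]
    approx = server_round(clients, mean, basis)
    exact = np.mean(clients, axis=0)
    rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"floats uploaded per client: {k} instead of {d}")
    print(f"relative error of the aggregated gradient: {rel_err:.1e}")
```

Real gradients are not exactly low-rank, so a learned, nonlinear encoder-decoder (as in the paper) would trade some reconstruction error for its compression; this linear version only shows where encoding and decoding sit in the FL round.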

Related papers

GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning [55.03441672267886]
We propose GradAlign, a gradient-aligned data selection method for reinforcement learning. We evaluate GradAlign across three data regimes: unreliable reward signals, distribution imbalance, and low-utility training corpus.
arXiv Detail & Related papers (2026-02-25T01:54:50Z)
Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs [52.15785423211181]
Grad-ELLM is a gradient-based attribution method for decoder-only transformer-based Large Language Models. We introduce two faithfulness metrics, $$-Soft-NC and $$-Soft-NS, which provide fairer comparisons. Experimental results show that Grad-ELLM consistently achieves higher faithfulness than other attribution methods.
arXiv Detail & Related papers (2026-01-06T15:22:39Z)
Timely Parameter Updating in Over-the-Air Federated Learning [45.5660377179285]
We propose Freshness-mAgnItude awaRe top-k (FAIR-k), an algorithm that selects, in each communication round, the most impactful subset of gradients to be updated over the air. We show that FAIR-k promotes fresh (and fair) parameter updates and improves communication efficiency by enabling longer periods of local training without significantly affecting overall training efficiency.
arXiv Detail & Related papers (2025-12-22T07:18:13Z)
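Choosing, each round, which coordinates to send by weighing both magnitude and freshness can be illustrated with a toy selector. The scoring rule below (magnitude plus `lam` times the number of rounds since a coordinate was last sent) is a hypothetical stand-in, not FAIR-k's published rule, and `fair_topk_select` is an illustrative name.

```python
def fair_topk_select(grad, age, k, lam):
    """Choose k coordinates to transmit this round.

    HYPOTHETICAL score: |grad[i]| + lam * age[i], where age[i] counts the rounds
    since coordinate i was last transmitted. FAIR-k's actual rule may differ.
    Returns (sorted chosen indices, updated ages).
    """
    scores = [abs(g) + lam * a for g, a in zip(grad, age)]
    ranked = sorted(range(len(grad)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    # Transmitted coordinates become fresh (age 0); all others get one round staler.
    new_age = [0 if i in chosen else a + 1 for i, a in enumerate(age)]
    return sorted(chosen), new_age


if __name__ == "__main__":
    grad = [0.9, 0.05, 0.8, 0.01]  # toy per-coordinate gradient, fixed across rounds
    age = [0] * len(grad)
    for rnd in range(1, 5):
        chosen, age = fair_topk_select(grad, age, k=2, lam=0.5)
        print(rnd, chosen)
    # 1 [0, 2]
    # 2 [0, 2]
    # 3 [1, 3]
    # 4 [0, 2]
```

With `lam=0` this reduces to plain magnitude top-k, which would never transmit coordinates 1 and 3 in this toy run; the age term eventually lets stale coordinates through.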
Gradient Projection onto Historical Descent Directions for Communication-Efficient Federated Learning [0.8220217498103312]
Federated Learning (FL) enables decentralized model training across multiple clients while preserving data privacy. We introduce two algorithms: ProjFL, designed for unbiased compressors, and ProjFL+EF, for biased compressors through an Error Feedback mechanism.
arXiv Detail & Related papers (2025-11-05T13:11:30Z)
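ProjFL+EF is described as handling biased compressors through an Error Feedback mechanism. The sketch below shows only generic error feedback wrapped around a top-k compressor; the ProjFL projection onto historical descent directions is not modeled, and the class and function names are hypothetical.

```python
def top_k(vec, k):
    """Biased compressor: keep the k largest-magnitude entries, zero the rest."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


class ErrorFeedbackClient:
    """Generic error feedback: re-inject what the compressor dropped earlier."""

    def __init__(self, dim):
        self.residual = [0.0] * dim  # accumulated, not-yet-transmitted mass

    def compress(self, grad, k):
        corrected = [g + r for g, r in zip(grad, self.residual)]
        sent = top_k(corrected, k)
        # Whatever was not sent this round is carried over to the next one.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return sent


if __name__ == "__main__":
    client = ErrorFeedbackClient(dim=3)
    for rnd in range(1, 5):
        sent = client.compress([1.0, 0.3, 0.2], k=1)
        print(rnd, [i for i, v in enumerate(sent) if v != 0.0])
    # 1 [0]
    # 2 [0]
    # 3 [0]
    # 4 [1]
```

Without the residual, the two small coordinates would never be sent; with it, dropped mass accumulates until it wins a slot, so the sum of everything transmitted plus the current residual always equals the sum of the raw gradients.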
Over-the-Air Fair Federated Learning via Multi-Objective Optimization [52.295563400314094]
We propose an over-the-air fair federated learning algorithm (OTA-FFL) to train fair FL models. Experiments demonstrate the superiority of OTA-FFL in achieving fairness and robust performance.
arXiv Detail & Related papers (2025-01-06T21:16:51Z)
On the Convergence of Continual Federated Learning Using Incrementally Aggregated Gradients [2.2530496464901106]
The holy grail of machine learning is to enable Continual Federated Learning (CFL) to enhance the efficiency, privacy, and scalability of AI systems while learning from streaming data. We propose a novel replay-memory based federated strategy consisting of edge-based gradient updates on memory and aggregated gradients on the current data. We empirically show that C-FLAG outperforms several state-of-the-art baselines on both task and class-incremental settings with respect to metrics such as accuracy and forgetting.
arXiv Detail & Related papers (2024-11-12T17:36:20Z)
Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting. We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
Adaptive Coded Federated Learning: Privacy Preservation and Straggler Mitigation [33.56146654796337]
A coded federated learning framework has been proposed to mitigate the negative impact of stragglers. We propose a new method named adaptive coded federated learning (ACFL) to overcome this drawback.
arXiv Detail & Related papers (2024-03-22T01:51:48Z)
FedImpro: Measuring and Improving Client Update in Federated Learning [77.68805026788836]
Federated Learning (FL) models often experience client drift caused by heterogeneous data. We present an alternative perspective on client drift and aim to mitigate it by generating improved local models.
arXiv Detail & Related papers (2024-02-10T18:14:57Z)
Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes [54.18186259484828]
In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds. We show strong evidence that variable-length coding is beneficial for compression in FL. We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z)
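Fed-CVLC tunes code lengths to the dynamics of model updates. The sketch below does not reproduce that method; it only illustrates the underlying intuition with a standard Huffman code: when quantized update values are skewed (most of them zero), a variable-length code needs fewer bits than a fixed-length one. The function names and the toy update distribution are hypothetical.

```python
import heapq
from collections import Counter
from math import ceil, log2


def huffman_code_lengths(symbols):
    """Return {symbol: codeword length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}  # a lone symbol still needs one bit
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging the two lightest subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def fixed_vs_variable_bits(symbols):
    """Bits needed with a fixed-length code vs. a Huffman (variable-length) code."""
    lengths = huffman_code_lengths(symbols)
    variable = sum(lengths[s] for s in symbols)
    fixed = len(symbols) * max(1, ceil(log2(len(lengths))))
    return fixed, variable


if __name__ == "__main__":
    # Toy quantized update: mostly zeros, a few +1/-1 (a skewed distribution).
    update = [0] * 90 + [1] * 5 + [-1] * 5
    print(fixed_vs_variable_bits(update))  # (200, 110)
```

Here three distinct symbols need 2 bits each under a fixed-length code, while Huffman gives the frequent zero a 1-bit codeword, cutting the total from 200 to 110 bits.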
GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework for preserving clients' privacy. Recent studies find that an attacker can invert the shared gradients and recover sensitive data from an FL system by leveraging pre-trained generative adversarial networks (GANs) as prior knowledge. We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z)
Adaptive Control of Client Selection and Gradient Compression for Efficient Federated Learning [28.185096784982544]
Federated learning (FL) allows multiple clients to cooperatively train models without disclosing local data. We propose a heterogeneity-aware FL framework, called FedCG, with adaptive client selection and gradient compression. Experiments on both real-world prototypes and simulations show that FedCG can provide up to a 5.3× speedup compared to other methods.
arXiv Detail & Related papers (2022-12-19T14:19:07Z)
Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server. Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression. We demonstrate that the framework achieves almost identical performance to the case with no compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
Boosting Resource-Constrained Federated Learning Systems with Guessed Updates [1.6053176639259055]
GEL enables constrained edge devices to perform additional learning through guessed updates on top of gradient-based steps. GEL can boost empirical convergence by up to 40% in resource-constrained networks.
arXiv Detail & Related papers (2021-10-21T21:23:04Z)
CosSGD: Nonlinear Quantization for Communication-efficient Federated Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server. We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning. Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z)
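CosSGD is described as a nonlinear quantizer for compressed gradient descent. As a hedged illustration of what "nonlinear" can mean here, the sketch below assumes a cosine-based scheme: each entry is normalized by the gradient's L2 norm, mapped to an angle with arccos, and that angle is quantized uniformly with `bits` bits, so levels that are uniform in angle become non-uniform in value. CosSGD's exact formulation may differ, and the function names are hypothetical.

```python
import math


def cos_quantize(grad, bits):
    """Encode each entry as a `bits`-bit index of the angle arccos(g_i / ||g||_2).

    ASSUMED scheme for illustration: uniform steps in angle are non-uniform
    steps in value. Returns (norm, codes); the norm is sent as one float.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return 0.0, [0] * len(grad)  # all-zero gradient decodes to zeros anyway
    levels = 2 ** bits - 1
    step = math.pi / levels
    codes = []
    for g in grad:
        u = max(-1.0, min(1.0, g / norm))  # clamp tiny float overshoot outside [-1, 1]
        codes.append(round(math.acos(u) / step))
    return norm, codes


def cos_dequantize(norm, codes, bits):
    """Map angle indices back to values: norm * cos(index * step)."""
    step = math.pi / (2 ** bits - 1)
    return [norm * math.cos(c * step) for c in codes]


if __name__ == "__main__":
    g = [0.5, -1.2, 0.0, 3.0, -0.01]
    norm, codes = cos_quantize(g, bits=4)
    approx = cos_dequantize(norm, codes, bits=4)
    worst = max(abs(a - b) for a, b in zip(g, approx))
    print(codes, f"max abs error {worst:.3f}")
```

Each entry costs `bits` bits plus one shared float for the norm, versus 32 bits per entry for float32. Because cos is 1-Lipschitz, every reconstructed entry is within `norm * pi / (2 * (2**bits - 1))` of the original.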

This list is automatically generated from the titles and abstracts of the papers on this site.