Communication-Aware Knowledge Distillation for Federated LLM Fine-Tuning over Wireless Networks
- URL: http://arxiv.org/abs/2509.01750v2
- Date: Tue, 30 Sep 2025 18:42:50 GMT
- Title: Communication-Aware Knowledge Distillation for Federated LLM Fine-Tuning over Wireless Networks
- Authors: Xinlu Zhang, Na Yan, Yang Su, Yansha Deng, Toktam Mahmoodi
- Abstract summary: Federated learning (FL) for large language models (LLMs) offers a privacy-preserving scheme, enabling clients to collaboratively fine-tune locally deployed LLMs or smaller language models (SLMs) without exchanging raw data. While parameter-sharing methods in traditional FL models solve a number of technical challenges, they still incur high communication overhead. We build on federated distillation, a framework for mutual knowledge transfer via shared logits. We show that our scheme achieves superior performance compared to baseline methods while effectively reducing communication overhead by approximately 50%.
- Score: 28.49324627841803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning (FL) for large language models (LLMs) offers a privacy-preserving scheme, enabling clients to collaboratively fine-tune locally deployed LLMs or smaller language models (SLMs) without exchanging raw data. While parameter-sharing methods in traditional FL models solve a number of technical challenges, they still incur high communication overhead and struggle to adapt to heterogeneous model architectures. Federated distillation, a framework for mutual knowledge transfer via shared logits, typically offers lower communication overhead than parameter-sharing methods. However, transmitting logits from LLMs remains challenging for bandwidth-limited clients due to their high dimensionality. In this work, we focus on federated LLM distillation with efficient communication overhead. To achieve this, we first propose an adaptive Top-k logit selection mechanism that dynamically sparsifies logits according to real-time communication conditions. Then, to tackle the dimensional inconsistency introduced by the adaptive sparsification, we design an adaptive logits aggregation scheme that effectively alleviates the artificial and uninformative inputs introduced by conventional zero-padding methods. Finally, to enhance the distillation effect, we incorporate a LoRA-adapted hidden-layer projection from the LLM into the distillation loss, further reducing the communication overhead while providing a richer representation. Experimental results demonstrate that our scheme achieves superior performance compared to baseline methods while effectively reducing communication overhead by approximately 50%.
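The first two steps of the abstract (bandwidth-dependent Top-k selection, then aggregation without zero-padding) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the selection rule or the aggregation formula. The function names, the payload rule in `choose_k`, the 48-bit cost per entry (for example a 16-bit index plus a 32-bit value), and the "average only over clients that reported an index" rule are all assumptions made for illustration.

```python
def choose_k(bandwidth_bps, time_budget_s, bits_per_entry=48, k_min=1, k_max=None):
    """Hypothetical rule: the largest k whose payload fits the link budget."""
    k = int(bandwidth_bps * time_budget_s // bits_per_entry)
    if k_max is not None:
        k = min(k, k_max)
    return max(k_min, k)


def top_k_sparsify(logits, k):
    """Keep the k largest logits as a {vocab_index: value} mapping."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return {i: logits[i] for i in order[:k]}


def aggregate_sparse(client_logits):
    """Average each vocabulary index over only the clients that sent it.

    Assumed alternative to zero-padding: an index that a client dropped
    does not pull the average toward zero.
    """
    sums, counts = {}, {}
    for sparse in client_logits:
        for i, value in sparse.items():
            sums[i] = sums.get(i, 0.0) + value
            counts[i] = counts.get(i, 0) + 1
    return {i: sums[i] / counts[i] for i in sums}


# Two clients with different link budgets, so they send different k.
strong = top_k_sparsify([2.0, 0.5, 1.0, -1.0], choose_k(144, 1.0))  # k = 3
weak = top_k_sparsify([1.0, 3.0, 0.0, -2.0], choose_k(48, 1.0))     # k = 1
aggregated = aggregate_sparse([strong, weak])
print(aggregated)
```

In this toy run, only index 1 is reported by both clients, so it is averaged over two values (0.5 and 3.0). Indices 0 and 2 keep the single reported value, whereas zero-padding the weak client's vector would have halved them.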
Related papers
- Gradient Projection onto Historical Descent Directions for Communication-Efficient Federated Learning [0.8220217498103312]
Federated Learning (FL) enables decentralized model training across multiple clients while preserving data privacy. We introduce two algorithms: ProjFL, designed for unbiased compressors, and ProjFL+EF, for biased compressors through an Error Feedback mechanism.
arXiv Detail & Related papers (2025-11-05T13:11:30Z)
- Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission [87.68447072141402]
Hybrid Language Models (HLMs) combine the low-latency efficiency of Small Language Models (SLMs) on edge devices with the high accuracy of Large Language Models (LLMs) on centralized servers. We propose FedHLM, a communication-efficient HLM framework that integrates uncertainty-aware inference with Federated Learning (FL).
arXiv Detail & Related papers (2025-06-30T02:56:11Z)
- Communication-Efficient Wireless Federated Fine-Tuning for Large-Scale AI Models [13.742950928229078]
Low-Rank Adaptation (LoRA) addresses these issues by training compact, low-rank matrices instead of fully fine-tuning large models. This paper introduces a wireless federated LoRA fine-tuning framework that optimizes both learning performance and communication efficiency.
arXiv Detail & Related papers (2025-05-01T06:15:38Z)
- Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation [47.82423317739088]
This paper introduces communication-efficient federated LoRA adaption (CE-LoRA), a method that employs a tri-factorization low-rank adaptation approach with personalized model parameter aggregation. Experiments on various LLM and VLM fine-tuning tasks demonstrate that CE-LoRA not only significantly reduces communication overhead but also improves performance under non-independent and identically distributed (non-IID) data conditions.
arXiv Detail & Related papers (2025-03-31T09:18:42Z)
- Over-the-Air Fair Federated Learning via Multi-Objective Optimization [52.295563400314094]
We propose an over-the-air fair federated learning algorithm (OTA-FFL) to train fair FL models. Experiments demonstrate the superiority of OTA-FFL in achieving fairness and robust performance.
arXiv Detail & Related papers (2025-01-06T21:16:51Z)
- R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML).
A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.
This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding.
A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Personalized Wireless Federated Learning for Large Language Models [75.22457544349668]
Large language models (LLMs) have driven profound transformations in wireless networks. Within wireless environments, the training of LLMs faces significant challenges related to security and privacy. This paper presents a systematic analysis of the training stages of LLMs in wireless networks, including pre-training, instruction tuning, and alignment tuning.
arXiv Detail & Related papers (2024-04-20T02:30:21Z)
- Asynchronous Online Federated Learning with Reduced Communication Requirements [6.282767337715445]
We propose a communication-efficient asynchronous online federated learning (PAO-Fed) strategy.
By reducing the communication overhead of the participants, the proposed method renders participation in the learning task more accessible and efficient.
We conduct comprehensive simulations to study the performance of the proposed method on both synthetic and real-life datasets.
arXiv Detail & Related papers (2023-03-27T14:06:05Z)
- HiFlash: Communication-Efficient Hierarchical Federated Learning with Adaptive Staleness Control and Heterogeneity-aware Client-Edge Association [38.99309610943313]
Federated learning (FL) is a promising paradigm that enables collaboratively learning a shared model across massive clients.
For many existing FL systems, clients need to frequently exchange large model parameters directly with the remote cloud server over wide-area networks (WAN).
We resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing.
arXiv Detail & Related papers (2023-01-16T14:39:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.