Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission
- URL: http://arxiv.org/abs/2507.00082v1
- Date: Mon, 30 Jun 2025 02:56:11 GMT
- Title: Federated Learning-Enabled Hybrid Language Models for Communication-Efficient Token Transmission
- Authors: Faranaksadat Solat, Joohyung Lee, Mohamed Seif, Dusit Niyato, H. Vincent Poor,
- Abstract summary: Hybrid Language Models (HLMs) combine the low-latency efficiency of Small Language Models (SLMs) on edge devices with the high accuracy of Large Language Models (LLMs) on centralized servers.<n>We propose FedHLM, a communication-efficient HLM framework that integrates uncertainty-aware inference with Federated Learning (FL)
- Score: 87.68447072141402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hybrid Language Models (HLMs) combine the low-latency efficiency of Small Language Models (SLMs) on edge devices with the high accuracy of Large Language Models (LLMs) on centralized servers. Unlike traditional end-to-end LLM inference, HLMs reduce latency and communication by invoking LLMs only when local SLM predictions are uncertain, i.e., when token-level confidence is low or entropy is high. However, ambiguous or low-confidence predictions still require frequent offloading to the LLM, leading to significant communication overhead in bandwidth-constrained settings. To address this, we propose FedHLM, a communication-efficient HLM framework that integrates uncertainty-aware inference with Federated Learning (FL). FedHLM's key innovation lies in collaboratively learning token-level uncertainty thresholds that govern when LLM assistance is needed. Rather than using static or manually tuned thresholds, FedHLM employs FL to optimize these thresholds in a privacy-preserving, distributed manner. Additionally, it leverages embedding-based token representations for Peer-to-Peer (P2P) resolution, enabling clients to reuse tokens inferred by semantically similar peers without engaging the LLM. We further introduce hierarchical model aggregation: edge servers refine local routing policies through client updates, while cross-cluster coordination aligns global decision boundaries. This layered design captures recurring uncertainty patterns, reducing redundant LLM queries. Experiments on large-scale news classification tasks show that FedHLM reduces LLM transmissions by over 95 percent with negligible accuracy loss, making it well-suited for scalable and efficient edge-AI applications.
Related papers
- FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model [0.48342038441006796]
Federated Learning (FL) offers a decentralized framework for training and fine-tuning Large Language Models (LLMs)<n>FL addresses privacy and security concerns while navigating challenges associated with the substantial computational demands of LLMs.<n>We propose a novel method, FedShield-LLM, that uses pruning with Fully Homomorphic Encryption (FHE) for Low-Rank Adaptation (LoRA) parameters.
arXiv Detail & Related papers (2025-06-06T00:05:05Z) - Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission [65.17811759381978]
Hybrid language model (HLM) generates draft tokens that are validated and corrected by a remote large language model (LLM)<n>We propose communication-efficient and uncertainty-aware HLM (CU-HLM)<n>We show that CU-HLM achieves up to 206$times$ higher token throughput by skipping 74.8% transmissions with 97.4% vocabulary compression, while maintaining 97.4% accuracy.
arXiv Detail & Related papers (2025-05-17T02:10:34Z) - LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.<n>LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.<n>Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - Uncertainty-Aware Hybrid Inference with On-Device Small and Remote Large Language Models [49.48313161005423]
A hybrid language model (HLM) architecture integrates a small language model (SLM) operating on a mobile device with a large language model (LLM) hosted at the base station (BS) of a wireless network.<n>The HLM token generation process follows the speculative inference principle: the SLM's vocabulary distribution is uploaded to the LLM, which either accepts or rejects it, with rejected tokens being resampled by the LLM.<n>We propose a novel HLM structure coined Uncertainty-aware opportunistic HLM (U-HLM), wherein the SLM locally measures its output uncertainty and skips both up
arXiv Detail & Related papers (2024-12-17T09:08:18Z) - FedDTPT: Federated Discrete and Transferable Prompt Tuning for Black-Box Large Language Models [14.719919025265224]
Fine-tuning large language models (LLMs) with data from specific scenarios poses privacy leakage risks.
We propose for the first time a federated discrete and transferable prompt tuning, namely FedDTPT, for black-box large language models.
Our approach achieves higher accuracy, reduced communication overhead, and robustness to non-iid data in a black-box setting.
arXiv Detail & Related papers (2024-11-01T19:19:23Z) - R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML)
A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.
This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding.
A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z) - FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models [28.284346666217207]
FedMKT is a parameter-efficient mutual knowledge transfer framework for large and small language models.<n>We show that FedMKT simultaneously boosts the performance of both LLMs and SLMs.
arXiv Detail & Related papers (2024-06-04T11:36:09Z) - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.