Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA
- URL: http://arxiv.org/abs/2602.20492v1
- Date: Tue, 24 Feb 2026 02:45:32 GMT
- Title: Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA
- Authors: Nuocheng Yang, Sihua Wang, Ouwen Huan, Mingzhe Chen, Tony Q. S. Quek, Changchuan Yin
- Abstract summary: Decentralized federated learning (DFL) based on low-rank adaptation (LoRA) enables mobile devices with multi-task datasets to collaboratively fine-tune a large language model (LLM) by exchanging locally updated parameters with a subset of neighboring devices via wireless connections for knowledge integration. However, directly aggregating parameters fine-tuned on heterogeneous datasets induces three primary issues across the DFL life-cycle: (i) catastrophic knowledge forgetting during the fine-tuning process, arising from conflicting update directions caused by data heterogeneity; (ii) inefficient communication and convergence during the model aggregation process; and (iii) multi-task knowledge interference during the inference process.
- Score: 61.12136997430116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized federated learning (DFL) based on low-rank adaptation (LoRA) enables mobile devices with multi-task datasets to collaboratively fine-tune a large language model (LLM) by exchanging locally updated parameters with a subset of neighboring devices via wireless connections for knowledge integration. However, directly aggregating parameters fine-tuned on heterogeneous datasets induces three primary issues across the DFL life-cycle: (i) \textit{catastrophic knowledge forgetting during the fine-tuning process}, arising from conflicting update directions caused by data heterogeneity; (ii) \textit{inefficient communication and convergence during the model aggregation process}, due to bandwidth-intensive redundant model transmissions; and (iii) \textit{multi-task knowledge interference during the inference process}, resulting from the coexistence of incompatible knowledge representations during inference. To address these issues in a fully decentralized scenario, we first propose a sparse-and-orthogonal LoRA that ensures orthogonality between model updates to eliminate direction conflicts during fine-tuning. Then, we analyze how device connection topology affects multi-task performance, prompting a cluster-based topology design during aggregation. Finally, we propose an implicit mixture of experts (MoE) mechanism to avoid the coexistence of incompatible knowledge during inference. Simulation results demonstrate that the proposed approach effectively reduces communication resource consumption by up to $73\%$ and enhances average performance by $5\%$ compared with the traditional LoRA method.
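The abstract's core mechanism is making LoRA update directions orthogonal across devices so that heterogeneous updates do not conflict during aggregation. The paper's exact construction is not given here; the following is a minimal NumPy sketch of one way to enforce pairwise-orthogonal LoRA subspaces across devices, using modified Gram-Schmidt on the columns of each device's LoRA projection matrix (function name and shapes are illustrative, not the authors' implementation):

```python
import numpy as np

def orthogonalize_lora_updates(B_list):
    """Given each device's LoRA projection matrix B (shape d x r),
    project its columns onto the orthogonal complement of the subspaces
    already assigned to earlier devices (modified Gram-Schmidt), so that
    update directions across devices are mutually orthogonal."""
    basis = []   # orthonormal directions accumulated across all devices
    result = []
    for B in B_list:
        cols = []
        for j in range(B.shape[1]):
            v = B[:, j].astype(float).copy()
            for u in basis:           # remove components along earlier directions
                v -= (u @ v) * u
            norm = np.linalg.norm(v)
            if norm > 1e-8:           # drop directions already spanned
                v /= norm
                basis.append(v)
                cols.append(v)
        result.append(np.stack(cols, axis=1) if cols
                      else np.zeros((B.shape[0], 0)))
    return result
```

In this sketch, later devices keep only the part of their update that lies outside the subspaces claimed by earlier devices, which is one simple way to eliminate the direction conflicts the abstract attributes to data heterogeneity.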
Related papers
- Joint Optimization of Model Partitioning and Resource Allocation for Anti-Jamming Collaborative Inference Systems [52.842088497389746]
This letter focuses on an anti-jamming collaborative inference system in the presence of a malicious jammer. We first analyze the effects of jamming and DNN partitioning on inference accuracy via data regression. We propose an efficient alternating optimization-based algorithm, which decomposes the problem into three subproblems.
arXiv Detail & Related papers (2026-03-03T03:52:52Z) - Learning to Separate RF Signals Under Uncertainty: Detect-Then-Separate vs. Unified Joint Models [53.79667447811139]
We show that a single deep neural architecture learns to jointly detect and separate when applied directly to the received signal. These findings highlight UJM as a scalable and practical alternative to DTS, while opening new directions for unified separation under broader estimation.
arXiv Detail & Related papers (2026-02-04T15:25:02Z) - Stabilizing Decentralized Federated Fine-Tuning via Topology-Aware Alternating LoRA [20.00589625873043]
TAD-LoRA is a serverless variant of federated learning. We show that TAD-LoRA is competitive in strongly connected topologies and delivers clear gains under moderately and weakly connected topologies.
arXiv Detail & Related papers (2026-01-31T01:57:53Z) - Backscatter Device-aided Integrated Sensing and Communication: A Pareto Optimization Framework [59.30060797118097]
Integrated sensing and communication (ISAC) systems potentially encounter significant performance degradation in densely obstructed urban non-line-of-sight scenarios. This paper proposes a backscatter device (BD)-assisted ISAC system, which leverages passive BDs naturally distributed in the environment for enhancement.
arXiv Detail & Related papers (2025-07-12T17:11:06Z) - Adaptive Federated LoRA in Heterogeneous Wireless Networks with Independent Sampling [15.218221234361922]
Federated LoRA has emerged as a technique for efficiently fine-tuning large language models on distributed devices. In this paper, we propose an independent sampling scheme to reduce the wall-clock convergence time of federated LoRA fine-tuning under both system and data heterogeneity. Experiments demonstrate that our approach reduces wall-clock time compared to state-of-the-art methods across various models and datasets.
arXiv Detail & Related papers (2025-05-29T15:31:37Z) - Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that their transmission in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - Scalable Hierarchical Over-the-Air Federated Learning [3.8798345704175534]
This work introduces a new two-level learning method designed to handle both interference and device data heterogeneity.
We present a comprehensive mathematical approach to derive the convergence bound for the proposed algorithm.
Despite the interference and data heterogeneity, the proposed algorithm achieves high learning accuracy for a variety of parameters.
arXiv Detail & Related papers (2022-11-29T12:46:37Z) - Time-Correlated Sparsification for Efficient Over-the-Air Model Aggregation in Wireless Federated Learning [23.05003652536773]
Federated edge learning (FEEL) is a promising distributed machine learning (ML) framework to drive edge intelligence applications.
We propose time-correlated sparsification with hybrid aggregation (TCS-H) for communication-efficient FEEL.
arXiv Detail & Related papers (2022-02-17T02:48:07Z) - Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z) - Over-the-Air Multi-Task Federated Learning Over MIMO Interference Channel [17.362158131772127]
We study over-the-air multi-task FL (OA-MTFL) over the multiple-input multiple-output (MIMO) interference channel.
We propose a novel model aggregation method for the alignment of local gradients for different devices.
We show that due to the use of the new model aggregation method, device selection is no longer essential to our scheme.
arXiv Detail & Related papers (2021-12-27T10:42:04Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.