Federated Learning Meets LLMs: Feature Extraction From Heterogeneous Clients
- URL: http://arxiv.org/abs/2510.00065v1
- Date: Mon, 29 Sep 2025 14:06:52 GMT
- Title: Federated Learning Meets LLMs: Feature Extraction From Heterogeneous Clients
- Authors: Abdelrhman Gaber, Hassan Abd-Eltawab, Youssif Abuzied, Muhammad ElMahdy, Tamer ElBatt,
- Abstract summary: Federated learning (FL) enables collaborative model training without sharing raw data.<n>We propose FedLLM-Align, a framework that leverages pre-trained large language models (LLMs) as universal feature extractors.<n>We evaluate FedLLM-Align on coronary heart disease prediction using partitioned datasets with simulated schema divergence.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) enables collaborative model training without sharing raw data, making it attractive for privacy-sensitive domains such as healthcare, finance, and IoT. A major obstacle, however, is the heterogeneity of tabular data across clients, where divergent schemas and incompatible feature spaces prevent straightforward aggregation. To address this challenge, we propose FedLLM-Align, a federated framework that leverages pre-trained large language models (LLMs) as universal feature extractors. Tabular records are serialized into text, and embeddings from models such as DistilBERT, ALBERT, RoBERTa, and ClinicalBERT provide semantically aligned representations that support lightweight local classifiers under the standard FedAvg protocol. This approach removes the need for manual schema harmonization while preserving privacy, since raw data remain strictly local. We evaluate FedLLM-Align on coronary heart disease prediction using partitioned Framingham datasets with simulated schema divergence. Across all client settings and LLM backbones, our method consistently outperforms state-of-the-art baselines, achieving up to +0.25 improvement in F1-score and a 65% reduction in communication cost. Stress testing under extreme schema divergence further demonstrates graceful degradation, unlike traditional methods that collapse entirely. These results establish FedLLM-Align as a robust, privacy-preserving, and communication-efficient solution for federated learning in heterogeneous environments.
Related papers
- Adaptive Dual-Weighting Framework for Federated Learning via Out-of-Distribution Detection [53.45696787935487]
Federated Learning (FL) enables collaborative model training across large-scale distributed service nodes.<n>In real-world service-oriented deployments, data generated by heterogeneous users, devices, and application scenarios are inherently non-IID.<n>We propose FLood, a novel FL framework inspired by out-of-distribution (OOD) detection.
arXiv Detail & Related papers (2026-02-01T05:54:59Z) - Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models [63.70401095689976]
We argue that replacing parameters with preferences represents a more scalable and privacy-preserving future.<n>We propose MoR, a federated alignment framework based on GRPO with Mixture-of-Rewards for heterogeneous VLMs.<n>MoR consistently outperforms federated alignment baselines in generalization, robustness, and cross-client adaptability.
arXiv Detail & Related papers (2026-01-31T03:11:51Z) - Rethinking Federated Graph Foundation Models: A Graph-Language Alignment-based Approach [8.517604507672262]
Recent studies of federated graph foundational models (FedGFMs) break the idealized and untenable assumption of having centralized data storage to train graph foundation models.<n>Existing studies that project aligned generalizable knowledge onto a discrete token space via vector-quantized backbones suffer from irreversible knowledge loss during the quantization process.
arXiv Detail & Related papers (2026-01-29T07:50:00Z) - Closer to Reality: Practical Semi-Supervised Federated Learning for Foundation Model Adaptation [56.36237936346563]
Foundation models (FMs) exhibit remarkable generalization but require adaptation to downstream tasks.<n>Due to data privacy regulations, cloud-based FMs cannot directly access private edge data.<n>We introduce Practical Semi-Supervised Federated Learning (PSSFL), where edge devices hold only unlabeled, low-resolution data.<n>Our work paves the way for scalable and privacy-preserving FM adaptation in federated scenarios.
arXiv Detail & Related papers (2025-08-22T17:47:02Z) - FedP3E: Privacy-Preserving Prototype Exchange for Non-IID IoT Malware Detection in Cross-Silo Federated Learning [5.7494612007431805]
We propose FedP3E, a novel FL framework that supports indirect cross-client representation sharing while maintaining data privacy.<n>We evaluate FedP3E on the N-BaIoT dataset under realistic cross-silo scenarios with varying degrees of data imbalance.
arXiv Detail & Related papers (2025-07-09T20:07:35Z) - Hybrid-Regularized Magnitude Pruning for Robust Federated Learning under Covariate Shift [2.298932494750101]
We show that inconsistencies in client-side training distributions substantially degrade the performance of federated learning models.<n>We propose a novel FL framework using a combination of pruning and regularisation of clients' training to improve the sparsity, redundancy, and robustness of neural connections.
arXiv Detail & Related papers (2024-12-19T16:22:37Z) - An Architecture Built for Federated Learning: Addressing Data Heterogeneity through Adaptive Normalization-Free Feature Recalibration [0.3481075494213406]
We propose Adaptive Normalization-free Feature Recalibration (ANFR) to combat heterogeneous data in Federated Learning (FL)<n>ANFR combines weight standardization and channel attention to produce learnable scaling factors for feature maps.<n>Experiments show ANFR consistently outperforms established baselines across various aggregation methods.
arXiv Detail & Related papers (2024-10-02T20:16:56Z) - Personalized federated learning based on feature fusion [2.943623084019036]
Federated learning enables distributed clients to collaborate on training while storing their data locally to protect client privacy.
We propose a personalized federated learning approach called pFedPM.
In our process, we replace traditional gradient uploading with feature uploading, which helps reduce communication costs and allows for heterogeneous client models.
arXiv Detail & Related papers (2024-06-24T12:16:51Z) - Fake It Till Make It: Federated Learning with Consensus-Oriented
Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG)
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z) - FedLAP-DP: Federated Learning by Sharing Differentially Private Loss Approximations [53.268801169075836]
We propose FedLAP-DP, a novel privacy-preserving approach for federated learning.
A formal privacy analysis demonstrates that FedLAP-DP incurs the same privacy costs as typical gradient-sharing schemes.
Our approach presents a faster convergence speed compared to typical gradient-sharing methods.
arXiv Detail & Related papers (2023-02-02T12:56:46Z) - Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose sc FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z) - Fine-tuning Global Model via Data-Free Knowledge Distillation for
Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraint.
We propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG)
Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.