HyFedRAG: A Federated Retrieval-Augmented Generation Framework for Heterogeneous and Privacy-Sensitive Data
- URL: http://arxiv.org/abs/2509.06444v1
- Date: Mon, 08 Sep 2025 08:44:24 GMT
- Title: HyFedRAG: A Federated Retrieval-Augmented Generation Framework for Heterogeneous and Privacy-Sensitive Data
- Authors: Cheng Qian, Hainan Zhang, Yongxin Tong, Hong-Wei Zheng, Zhiming Zheng,
- Abstract summary: HyFedRAG is a unified and efficient Federated RAG framework tailored for Hybrid data modalities.<n>By leveraging an edge-cloud collaborative mechanism, HyFedRAG enables RAG to operate across diverse data sources while preserving data privacy.
- Score: 16.102325291252324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Centralized RAG pipelines struggle with heterogeneous and privacy-sensitive data, especially in distributed healthcare settings where patient data spans SQL, knowledge graphs, and clinical notes. Clinicians face difficulties retrieving rare disease cases due to privacy constraints and the limitations of traditional cloud-based RAG systems in handling diverse formats and edge devices. To address this, we introduce HyFedRAG, a unified and efficient Federated RAG framework tailored for Hybrid data modalities. By leveraging an edge-cloud collaborative mechanism, HyFedRAG enables RAG to operate across diverse data sources while preserving data privacy. Our key contributions are: (1) We design an edge-cloud collaborative RAG framework built on Flower, which supports querying structured SQL data, semi-structured knowledge graphs, and unstructured documents. The edge-side LLMs convert diverse data into standardized privacy-preserving representations, and the server-side LLMs integrates them for global reasoning and generation. (2) We integrate lightweight local retrievers with privacy-aware LLMs and provide three anonymization tools that enable each client to produce semantically rich, de-identified summaries for global inference across devices. (3) To optimize response latency and reduce redundant computation, we design a three-tier caching strategy consisting of local cache, intermediate representation cache, and cloud inference cache. Experimental results on PMC-Patients demonstrate that HyFedRAG outperforms existing baselines in terms of retrieval quality, generation consistency, and system efficiency. Our framework offers a scalable and privacy-compliant solution for RAG over structural-heterogeneous data, unlocking the potential of LLMs in sensitive and diverse data environments.
Related papers
- Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models [63.70401095689976]
We argue that replacing parameters with preferences represents a more scalable and privacy-preserving future.<n>We propose MoR, a federated alignment framework based on GRPO with Mixture-of-Rewards for heterogeneous VLMs.<n>MoR consistently outperforms federated alignment baselines in generalization, robustness, and cross-client adaptability.
arXiv Detail & Related papers (2026-01-31T03:11:51Z) - Efficient Privacy-Preserving Retrieval Augmented Generation with Distance-Preserving Encryption [25.87368479678027]
RAG has emerged as a key technique for enhancing response quality of LLMs without high computational cost.<n>In traditional architectures, RAG services are provided by a single entity that hosts the dataset within a trusted local environment.<n>This dependence on untrusted third-party services introduces privacy risks.<n>We propose an efficient privacy-preserving RAG framework (ppRAG) tailored for untrusted cloud environments.
arXiv Detail & Related papers (2026-01-18T09:29:50Z) - Towards Federated Clustering: A Client-wise Private Graph Aggregation Framework [57.04850867402913]
Federated clustering addresses the challenge of extracting patterns from decentralized, unlabeled data.<n>We propose Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC), a novel algorithm that innovatively leverages local structural graphs as the primary medium for privacy-preserving knowledge sharing.<n>Our framework achieves state-of-the-art performance, improving clustering accuracy by up to 10% (NMI) over federated baselines while maintaining provable privacy guarantees.
arXiv Detail & Related papers (2025-11-14T03:05:22Z) - Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning [53.527506374566485]
We propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN.<n>We show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but also is capable of robustly finding dominant parameters.
arXiv Detail & Related papers (2025-05-07T11:37:23Z) - Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG)<n>FedE4RAG facilitates collaborative training of client-side RAG retrieval models.<n>We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z) - MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG [65.0423152595537]
We propose MES-RAG, which enhances entity-specific query handling and provides accurate, secure, and consistent responses.<n>MES-RAG introduces proactive security measures that ensure system integrity by applying protections prior to data access.<n> Experimental results demonstrate that MES-RAG significantly improves both accuracy and recall, highlighting its effectiveness in advancing the security and utility of question-answering.
arXiv Detail & Related papers (2025-03-17T08:09:42Z) - C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System [7.385458207094507]
We introduce Confidential Computing (CC) techniques as a solution for secure Federated Retrieval Augmented Generation (FedRAG)<n>Our proposed Confidential FedRAG system (C-FedRAG) enables secure connection and scaling of a RAG across a decentralized network of data providers by ensuring context confidentiality.
arXiv Detail & Related papers (2024-12-17T18:42:21Z) - FRAG: Toward Federated Vector Database Management for Collaborative and Secure Retrieval-Augmented Generation [1.3824176915623292]
This paper introduces textitFederated Retrieval-Augmented Generation (FRAG), a novel database management paradigm tailored for the growing needs of retrieval-augmented generation (RAG) systems.
FRAG enables mutually-distrusted parties to collaboratively perform Approximate $k$-Nearest Neighbor (ANN) searches on encrypted query vectors and encrypted data stored in distributed vector databases.
arXiv Detail & Related papers (2024-10-17T06:57:29Z) - RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.<n>With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.<n> Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z) - Free Lunch for Federated Remote Sensing Target Fine-Grained
Classification: A Parameter-Efficient Framework [23.933367972846312]
This paper proposes a novel Privacy-Reserving TFGC Framework based on Federated Learning, dubbed PRFL.
We demonstrate the effectiveness of the proposed PRFL on the classical TFGC task by leveraging four public datasets.
arXiv Detail & Related papers (2024-01-03T01:45:00Z) - Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.