A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain
- URL: http://arxiv.org/abs/2511.07577v1
- Date: Wed, 12 Nov 2025 01:04:54 GMT
- Title: A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain
- Authors: Yining Lu, Wenyi Tang, Max Johnson, Taeho Jung, Meng Jiang
- Abstract summary: Decentralization brings a challenge: the numerous independent data sources vary significantly in reliability. Our system achieves a +10.7% performance improvement over its centralized counterpart in realistic unreliable data environments. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56% marginal cost savings.
- Score: 18.738400901246898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, which incurs high costs for data collection, integration, and management, and raises privacy concerns. There is a great need for a decentralized RAG system that enables foundation models to use information directly from data owners who retain full control over their sources. However, decentralization brings a challenge: the numerous independent data sources vary significantly in reliability, which can diminish retrieval accuracy and response quality. To address this, our decentralized RAG system has a novel reliability scoring mechanism that dynamically evaluates each source based on the quality of the responses it helps generate, and prioritizes high-quality sources during retrieval. To ensure transparency and trust, the scoring process is securely managed through blockchain-based smart contracts, creating verifiable and tamper-proof reliability records without relying on a central authority. We evaluate our decentralized system with two Llama models (3B and 8B) in two simulated environments where six data sources have different levels of reliability. Our system achieves a +10.7% performance improvement over its centralized counterpart in realistic unreliable data environments. Notably, it approaches the upper-bound performance of centralized systems under ideally reliable data environments. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56% marginal cost savings through batched update operations. Our code and system are open-sourced at github.com/yining610/Reliable-dRAG.
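The reliability scoring loop described in the abstract can be sketched as a simple per-source running score that is updated from response-quality feedback and used to prioritize sources at retrieval time. The class name, the exponential-moving-average update rule, and the smoothing factor below are illustrative assumptions, not the paper's exact formulation (which is secured on-chain via smart contracts):

```python
# Minimal sketch of dynamic source-reliability scoring for decentralized RAG.
# All names and the EMA update rule are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class ReliabilityTracker:
    """Tracks one reliability score per data source."""
    alpha: float = 0.2                          # smoothing factor (assumed)
    scores: dict = field(default_factory=dict)  # source id -> score in [0, 1]

    def update(self, source_id: str, response_quality: float) -> None:
        """Blend a new quality signal (0..1) into the source's running score."""
        prior = self.scores.get(source_id, 0.5)  # neutral prior for unseen sources
        self.scores[source_id] = (1 - self.alpha) * prior + self.alpha * response_quality

    def top_sources(self, k: int) -> list:
        """Return the k most reliable sources to prioritize during retrieval."""
        ranked = sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
        return [sid for sid, _ in ranked[:k]]

tracker = ReliabilityTracker()
for sid, quality in [("src_a", 0.9), ("src_b", 0.3), ("src_a", 0.8), ("src_c", 0.6)]:
    tracker.update(sid, quality)
print(tracker.top_sources(2))  # prints ['src_a', 'src_c']
```

In the actual system, each score update would be a smart-contract transaction; the reported ~56% marginal cost savings come from batching many such updates into a single on-chain operation rather than submitting them one at a time.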
Related papers
- A Secure and Private Distributed Bayesian Federated Learning Design [56.92336577799572]
Distributed Federated Learning (DFL) enables decentralized model training across large-scale systems without a central parameter server. DFL faces three critical challenges: privacy leakage from honest-but-curious neighbors, slow convergence due to the lack of central coordination, and vulnerability to Byzantine adversaries aiming to degrade model accuracy. We propose a novel DFL framework that integrates Byzantine robustness, privacy preservation, and convergence acceleration.
arXiv Detail & Related papers (2026-02-23T16:12:02Z) - Scaling Decentralized Learning with FLock [31.883271929012977]
This paper introduces FLock, a decentralized framework for fine-tuning large language models (LLMs). Integrating a blockchain-based trust layer with economic incentives, FLock replaces the central aggregator with a secure, auditable protocol for cooperation among untrusted parties. Our experiments show the FLock framework defends against backdoor poisoning attacks that compromise standard federated learning systems.
arXiv Detail & Related papers (2025-07-21T08:01:43Z) - Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG). FedE4RAG facilitates collaborative training of client-side RAG retrieval models. We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z) - The Built-In Robustness of Decentralized Federated Averaging to Bad Data [2.7961972519572447]
Decentralized federated learning (DFL) enables devices to collaboratively train models over complex network topologies without relying on a central controller. In this setting, local data remains private, but its quality and quantity can vary significantly across nodes. We simulate two scenarios with degraded data quality: one where the corrupted data is evenly distributed in a subset of nodes and one where it is concentrated on a single node.
arXiv Detail & Related papers (2025-02-25T11:06:51Z) - Protocol Learning, Decentralized Frontier Risk and the No-Off Problem [56.74434512241989]
We identify a third paradigm - Protocol Learning - where models are trained across decentralized networks of incentivized participants. This approach has the potential to aggregate orders of magnitude more computational resources than any single centralized entity. It also introduces novel challenges: heterogeneous and unreliable nodes, malicious participants, the need for unextractable models to preserve incentives, and complex governance dynamics.
arXiv Detail & Related papers (2024-12-10T19:53:50Z) - Retrieval-Augmented Generation with Estimation of Source Reliability [28.70905685371307]
Reliability-Aware RAG (RA-RAG) is a new multi-source RAG framework that estimates the reliability of sources. RA-RAG first estimates source reliability by cross-checking information across multiple sources. It then retrieves documents from the top-$\kappa$ reliable and relevant sources and aggregates their information using weighted majority voting (WMV).
arXiv Detail & Related papers (2024-10-30T12:09:29Z) - Towards Secure and Private AI: A Framework for Decentralized Inference [14.526663289437584]
Large multimodal foundational models present challenges in scalability, reliability, and potential misuse. Decentralized systems offer a solution by distributing workload and mitigating central points of failure. We address these challenges with a comprehensive framework designed for responsible AI development.
arXiv Detail & Related papers (2024-07-28T05:09:17Z) - Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks [60.54852710216738]
We introduce a novel digital twin-assisted optimization framework, called D-REC, to ensure reliable caching in nextG wireless networks.
By incorporating reliability modules into a constrained decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints.
arXiv Detail & Related papers (2024-06-29T02:40:28Z) - Networked Communication for Decentralised Agents in Mean-Field Games [59.01527054553122]
We introduce networked communication to the mean-field game framework. We prove that our architecture has sample guarantees bounded between those of the centralised- and independent-learning cases. We show that our networked approach has significant advantages over both alternatives in terms of robustness to update failures and to changes in population size.
arXiv Detail & Related papers (2023-06-05T10:45:39Z) - DeFTA: A Plug-and-Play Decentralized Replacement for FedAvg [28.255536979484518]
We propose Decentralized Federated Trusted Averaging (DeFTA) as a plug-and-play replacement for FedAvg.
DeFTA brings better security, scalability, and fault-tolerance to the federated learning process after installation.
arXiv Detail & Related papers (2022-04-06T07:20:31Z) - Decentralised Learning from Independent Multi-Domain Labels for Person Re-Identification [69.29602103582782]
Deep learning has been successful for many computer vision tasks due to the availability of shared and centralised large-scale training data.
However, increasing awareness of privacy concerns poses new challenges to deep learning, especially for person re-identification (Re-ID).
We propose a novel paradigm called Federated Person Re-Identification (FedReID) to construct a generalisable global model (a central server) by simultaneously learning with multiple privacy-preserved local models (local clients).
This client-server collaborative learning process is iteratively performed under privacy control, enabling FedReID to realise decentralised learning without sharing distributed data or collecting any
arXiv Detail & Related papers (2020-06-07T13:32:33Z) - Byzantine-resilient Decentralized Stochastic Gradient Descent [85.15773446094576]
We present an in-depth study towards the Byzantine resilience of decentralized learning systems.
We propose UBAR, a novel algorithm to enhance decentralized learning with Byzantine Fault Tolerance.
arXiv Detail & Related papers (2020-02-20T05:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.