A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain
- URL: http://arxiv.org/abs/2511.07577v1
- Date: Wed, 12 Nov 2025 01:04:54 GMT
- Title: A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain
- Authors: Yining Lu, Wenyi Tang, Max Johnson, Taeho Jung, Meng Jiang
- Abstract summary: Decentralization brings a challenge: the numerous independent data sources vary significantly in reliability. Our system achieves a +10.7% performance improvement over its centralized counterpart in realistic unreliable data environments. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56% marginal cost savings.
- Score: 18.738400901246898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, which incurs high costs for data collection, integration, and management, and raises privacy concerns. There is a great need for a decentralized RAG system that enables foundation models to use information directly from data owners who retain full control over their sources. However, decentralization brings a challenge: the numerous independent data sources vary significantly in reliability, which can diminish retrieval accuracy and response quality. To address this, our decentralized RAG system has a novel reliability scoring mechanism that dynamically evaluates each source based on the quality of the responses it helps generate, and prioritizes high-quality sources during retrieval. To ensure transparency and trust, the scoring process is securely managed through blockchain-based smart contracts, creating verifiable and tamper-proof reliability records without relying on a central authority. We evaluate our decentralized system with two Llama models (3B and 8B) in two simulated environments where six data sources have different levels of reliability. Our system achieves a +10.7% performance improvement over its centralized counterpart in realistic unreliable data environments. Notably, it approaches the upper-bound performance of centralized systems under ideally reliable data environments. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56% marginal cost savings through batched update operations. Our code and system are open-sourced at github.com/yining610/Reliable-dRAG.
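The reliability scoring loop described in the abstract can be sketched as a simple per-source running score that is updated from response-quality feedback and used to prioritize sources at retrieval time. The class name, the exponential-moving-average update rule, and the smoothing factor below are illustrative assumptions, not the paper's exact formulation (which is secured on-chain via smart contracts):

```python
# Minimal sketch of dynamic source-reliability scoring for decentralized RAG.
# All names and the EMA update rule are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class ReliabilityTracker:
    """Tracks one reliability score per data source."""
    alpha: float = 0.2                          # smoothing factor (assumed)
    scores: dict = field(default_factory=dict)  # source id -> score in [0, 1]

    def update(self, source_id: str, response_quality: float) -> None:
        """Blend a new quality signal (0..1) into the source's running score."""
        prior = self.scores.get(source_id, 0.5)  # neutral prior for unseen sources
        self.scores[source_id] = (1 - self.alpha) * prior + self.alpha * response_quality

    def top_sources(self, k: int) -> list:
        """Return the k most reliable sources to prioritize during retrieval."""
        ranked = sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
        return [sid for sid, _ in ranked[:k]]

tracker = ReliabilityTracker()
for sid, quality in [("src_a", 0.9), ("src_b", 0.3), ("src_a", 0.8), ("src_c", 0.6)]:
    tracker.update(sid, quality)
print(tracker.top_sources(2))  # prints ['src_a', 'src_c']
```

In the actual system, each score update would be a smart-contract transaction; the reported ~56% marginal cost savings come from batching many such updates into a single on-chain operation rather than submitting them one at a time.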
Related papers
- A Secure and Private Distributed Bayesian Federated Learning Design [56.92336577799572]
Distributed Federated Learning (DFL) enables decentralized model training across large-scale systems without a central parameter server. DFL faces three critical challenges: privacy leakage from honest-but-curious neighbors, slow convergence due to the lack of central coordination, and vulnerability to Byzantine adversaries aiming to degrade model accuracy. We propose a novel DFL framework that integrates Byzantine robustness, privacy preservation, and convergence acceleration.
arXiv Detail & Related papers (2026-02-23T16:12:02Z) - Scaling Decentralized Learning with FLock [31.883271929012977]
This paper introduces FLock, a decentralized framework for fine-tuning large language models (LLMs). Integrating a blockchain-based trust layer with economic incentives, FLock replaces the central aggregator with a secure, auditable protocol for cooperation among untrusted parties. Our experiments show the FLock framework defends against backdoor poisoning attacks that compromise standard federated learning systems.
arXiv Detail & Related papers (2025-07-21T08:01:43Z) - Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation [60.81109086640437]
We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG). FedE4RAG facilitates collaborative training of client-side RAG retrieval models. We apply homomorphic encryption within federated learning to safeguard model parameters.
arXiv Detail & Related papers (2025-04-27T04:26:02Z) - The Built-In Robustness of Decentralized Federated Averaging to Bad Data [2.7961972519572447]
Decentralized federated learning (DFL) enables devices to collaboratively train models over complex network topologies without relying on a central controller. In this setting, local data remains private, but its quality and quantity can vary significantly across nodes. We simulate two scenarios with degraded data quality: one where the corrupted data is evenly distributed in a subset of nodes and one where it is concentrated on a single node.
arXiv Detail & Related papers (2025-02-25T11:06:51Z) - Protocol Learning, Decentralized Frontier Risk and the No-Off Problem [56.74434512241989]
We identify a third paradigm - Protocol Learning - where models are trained across decentralized networks of incentivized participants. This approach has the potential to aggregate orders of magnitude more computational resources than any single centralized entity. It also introduces novel challenges: heterogeneous and unreliable nodes, malicious participants, the need for unextractable models to preserve incentives, and complex governance dynamics.
arXiv Detail & Related papers (2024-12-10T19:53:50Z) - Retrieval-Augmented Generation with Estimation of Source Reliability [28.70905685371307]
Reliability-Aware RAG (RA-RAG) is a new multi-source RAG framework that estimates the reliability of sources. RA-RAG first estimates source reliability by cross-checking information across multiple sources. It then retrieves documents from the top-$\kappa$ reliable and relevant sources and aggregates their information using weighted majority voting (WMV).
arXiv Detail & Related papers (2024-10-30T12:09:29Z) - Towards Secure and Private AI: A Framework for Decentralized Inference [14.526663289437584]
Large multimodal foundational models present challenges in scalability, reliability, and potential misuse. Decentralized systems offer a solution by distributing workload and mitigating central points of failure. We address these challenges with a comprehensive framework designed for responsible AI development.
arXiv Detail & Related papers (2024-07-28T05:09:17Z) - Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks [60.54852710216738]
We introduce a novel digital twin-assisted optimization framework, called D-REC, to ensure reliable caching in nextG wireless networks.
By incorporating reliability modules into a constrained decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints.
arXiv Detail & Related papers (2024-06-29T02:40:28Z) - Networked Communication for Decentralised Agents in Mean-Field Games [59.01527054553122]
We introduce networked communication to the mean-field game framework. We prove that our architecture has sample guarantees bounded between those of the centralised- and independent-learning cases. We show that our networked approach has significant advantages over both alternatives in terms of robustness to update failures and to changes in population size.
arXiv Detail & Related papers (2023-06-05T10:45:39Z) - DeFTA: A Plug-and-Play Decentralized Replacement for FedAvg [28.255536979484518]
We propose Decentralized Federated Trusted Averaging (DeFTA) as a plug-and-play replacement for FedAvg.
DeFTA brings better security, scalability, and fault-tolerance to the federated learning process after installation.
arXiv Detail & Related papers (2022-04-06T07:20:31Z) - Decentralised Learning from Independent Multi-Domain Labels for Person Re-Identification [69.29602103582782]
Deep learning has been successful for many computer vision tasks due to the availability of shared and centralised large-scale training data.
However, increasing awareness of privacy concerns poses new challenges to deep learning, especially for person re-identification (Re-ID).
We propose a novel paradigm called Federated Person Re-Identification (FedReID) to construct a generalisable global model (a central server) by simultaneously learning with multiple privacy-preserved local models (local clients).
This client-server collaborative learning process is iteratively performed under privacy control, enabling FedReID to realise decentralised learning without sharing distributed data or collecting any
arXiv Detail & Related papers (2020-06-07T13:32:33Z) - Byzantine-resilient Decentralized Stochastic Gradient Descent [85.15773446094576]
We present an in-depth study towards the Byzantine resilience of decentralized learning systems.
We propose UBAR, a novel algorithm to enhance decentralized learning with Byzantine Fault Tolerance.
arXiv Detail & Related papers (2020-02-20T05:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.