Provably Secure Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2508.01084v1
- Date: Fri, 01 Aug 2025 21:37:16 GMT
- Title: Provably Secure Retrieval-Augmented Generation
- Authors: Pengcheng Zhou, Yinglun Feng, Zhongliang Yang,
- Abstract summary: This paper proposes the first provably secure framework for Retrieval-Augmented Generation (RAG) systems.<n>Our framework employs a pre-storage full-encryption scheme to ensure dual protection of both retrieved content and vector embeddings.
- Score: 7.412110686946628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although Retrieval-Augmented Generation (RAG) systems have been widely applied, the privacy and security risks they face, such as data leakage and data poisoning, have not been systematically addressed yet. Existing defense strategies primarily rely on heuristic filtering or enhancing retriever robustness, which suffer from limited interpretability, lack of formal security guarantees, and vulnerability to adaptive attacks. To address these challenges, this paper proposes the first provably secure framework for RAG systems(SAG). Our framework employs a pre-storage full-encryption scheme to ensure dual protection of both retrieved content and vector embeddings, guaranteeing that only authorized entities can access the data. Through formal security proofs, we rigorously verify the scheme's confidentiality and integrity under a computational security model. Extensive experiments across multiple benchmark datasets demonstrate that our framework effectively resists a range of state-of-the-art attacks. This work establishes a theoretical foundation and practical paradigm for verifiably secure RAG systems, advancing AI-powered services toward formally guaranteed security.
Related papers
- Beyond Algorithmic Proofs: Towards Implementation-Level Provable Security [1.338174941551702]
We present Implementation-Level Provable Security, a new paradigm that defines security in terms of structurally verifiable resilience against real-world attack surfaces during deployment.<n>We present SEER (Secure and Efficient Encryption-based Erasure via Ransomware), a file destruction system that repurposes and reinforces the encryption core of Babuk ransomware.
arXiv Detail & Related papers (2025-08-02T01:58:06Z) - Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security [63.41350337821108]
We propose Secure Tug-of-War (SecTOW) to enhance the security of multimodal large language models (MLLMs)<n>SecTOW consists of two modules: a defender and an auxiliary attacker, both trained iteratively using reinforcement learning (GRPO)<n>We show that SecTOW significantly improves security while preserving general performance.
arXiv Detail & Related papers (2025-07-29T17:39:48Z) - Towards provable probabilistic safety for scalable embodied AI systems [79.31011047593492]
Embodied AI systems are increasingly prevalent across various applications.<n> Ensuring their safety in complex operating environments remains a major challenge.<n>This Perspective offers a pathway toward safer, large-scale adoption of embodied AI systems in safety-critical applications.
arXiv Detail & Related papers (2025-06-05T15:46:25Z) - LLM Agents Should Employ Security Principles [60.03651084139836]
This paper argues that the well-established design principles in information security should be employed when deploying Large Language Model (LLM) agents at scale.<n>We introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle.
arXiv Detail & Related papers (2025-05-29T21:39:08Z) - Securing RAG: A Risk Assessment and Mitigation Framework [0.0]
Retrieval Augmented Generation (RAG) has emerged as the de facto industry standard for user-facing NLP applications.<n>This paper reviews the vulnerabilities of RAG pipelines, and outlines the attack surface from data pre-processing to integration with Large Language Models (LLMs)
arXiv Detail & Related papers (2025-05-13T16:39:00Z) - MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG [65.0423152595537]
We propose MES-RAG, which enhances entity-specific query handling and provides accurate, secure, and consistent responses.<n>MES-RAG introduces proactive security measures that ensure system integrity by applying protections prior to data access.<n> Experimental results demonstrate that MES-RAG significantly improves both accuracy and recall, highlighting its effectiveness in advancing the security and utility of question-answering.
arXiv Detail & Related papers (2025-03-17T08:09:42Z) - Privacy-Aware RAG: Secure and Isolated Knowledge Retrieval [7.412110686946628]
This paper proposes an advanced encryption methodology designed to protect RAG systems from unauthorized access and data leakage.<n>Our approach encrypts both textual content and its corresponding embeddings prior to storage, ensuring that all data remains securely encrypted.<n>Our findings suggest that integrating advanced encryption techniques into the design and deployment of RAG systems can effectively enhance privacy safeguards.
arXiv Detail & Related papers (2025-03-17T07:45:05Z) - ACRIC: Securing Legacy Communication Networks via Authenticated Cyclic Redundancy Integrity Check [98.34702864029796]
Recent security incidents in safety-critical industries exposed how the lack of proper message authentication enables attackers to inject malicious commands or alter system behavior.<n>These shortcomings have prompted new regulations that emphasize the pressing need to strengthen cybersecurity.<n>We introduce ACRIC, a message authentication solution to secure legacy industrial communications.
arXiv Detail & Related papers (2024-11-21T18:26:05Z) - FRAG: Toward Federated Vector Database Management for Collaborative and Secure Retrieval-Augmented Generation [1.3824176915623292]
This paper introduces textitFederated Retrieval-Augmented Generation (FRAG), a novel database management paradigm tailored for the growing needs of retrieval-augmented generation (RAG) systems.
FRAG enables mutually-distrusted parties to collaboratively perform Approximate $k$-Nearest Neighbor (ANN) searches on encrypted query vectors and encrypted data stored in distributed vector databases.
arXiv Detail & Related papers (2024-10-17T06:57:29Z) - Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation [0.9217021281095907]
We introduce an efficient and easy-to-use method for conducting a Membership Inference Attack (MIA) against RAG systems.<n>We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models.<n>Our findings highlight the importance of implementing security countermeasures in deployed RAG systems.
arXiv Detail & Related papers (2024-05-30T19:46:36Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.