Related papers: Threshold-Protected Searchable Sharing: Privacy Preserving Aggregated-ANN Search for Collaborative RAG

Threshold-Protected Searchable Sharing: Privacy Preserving Aggregated-ANN Search for Collaborative RAG

URL: http://arxiv.org/abs/2507.17199v1
Date: Wed, 23 Jul 2025 04:45:01 GMT
Title: Threshold-Protected Searchable Sharing: Privacy Preserving Aggregated-ANN Search for Collaborative RAG
Authors: Ruoyang Rykie Guo,
Abstract summary: Two key bottlenecks are private data repositories' locality constraints and the need to maintain compatibility with mainstream search techniques.<n>We develop a secure and privacy-preserving aggregated approximate nearest neighbor search (SP-A$2$NN) with HNSW compatibility.<n>We also explore a novel security analytical framework that incorporates privacy analysis via reductions.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM-powered search services have driven data integration as a significant trend. However, this trend's progress is fundamentally hindered, despite the fact that combining individual knowledge can significantly improve the relevance and quality of responses in specialized queries and make AI more professional at providing services. Two key bottlenecks are private data repositories' locality constraints and the need to maintain compatibility with mainstream search techniques, particularly Hierarchical Navigable Small World (HNSW) indexing for high-dimensional vector spaces. In this work, we develop a secure and privacy-preserving aggregated approximate nearest neighbor search (SP-A$^2$NN) with HNSW compatibility under a threshold-based searchable sharing primitive. A sharable bitgraph structure is constructed and extended to support searches and dynamical insertions over shared data without compromising the underlying graph topology. The approach reduces the complexity of a search from $O(n^2)$ to $O(n)$ compared to naive (undirected) graph-sharing approach when organizing graphs in the identical HNSW manner. On the theoretical front, we explore a novel security analytical framework that incorporates privacy analysis via reductions. The proposed leakage-guessing proof system is built upon an entirely different interactive game that is independent of existing coin-toss game design. Rather than being purely theoretical, this system is rooted in existing proof systems but goes beyond them to specifically address leakage concerns and standardize leakage analysis -- one of the most critical security challenges with AI's rapid development.

Related papers

DVFS: A Dynamic Verifiable Fuzzy Search Service for Encrypted Cloud Data [5.420017752806512]
Cloud storage introduces critical privacy challenges for encrypted data retrieval.<n>Current solutions face fundamental trade-offs between security and efficiency.<n>We propose DVFS - a dynamic verifiable fuzzy search service with three core innovations.
arXiv Detail & Related papers (2025-07-15T02:36:30Z)
Convergent Privacy Framework with Contractive GNN Layers for Multi-hop Aggregations [9.399260063250635]
Differential privacy (DP) has been integrated into graph neural networks (GNNs) to protect sensitive structural information.<n>We propose a simple yet effective Contractive Graph Layer (CGL) that ensures the contractiveness required for theoretical guarantees.<n>Our framework, CARIBOU, supports both training and inference, equipped with a contractive aggregation module, a privacy allocation module, and a privacy auditing module.
arXiv Detail & Related papers (2025-06-28T02:17:53Z)
From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories [3.323388021979584]
Malicious URLs persistently threaten the cybersecurity ecosystem, by either deceiving users into divulging private data or distributing harmful payloads to infiltrate host systems.<n>This review systematically analyzes methods from traditional blacklisting to advanced deep learning approaches.<n>Unlike prior surveys, we propose a novel modality-based taxonomy that categorizes existing works according to their primary data modalities.
arXiv Detail & Related papers (2025-04-23T06:23:18Z)
Constrained Auto-Regressive Decoding Constrains Generative Retrieval [71.71161220261655]
Generative retrieval seeks to replace traditional search index data structures with a single large-scale neural network.<n>In this paper, we examine the inherent limitations of constrained auto-regressive generation from two essential perspectives: constraints and beam search.
arXiv Detail & Related papers (2025-04-14T06:54:49Z)
PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.<n>The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates.<n>We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z)
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection. First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network. Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
Illicit item detection in X-ray images for security applications [7.519872646378835]
Automated detection of contraband items in X-ray images can significantly increase public safety. Modern computer vision algorithms relying on Deep Neural Networks (DNNs) have proven capable of undertaking this task. This paper proposes a two-fold improvement of such algorithms for the X-ray analysis domain.
arXiv Detail & Related papers (2023-05-03T07:28:05Z)
On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis. We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy. The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z)
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch descent gradient. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
Robust and Differentially Private Mean Estimation [40.323756738056616]
Differential privacy has emerged as a standard requirement in a variety of applications ranging from the U.S. Census to data collected in commercial devices. An increasing number of such databases consist of data from multiple sources, not all of which can be trusted. This leaves existing private analyses vulnerable to attacks by an adversary who injects corrupted data.
arXiv Detail & Related papers (2021-02-18T05:02:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.