Leakage-abuse Attack Against Substring-SSE with Partially Known Dataset
- URL: http://arxiv.org/abs/2511.00930v1
- Date: Sun, 02 Nov 2025 13:12:19 GMT
- Title: Leakage-abuse Attack Against Substring-SSE with Partially Known Dataset
- Authors: Xijie Ba, Qin Liu, Xiaohong Li, Jianting Ning,
- Abstract summary: Substring-searchable symmetric encryption (substring-SSE) has become increasingly critical for privacy-preserving applications in cloud systems.<n>Leakage-abuse attacks have been widely studied for traditional SSE, but their applicability to SSE under partially known data assumptions remains unexplored.<n>We present the first leakage-abuse attack on SSE under partially-known dataset conditions.
- Score: 14.135889019986124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Substring-searchable symmetric encryption (substring-SSE) has become increasingly critical for privacy-preserving applications in cloud systems. However, existing schemes remain vulnerable to information leakage during search operations, particularly when adversaries possess partial knowledge of the target dataset. Although leakage-abuse attacks have been widely studied for traditional SSE, their applicability to substring-SSE under partially known data assumptions remains unexplored. In this paper, we present the first leakage-abuse attack on substring-SSE under partially-known dataset conditions. We develop a novel matrix-based correlation technique that extends and optimizes the LEAP framework for substring-SSE, enabling efficient recovery of plaintext data from encrypted suffix tree structures. Unlike existing approaches that rely on independent auxiliary datasets, our method directly exploits known data fragments to establish high-confidence mappings between ciphertext tokens and plaintext substrings through iterative matrix transformations. Comprehensive experiments on real-world datasets demonstrate the effectiveness of the attack, with recovery rates reaching 98.32% for substrings given 50% auxiliary knowledge. Even with only 10% prior knowledge, the attack achieves 74.42% substring recovery while maintaining strong scalability across datasets of varying sizes. The result reveals significant privacy risks in current substring-SSE designs and highlights the urgent need for leakage-resilient constructions.
Related papers
- When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift [0.0]
We present a comprehensive analysis using a benchmark of 18 datasets spanning harmful requests, jailbreaks, indirect prompt injections, and extraction attacks.<n>We propose Leave-One-Dataset-Out (LODO) evaluation to measure true out-of-distribution generalization.
arXiv Detail & Related papers (2026-02-15T14:21:43Z) - Efficient Privacy-Preserving Retrieval Augmented Generation with Distance-Preserving Encryption [25.87368479678027]
RAG has emerged as a key technique for enhancing response quality of LLMs without high computational cost.<n>In traditional architectures, RAG services are provided by a single entity that hosts the dataset within a trusted local environment.<n>This dependence on untrusted third-party services introduces privacy risks.<n>We propose an efficient privacy-preserving RAG framework (ppRAG) tailored for untrusted cloud environments.
arXiv Detail & Related papers (2026-01-18T09:29:50Z) - Compressive Meta-Learning [49.300635370079874]
Compressive learning is a framework that enables efficient processing by using random, non-linear features.<n>We propose a framework that meta-learns both the encoding and decoding stages of compressive learning methods.<n>We explore multiple applications -- including neural network-based compressive PCA, compressive ridge regression, compressive k-means, and autoencoders.
arXiv Detail & Related papers (2025-08-14T22:08:06Z) - Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems via Knowledge Asymmetry Exploitation [15.985529058573912]
Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge bases.<n>Existing privacy attacks on RAG systems can trigger data leakage but often fail to accurately isolate knowledge-base-derived sentences within mixed responses.<n>This paper presents a novel black-box attack framework that exploits knowledge asymmetry between RAG and standard LLMs to achieve fine-grained privacy extraction.
arXiv Detail & Related papers (2025-07-31T03:50:16Z) - DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [70.77570343385928]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF) (inherent to the data) versus external features (EF) (artificially introduced for auditing)<n>We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intending to falsely implicate an unused dataset.<n>Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery.<n>Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z) - S-Leak: Leakage-Abuse Attack Against Efficient Conjunctive SSE via s-term Leakage [13.222101654411281]
Conjunctive Searchable Encryption (CSSE) enables secure conjunctive searches over encrypted data.<n>In this paper, we reveal a fundamental vulnerability in state-of-the-art CSSE schemes: s-term leakage.<n>We propose S-Leak, the first passive attack framework that progressively recovers conjunctive queries by exploiting s-term leakage and global leakage.
arXiv Detail & Related papers (2025-07-05T15:53:31Z) - Enhancing Leakage Attacks on Searchable Symmetric Encryption Using LLM-Based Synthetic Data Generation [0.0]
Searchable Symmetric Encryption (SSE) enables efficient search capabilities over encrypted data, allowing users to maintain privacy while utilizing cloud storage.<n>SSE schemes are vulnerable to leakage attacks that exploit access patterns, search frequency, and volume information.<n>We propose a novel approach that leverages large language models (LLMs), specifically GPT-4 variants, to generate synthetic documents that statistically and semantically resemble the real-world dataset of Enron emails.
arXiv Detail & Related papers (2025-04-29T04:23:10Z) - A Dataset for Semantic Segmentation in the Presence of Unknowns [49.795683850385956]
Existing datasets allow evaluation of only knowns or unknowns - but not both.<n>We propose a novel anomaly segmentation dataset, ISSU, that features a diverse set of anomaly inputs from cluttered real-world environments.<n>The dataset is twice larger than existing anomaly segmentation datasets.
arXiv Detail & Related papers (2025-03-28T10:31:01Z) - Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion [21.83264152003852]
We introduce Eguard, a novel defense mechanism designed to mitigate embedding inversion attacks.
Our approach significantly reduces privacy risks, protecting over 95% of tokens from inversion while maintaining high performance.
arXiv Detail & Related papers (2024-11-06T14:42:41Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Anonymizing text that contains sensitive information is crucial for a wide range of applications.<n>Existing techniques face the emerging challenges of the re-identification ability of large language models.<n>We propose a framework composed of three key components: a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF)
Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts.
It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
arXiv Detail & Related papers (2023-09-30T12:30:25Z) - LORD: Leveraging Open-Set Recognition with Unknown Data [10.200937444995944]
LORD is a framework to Leverage Open-set Recognition by exploiting unknown data.
We identify three model-agnostic training strategies that exploit background data and applied them to well-established classifiers.
arXiv Detail & Related papers (2023-08-24T06:12:41Z) - Semantic Perturbations with Normalizing Flows for Improved
Generalization [62.998818375912506]
We show that perturbations in the latent space can be used to define fully unsupervised data augmentations.
We find that our latent adversarial perturbations adaptive to the classifier throughout its training are most effective.
arXiv Detail & Related papers (2021-08-18T03:20:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.