Related papers: Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models

Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models

URL: http://arxiv.org/abs/2506.19889v1
Date: Tue, 24 Jun 2025 07:28:29 GMT
Title: Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models
Authors: Wanli Peng, Xin Chen, Hang Fu, XinYu He, Xue Yiming, Juan Wen,
Abstract summary: The privacy violation attack (PVA) was revealed by Staab et al., introduces serious personal privacy issues.<n>Existing defense methods mainly leverage large language models (LLMs) to anonymize the input query.<n>We propose a novel defense paradigm based on retrieval-confused generation (RCG) of LLMs, which can efficiently and covertly defend the PVA.
Score: 9.551704901720726
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in large language models (LLMs) have made a profound impact on our society and also raised new security concerns. Particularly, due to the remarkable inference ability of LLMs, the privacy violation attack (PVA), revealed by Staab et al., introduces serious personal privacy issues. Existing defense methods mainly leverage LLMs to anonymize the input query, which requires costly inference time and cannot gain satisfactory defense performance. Moreover, directly rejecting the PVA query seems like an effective defense method, while the defense method is exposed, promoting the evolution of PVA. In this paper, we propose a novel defense paradigm based on retrieval-confused generation (RCG) of LLMs, which can efficiently and covertly defend the PVA. We first design a paraphrasing prompt to induce the LLM to rewrite the "user comments" of the attack query to construct a disturbed database. Then, we propose the most irrelevant retrieval strategy to retrieve the desired user data from the disturbed database. Finally, the "data comments" are replaced with the retrieved user data to form a defended query, leading to responding to the adversary with some wrong personal attributes, i.e., the attack fails. Extensive experiments are conducted on two datasets and eight popular LLMs to comprehensively evaluate the feasibility and the superiority of the proposed defense method.

Related papers

Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks [3.206339985805037]
Dashed Line Defense (DLD) is a plug-and-play post-processing method specifically designed to withstand adaptive query strategies.<n>By introducing ambiguity in how the observed loss reflects the true adversarial strength of candidate examples, DLD prevents attackers from reliably analyzing and adapting their queries.<n>We provide theoretical guarantees of DLD's defense capability and validate its effectiveness through experiments on ImageNet.
arXiv Detail & Related papers (2026-02-09T14:02:32Z)
SpooFL: Spoofing Federated Learning [54.05993847488204]
We introduce a spoofing-based defense that deceives attackers into believing they have recovered the true training data.<n>Unlike prior synthetic-data defenses that share classes or distributions with the private data, SFL uses a state-of-the-art generative model trained on an external dataset.<n>As a result, attackers are misled into recovering plausible yet completely irrelevant samples, preventing meaningful data leakage.
arXiv Detail & Related papers (2026-01-21T14:57:18Z)
SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks [17.77094760401298]
We study the vulnerability of fine-tuned large language models to membership inference attacks (MIAs)<n>We propose SOFT, a novel defense technique that mitigates privacy leakage by leveraging influential data selection with an adjustable parameter to balance utility preservation and privacy protection.
arXiv Detail & Related papers (2025-06-12T07:23:56Z)
Benchmarking Misuse Mitigation Against Covert Adversaries [80.74502950627736]
Existing language model safety evaluations focus on overt attacks and low-stakes tasks.<n>We develop Benchmarks for Stateful Defenses (BSD), a data generation pipeline that automates evaluations of covert attacks and corresponding defenses.<n>Our evaluations indicate that decomposition attacks are effective misuse enablers, and highlight stateful defenses as a countermeasure.
arXiv Detail & Related papers (2025-06-06T17:33:33Z)
MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
Revisiting Backdoor Attacks on LLMs: A Stealthy and Practical Poisoning Framework via Harmless Inputs [54.90315421117162]
We propose a novel poisoning method via completely harmless data.<n>Inspired by the causal reasoning in auto-regressive LLMs, we aim to establish robust associations between triggers and an affirmative response prefix.<n>We observe an interesting resistance phenomenon where the LLM initially appears to agree but subsequently refuses to answer.
arXiv Detail & Related papers (2025-05-23T08:13:59Z)
Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation [0.9217021281095907]
We introduce an efficient and easy-to-use method for conducting a Membership Inference Attack (MIA) against RAG systems.<n>We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models.<n>Our findings highlight the importance of implementing security countermeasures in deployed RAG systems.
arXiv Detail & Related papers (2024-05-30T19:46:36Z)
Information Leakage from Embedding in Large Language Models [5.475800773759642]
This study aims to investigate the potential for privacy invasion through input reconstruction attacks. We first propose two base methods to reconstruct original texts from a model's hidden states. We then present Embed Parrot, a Transformer-based method, to reconstruct input from embeddings in deep layers.
arXiv Detail & Related papers (2024-05-20T09:52:31Z)
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945]
We propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs) Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs.
arXiv Detail & Related papers (2023-10-05T17:01:53Z)
Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection [6.201275002179716]
We introduce the HaS framework, where "H(ide)" and "S(eek)" represent its two core processes: hiding private entities for anonymization and seeking private entities for de-anonymization. To quantitatively assess HaS's privacy protection performance, we propose both black-box and white-box adversarial models.
arXiv Detail & Related papers (2023-09-06T14:54:11Z)
Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources. FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks. We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously. MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
Re-thinking Data Availablity Attacks Against Deep Neural Networks [53.64624167867274]
In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective. We introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements.
arXiv Detail & Related papers (2023-05-18T04:03:51Z)
Concealing Sensitive Samples against Gradient Leakage in Federated Learning [41.43099791763444]
Federated Learning (FL) is a distributed learning paradigm that enhances users privacy by eliminating the need for clients to share raw, private data with the server. Recent studies expose the vulnerability of FL to model inversion attacks, where adversaries reconstruct users private data via eavesdropping on the shared gradient information. We present a simple, yet effective defense strategy that obfuscates the gradients of the sensitive data with concealed samples.
arXiv Detail & Related papers (2022-09-13T04:19:35Z)
Defense Against Gradient Leakage Attacks via Learning to Obscure Data [48.67836599050032]
Federated learning is considered as an effective privacy-preserving learning mechanism. In this paper, we propose a new defense method to protect the privacy of clients' data by learning to obscure data.
arXiv Detail & Related papers (2022-06-01T21:03:28Z)
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning [95.60856995067083]
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms. We propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection. Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%.
arXiv Detail & Related papers (2021-06-01T07:10:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.