Poisoning Retrieval Corpora by Injecting Adversarial Passages
- URL: http://arxiv.org/abs/2310.19156v1
- Date: Sun, 29 Oct 2023 21:13:31 GMT
- Title: Poisoning Retrieval Corpora by Injecting Adversarial Passages
- Authors: Zexuan Zhong, Ziqing Huang, Alexander Wettig, Danqi Chen
- Abstract summary: We propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages.
When these adversarial passages are inserted into a large retrieval corpus, we show that this attack is highly effective in fooling these systems.
We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised.
- Score: 79.14287273842878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense retrievers have achieved state-of-the-art performance in various
information retrieval tasks, but to what extent can they be safely deployed in
real-world applications? In this work, we propose a novel attack for dense
retrieval systems in which a malicious user generates a small number of
adversarial passages by perturbing discrete tokens to maximize similarity with
a provided set of training queries. When these adversarial passages are
inserted into a large retrieval corpus, we show that this attack is highly
effective in fooling these systems to retrieve them for queries that were not
seen by the attacker. More surprisingly, these adversarial passages can
directly generalize to out-of-domain queries and corpora with a high attack
success rate -- for instance, we find that 50 generated passages optimized on
Natural Questions can mislead >94% of questions posed in financial documents or
online forums. We also benchmark and compare a range of state-of-the-art dense
retrievers, both unsupervised and supervised. Although different systems
exhibit varying levels of vulnerability, we show they can all be successfully
attacked by injecting up to 500 passages, a small fraction compared to a
retrieval corpus of millions of passages.
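The attack described in the abstract, perturbing the discrete tokens of a passage so that its embedding scores highly against a set of training queries, can be pictured as a gradient-guided (HotFlip-style) search. The snippet below is a minimal sketch rather than the authors' released implementation; the Contriever checkpoint, the mean-pooling rule, the greedy one-position-per-step update, and all hyperparameters are assumptions made for illustration.

```python
# Minimal sketch of gradient-guided corpus poisoning: greedily substitute tokens in an
# adversarial passage to maximize its embedding similarity with a set of training queries.
# Assumptions (not from the paper's code): retriever checkpoint, pooling, hyperparameters.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "facebook/contriever"            # assumed dual-encoder retriever
tok = AutoTokenizer.from_pretrained(model_name)
enc = AutoModel.from_pretrained(model_name)
enc.eval()
for p in enc.parameters():                    # only input gradients are needed
    p.requires_grad_(False)

def mean_pool(hidden, attention_mask):
    """Mean-pool token states into one vector (assumed pooling rule)."""
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)

# 1) Embed a few stand-in training queries (the paper optimizes on e.g. Natural Questions).
queries = ["who wrote the declaration of independence",
           "when was the first iphone released"]
q = tok(queries, padding=True, return_tensors="pt")
with torch.no_grad():
    q_emb = mean_pool(enc(**q).last_hidden_state, q.attention_mask)   # [num_queries, d]

# 2) Start from a passage of [MASK] tokens and flip one position per step.
passage_len, steps = 20, 100
adv_ids = torch.full((1, passage_len), tok.mask_token_id, dtype=torch.long)
attn = torch.ones_like(adv_ids)
emb_table = enc.get_input_embeddings().weight                          # [vocab, d_model]

for step in range(steps):
    inputs_embeds = emb_table[adv_ids].detach().requires_grad_(True)
    hidden = enc(inputs_embeds=inputs_embeds, attention_mask=attn).last_hidden_state
    sim = (q_emb @ mean_pool(hidden, attn).T).mean()   # average query-passage similarity
    sim.backward()

    # HotFlip-style first-order scoring: swapping position `pos` to token v changes the
    # similarity roughly by (e_v - e_old) . grad, so pick the token maximizing e_v . grad.
    pos = step % passage_len
    with torch.no_grad():
        scores = emb_table @ inputs_embeds.grad[0, pos]                # [vocab]
        adv_ids[0, pos] = int(scores.argmax())

print("candidate adversarial passage:", tok.decode(adv_ids[0], skip_special_tokens=True))
```

A full attack would additionally restrict candidate tokens to ordinary vocabulary entries and re-score each proposed swap with a forward pass, keeping it only if the similarity actually improves; the finished passages are then simply inserted into the retrieval corpus.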
Related papers
- Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models [0.0]
Retrieval Augmented Generation (RAG) addresses the limits of static model knowledge by combining Large Language Models with up-to-date information retrieval.
This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation.
We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component.
arXiv Detail & Related papers (2024-10-18T14:02:34Z)
- On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains [34.122040172188406]
Retrieval-Augmented Generation (RAG) has been empirically shown to enhance the performance of large language models (LLMs) in knowledge-intensive domains.
We show that RAG is vulnerable to universal poisoning attacks in medical Q&A.
We develop a new detection-based defense to ensure the safe use of RAG.
arXiv Detail & Related papers (2024-09-12T02:43:40Z)
- Backdoor Attacks on Dense Passage Retrievers for Disseminating Misinformation [40.131588857153275]
We introduce a novel scenario where the attackers aim to covertly disseminate targeted misinformation through a retrieval system.
To achieve this, we propose a backdoor attack triggered by grammar errors in dense passage retrieval.
Our approach ensures that attacked models can function normally for standard queries but are manipulated to return passages specified by the attacker.
arXiv Detail & Related papers (2024-02-21T05:03:07Z)
- Formalizing and Benchmarking Prompt Injection Attacks and Defenses [59.57908526441172]
We propose a framework to formalize prompt injection attacks.
Based on our framework, we design a new attack by combining existing ones.
Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses.
arXiv Detail & Related papers (2023-10-19T15:12:09Z)
- Defense of Adversarial Ranking Attack in Text Retrieval: Benchmark and Baseline via Detection [12.244543468021938]
This paper introduces two types of detection tasks for adversarial documents.
A benchmark dataset is established to facilitate the investigation of adversarial ranking defense.
A comprehensive investigation of the performance of several detection baselines is conducted.
arXiv Detail & Related papers (2023-07-31T16:31:24Z)
- A Simple Yet Efficient Method for Adversarial Word-Substitute Attack [30.445201832698192]
We propose a simple yet efficient method that can reduce the average number of adversarial queries by 3-30 times.
This research highlights that an adversary can fool a deep NLP model with much less cost.
arXiv Detail & Related papers (2022-05-07T14:20:57Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning [95.60856995067083]
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.
We propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection.
Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%.
arXiv Detail & Related papers (2021-06-01T07:10:54Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
- Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks [65.20660287833537]
In this paper we propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function.
We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness.
arXiv Detail & Related papers (2020-03-03T18:15:55Z)