Poisoning Retrieval Corpora by Injecting Adversarial Passages
- URL: http://arxiv.org/abs/2310.19156v1
- Date: Sun, 29 Oct 2023 21:13:31 GMT
- Title: Poisoning Retrieval Corpora by Injecting Adversarial Passages
- Authors: Zexuan Zhong, Ziqing Huang, Alexander Wettig, Danqi Chen
- Abstract summary: We propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages.
We show that when these adversarial passages are inserted into a large retrieval corpus, this attack is highly effective in fooling these systems.
We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised.
- Score: 79.14287273842878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense retrievers have achieved state-of-the-art performance in various
information retrieval tasks, but to what extent can they be safely deployed in
real-world applications? In this work, we propose a novel attack for dense
retrieval systems in which a malicious user generates a small number of
adversarial passages by perturbing discrete tokens to maximize similarity with
a provided set of training queries. When these adversarial passages are
inserted into a large retrieval corpus, we show that this attack is highly
effective in fooling these systems into retrieving them for queries that were not
seen by the attacker. More surprisingly, these adversarial passages can
directly generalize to out-of-domain queries and corpora with a high attack
success rate -- for instance, we find that 50 generated passages optimized on
Natural Questions can mislead >94% of questions posed in financial documents or
online forums. We also benchmark and compare a range of state-of-the-art dense
retrievers, both unsupervised and supervised. Although different systems
exhibit varying levels of vulnerability, we show they can all be successfully
attacked by injecting up to 500 passages, a small fraction compared to a
retrieval corpus of millions of passages.
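The attack described above is a gradient-guided discrete optimization: starting from an arbitrary passage, individual tokens are swapped so that the passage embedding moves closer to a set of training-query embeddings. Below is a minimal sketch of one HotFlip-style substitution step, assuming white-box access to a dual-encoder retriever; `passage_encoder` and `embedding_matrix` are illustrative placeholders, not code from the paper.

```python
# Hedged sketch of one HotFlip-style token substitution step, assuming white-box
# access to a dual-encoder retriever. `passage_encoder` is a placeholder callable
# mapping token embeddings [1, seq_len, hidden] to a passage vector [1, hidden];
# `embedding_matrix` is the retriever's token embedding table [vocab, hidden].
import torch

def hotflip_step(passage_ids, query_embs, passage_encoder, embedding_matrix):
    # Embed the current adversarial passage and track gradients w.r.t. its token embeddings.
    emb = embedding_matrix[passage_ids].detach().clone().requires_grad_(True)
    passage_emb = passage_encoder(emb.unsqueeze(0))            # [1, hidden]
    sim = (passage_emb @ query_embs.T).mean()                  # mean similarity to training queries
    sim.backward()

    # First-order estimate of the similarity gain from replacing the token at
    # position i with vocabulary token v:  grad_i . (e_v - e_i)
    grad = emb.grad                                            # [seq_len, hidden]
    gain = grad @ embedding_matrix.T                           # [seq_len, vocab]
    gain -= (grad * emb.detach()).sum(dim=-1, keepdim=True)

    pos = gain.max(dim=1).values.argmax()                      # position with the best available swap
    new_tok = gain[pos].argmax()                               # best replacement token at that position
    new_ids = passage_ids.clone()
    new_ids[pos] = new_tok
    return new_ids, sim.item()
```

Repeating such steps, and generating several passages so that they jointly cover the distribution of training queries, yields the small set of adversarial passages the abstract describes; the paper's exact candidate selection and re-scoring details may differ from this first-order sketch.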
Related papers
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks.
We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance (a minimal sketch of this second threat appears after this list).
Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
- Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks [72.4498910775871]
Vision-language model (VLM)-based retrievers leverage document screenshots embedded as vectors to enable effective search and offer a simplified pipeline over traditional text-only methods.
In this study, we propose three pixel poisoning attack methods designed to compromise VLM-based retrievers.
arXiv Detail & Related papers (2025-01-28T12:40:37Z)
- Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models [0.0]
Retrieval Augmented Generation (RAG) addresses the limits of a model's static parametric knowledge by combining Large Language Models with up-to-date information retrieval.
This paper investigates prompt injection attacks on RAG, focusing on malicious objectives beyond misinformation.
We build upon existing corpus poisoning techniques and propose a novel backdoor attack aimed at the fine-tuning process of the dense retriever component.
arXiv Detail & Related papers (2024-10-18T14:02:34Z)
- Corpus Poisoning via Approximate Greedy Gradient Descent [48.5847914481222]
We propose Approximate Greedy Gradient Descent, a new attack on dense retrieval systems based on the widely used HotFlip method for generating adversarial passages.
We show that our method achieves a high attack success rate on several datasets and retrievers, and can generalize to unseen queries and new domains.
arXiv Detail & Related papers (2024-06-07T17:02:35Z)
- Whispers in Grammars: Injecting Covert Backdoors to Compromise Dense Retrieval Systems [40.131588857153275]
This paper investigates a novel attack scenario where the attackers aim to mislead the retrieval system into retrieving attacker-specified content.
This content, injected into the retrieval corpus by the attacker, can include harmful text such as hate speech or spam.
Unlike prior methods that rely on model weights and generate conspicuous, unnatural outputs, we propose a covert backdoor attack triggered by grammar errors.
arXiv Detail & Related papers (2024-02-21T05:03:07Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
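The query-term injection threat described in the "Illusions of Relevance" entry above can be illustrated with a short, hedged sketch. The model name, example texts, and the use of sentence-transformers below are assumptions chosen for demonstration, not code or settings from that paper.

```python
# Illustrative sketch: stuffing query terms into an unrelated passage and checking
# how much its embedding similarity to the query inflates. Model choice is an
# assumption made for demonstration only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "what is the capital of france"
clean_passage = "Our store offers discount sneakers and free shipping on all orders."
# Attack: prepend key query terms to the otherwise unrelated passage.
stuffed_passage = "capital of france " * 3 + clean_passage

q_emb, clean_emb, stuffed_emb = model.encode([query, clean_passage, stuffed_passage])
print("clean similarity:  ", float(util.cos_sim(q_emb, clean_emb)))
print("stuffed similarity:", float(util.cos_sim(q_emb, stuffed_emb)))
```

Per that entry, the stuffed passage is expected to score noticeably higher against the query than the clean one, which is what makes this simple injection effective at deceiving retrievers and rerankers.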