Related papers: Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges

Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges

URL: http://arxiv.org/abs/2501.18536v1
Date: Thu, 30 Jan 2025 18:02:15 GMT
Title: Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges
Authors: Manveer Singh Tamber, Jimmy Lin,
Abstract summary: We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks.<n>We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance.<n>Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
Score: 52.96987928118327
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Consider a scenario in which a user searches for information, only to encounter texts flooded with misleading or non-relevant content. This scenario exemplifies a simple yet potent vulnerability in neural Information Retrieval (IR) pipelines: content injection attacks. We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to these attacks, in which adversaries insert misleading text into passages to manipulate model judgements. We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance. While the second tactic has been explored in prior research, we present, to our knowledge, the first empirical analysis of the first threat, demonstrating how state-of-the-art models can be easily misled. Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material. Additionally, we explore various defense strategies, including adversarial passage classifiers, retriever fine-tuning to discount manipulated content, and prompting LLM judges to adopt a more cautious approach. However, we find that these countermeasures often involve trade-offs, sacrificing effectiveness for attack robustness and sometimes penalizing legitimate documents in the process. Our findings highlight the need for stronger defenses against these evolving adversarial strategies to maintain the trustworthiness of IR systems. We release our code and scripts to facilitate further research.

Related papers

Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers [61.57691030102618]
We propose a novel jailbreaking method, Paper Summary Attack (llmnamePSA)<n>It synthesizes content from either attack-focused or defense-focused LLM safety paper to construct an adversarial prompt template.<n>Experiments show significant vulnerabilities not only in base LLMs, but also in state-of-the-art reasoning model like Deepseek-R1.
arXiv Detail & Related papers (2025-07-17T18:33:50Z)
DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [59.66984417026933]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF) (inherent to the data) versus external features (EF) (artificially introduced for auditing)<n>We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intending to falsely implicate an unused dataset.<n>Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery.<n>Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z)
Stealthy LLM-Driven Data Poisoning Attacks Against Embedding-Based Retrieval-Augmented Recommender Systems [16.79952669254101]
We study provider-side data poisoning in retrieval-augmented recommender systems (RAG)<n>By modifying only a small fraction of tokens within item descriptions, an attacker can significantly promote or demote targeted items.<n>Experiments on MovieLens, using two large language model (LLM) retrieval modules, show that even subtle attacks shift final rankings and item exposures while eluding naive detection.
arXiv Detail & Related papers (2025-05-08T12:53:42Z)
Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks [72.4498910775871]
Vision-language model (VLM)-based retrievers leverage document screenshots embedded as vectors to enable effective search and offer a simplified pipeline over traditional text-only methods.<n>In this study, we propose three pixel poisoning attack methods designed to compromise VLM-based retrievers.
arXiv Detail & Related papers (2025-01-28T12:40:37Z)
Unsupervised dense retrieval with conterfactual contrastive learning [16.679649921935482]
We propose to improve the robustness of dense retrieval models by enhancing their sensitivity of fine-graned relevance signals.<n>A model achieving sensitivity in this context should exhibit high variances when documents' key passages determining their relevance to queries have been modified.<n>Motivated by causality and counterfactual analysis, we propose a series of counterfactual regularization methods.
arXiv Detail & Related papers (2024-12-30T07:01:34Z)
On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains [34.122040172188406]
Retrieval-Augmented Generation (RAG) has been empirically shown to enhance the performance of large language models (LLMs) in knowledge-intensive domains. We show that RAG is vulnerable to universal poisoning attacks in medical Q&A. We develop a new detection-based defense to ensure the safe use of RAG.
arXiv Detail & Related papers (2024-09-12T02:43:40Z)
Imposter.AI: Adversarial Attacks with Hidden Intentions towards Aligned Large Language Models [13.225041704917905]
This study unveils an attack mechanism that capitalizes on human conversation strategies to extract harmful information from large language models. Unlike conventional methods that target explicit malicious responses, our approach delves deeper into the nature of the information provided in responses.
arXiv Detail & Related papers (2024-07-22T06:04:29Z)
Whispers in Grammars: Injecting Covert Backdoors to Compromise Dense Retrieval Systems [40.131588857153275]
This paper investigates a novel attack scenario where the attackers aim to mislead the retrieval system into retrieving the attacker-specified contents.<n>Those contents, injected into the retrieval corpus by attackers, can include harmful text like hate speech or spam.<n>Unlike prior methods that rely on model weights and generate conspicuous, unnatural outputs, we propose a covert backdoor attack triggered by grammar errors.
arXiv Detail & Related papers (2024-02-21T05:03:07Z)
Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems [28.175920880194223]
We investigate factors related to embedding models that may impact text recoverability via Vec2Text. We propose a simple embedding transformation fix that guarantees equal ranking effectiveness while mitigating the recoverability risk.
arXiv Detail & Related papers (2024-02-20T07:49:30Z)
Poisoning Retrieval Corpora by Injecting Adversarial Passages [79.14287273842878]
We propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages. When these adversarial passages are inserted into a large retrieval corpus, we show that this attack is highly effective in fooling these systems. We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised.
arXiv Detail & Related papers (2023-10-29T21:13:31Z)
Defense of Adversarial Ranking Attack in Text Retrieval: Benchmark and Baseline via Detection [12.244543468021938]
This paper introduces two types of detection tasks for adversarial documents. A benchmark dataset is established to facilitate the investigation of adversarial ranking defense. A comprehensive investigation of the performance of several detection baselines is conducted.
arXiv Detail & Related papers (2023-07-31T16:31:24Z)
Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks. Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
ADC: Adversarial attacks against object Detection that evade Context consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples. We propose an adaptive framework to generate examples that subvert such defenses. Our results suggest that how to robustly model context and check its consistency, is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z)
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning [95.60856995067083]
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms. We propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection. Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%.
arXiv Detail & Related papers (2021-06-01T07:10:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.