Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval
- URL: http://arxiv.org/abs/2501.04802v2
- Date: Sat, 08 Mar 2025 22:14:57 GMT
- Title: Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval
- Authors: Yongkang Li, Panagiotis Eustratiadis, Evangelos Kanoulas
- Abstract summary: HotFlip is a topical gradient-based word substitution method for attacking language models. In this paper, we significantly boost the efficiency of HotFlip, reducing the adversarial generation process from 4 hours per document to only 15 minutes. We also contribute experiments and analysis on two additional tasks: (1) transfer-based black-box attacks, and (2) query-agnostic attacks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: HotFlip is a topical gradient-based word substitution method for attacking language models. Recently, this method has been further applied to attack retrieval systems by generating malicious passages that are injected into a corpus, i.e., corpus poisoning. However, HotFlip is known to be computationally inefficient, with the majority of time being spent on gradient accumulation for each query-passage pair during the adversarial token generation phase, making it impossible to generate an adequate number of adversarial passages in a reasonable amount of time. Moreover, the attack method itself assumes access to a set of user queries, a strong assumption that does not correspond to how real-world adversarial attacks are usually performed. In this paper, we first significantly boost the efficiency of HotFlip, reducing the adversarial generation process from 4 hours per document to only 15 minutes, using the same hardware. We further contribute experiments and analysis on two additional tasks: (1) transfer-based black-box attacks, and (2) query-agnostic attacks. Whenever possible, we provide comparisons between the original method and our improved version. Our experiments demonstrate that HotFlip can effectively attack a variety of dense retrievers, with an observed trend that its attack performance diminishes against more advanced and recent methods. Interestingly, we observe that while HotFlip performs poorly in a black-box setting, indicating limited capacity for generalization, in query-agnostic scenarios its performance is correlated with the volume of injected adversarial passages.
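To make the attack concrete, below is a minimal PyTorch sketch of a single HotFlip-style token substitution for corpus poisoning. The retriever choice (facebook/contriever), the mean pooling, and all helper names are illustrative assumptions rather than the paper's exact setup; the full method also re-scores a top-k shortlist of candidates instead of flipping greedily. The single batched backward pass over all queries gestures at the kind of change behind the reported 4-hours-to-15-minutes speedup, but is not claimed to be the paper's exact optimization.

```python
# Minimal sketch of one HotFlip-style token substitution for corpus
# poisoning. Assumptions: a Contriever-like bi-encoder with mean pooling,
# precomputed (detached) query embeddings, and a single greedy flip per
# step; the actual method re-scores top-k candidates before flipping.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/contriever")  # assumed retriever
enc = AutoModel.from_pretrained("facebook/contriever")
for p in enc.parameters():          # freeze the model; only inputs get grads
    p.requires_grad_(False)
emb = enc.get_input_embeddings().weight                     # (|V|, d)

def mean_pool(hidden, mask):
    # Contriever-style mean pooling over non-padding positions.
    m = mask.unsqueeze(-1).float()
    return (hidden * m).sum(1) / m.sum(1)

def hotflip_step(adv_ids, attn_mask, query_embs, pos):
    """Swap the token at `pos` of the adversarial passage for the candidate
    with the largest first-order gain in mean query-passage similarity.
    One backward pass covers all queries at once, instead of accumulating
    gradients pair by pair."""
    inputs = emb[adv_ids].detach().clone().unsqueeze(0).requires_grad_(True)
    out = enc(inputs_embeds=inputs, attention_mask=attn_mask.unsqueeze(0))
    p_emb = mean_pool(out.last_hidden_state, attn_mask.unsqueeze(0))  # (1, d)
    loss = -(query_embs @ p_emb.T).mean()   # maximize mean dot-product similarity
    loss.backward()
    grad = inputs.grad[0, pos]              # gradient w.r.t. the flipped slot
    gains = -(emb @ grad)                   # first-order loss decrease per candidate
    new_ids = adv_ids.clone()
    new_ids[pos] = gains.argmax()
    return new_ids

# Usage sketch: start from a passage of mask tokens and flip greedily.
adv_ids = torch.full((50,), tok.mask_token_id, dtype=torch.long)
attn = torch.ones(50, dtype=torch.long)
# query_embs: (num_queries, d) tensor of precomputed, detached query embeddings
# adv_ids = hotflip_step(adv_ids, attn, query_embs, pos=0)
```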
Related papers
- Unsupervised Corpus Poisoning Attacks in Continuous Space for Dense Retrieval
We present an adversarial corpus attack that is fast and effective.
We consider attacks under both a white-box and a black-box setting.
Our method can generate successful adversarial examples in under two minutes per target document.
arXiv Detail & Related papers (2025-04-24T18:46:11Z)
- Corpus Poisoning via Approximate Greedy Gradient Descent
We propose Approximate Greedy Gradient Descent, a new attack on dense retrieval systems based on the widely used HotFlip method for generating adversarial passages.
We show that our method achieves a high attack success rate on several datasets with several retrievers, and can generalize to unseen queries and new domains.
arXiv Detail & Related papers (2024-06-07T17:02:35Z)
- ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings
We propose an Adversarial Suffix Embedding Translation Framework (ASETF) to transform continuous adversarial suffix embeddings into coherent and understandable text.
Our method significantly reduces the computation time of adversarial suffixes and achieves a much higher attack success rate than existing techniques.
arXiv Detail & Related papers (2024-02-25T06:46:27Z)
- Whispers in Grammars: Injecting Covert Backdoors to Compromise Dense Retrieval Systems
This paper investigates a novel attack scenario where the attackers aim to mislead the retrieval system into retrieving the attacker-specified contents.
Those contents, injected into the retrieval corpus by attackers, can include harmful text like hate speech or spam.
Unlike prior methods that rely on model weights and generate conspicuous, unnatural outputs, we propose a covert backdoor attack triggered by grammar errors.
arXiv Detail & Related papers (2024-02-21T05:03:07Z)
- FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain; a minimal sketch of this idea appears after this list.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
arXiv Detail & Related papers (2023-12-07T16:56:24Z)
- DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification
Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose a unified framework, DiffAttack, to perform effective and efficient attacks against diffusion-based purification defenses.
arXiv Detail & Related papers (2023-10-27T15:17:50Z)
- You Can Backdoor Personalized Federated Learning
Existing research primarily focuses on backdoor attacks and defenses within the generic federated learning scenario.
We propose a two-pronged attack method, BapFL, which comprises two simple yet effective strategies.
arXiv Detail & Related papers (2023-07-29T12:25:04Z)
- Detection and Mitigation of Byzantine Attacks in Distributed Training
Abnormal Byzantine behavior of the worker nodes can derail training and compromise the quality of inference.
Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients.
In this work, we consider attack models ranging from strong ones ($q$ omniscient adversaries with full knowledge of the defense protocol, which can change from iteration to iteration) to weak ones ($q$ randomly chosen adversaries with limited collusion abilities).
arXiv Detail & Related papers (2022-08-17T05:49:52Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- Improving the Transferability of Adversarial Examples with New Iteration Framework and Input Dropout
We propose a new gradient iteration framework, which redefines the relationship between the iteration step size, the number of perturbations, and the maximum iterations.
Under this framework, we easily improve the attack success rate of DI-TI-MIM.
In addition, we propose a gradient iterative attack method based on input dropout, which can be well combined with our framework.
arXiv Detail & Related papers (2021-06-03T06:36:38Z)
- Transferable Sparse Adversarial Attack
We introduce a generator architecture to alleviate the overfitting issue and thus efficiently craft transferable sparse adversarial examples.
Our method achieves superior inference speed, 700× faster than other optimization-based methods.
arXiv Detail & Related papers (2021-05-31T06:44:58Z)
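On the defense side, the FreqFed entry above describes aggregating federated updates in the frequency domain; below is a minimal Python sketch of that idea. The DCT fingerprinting follows the abstract's description, while the median-distance majority filter is a stand-in assumption, not the paper's actual clustering procedure.

```python
# Minimal sketch of frequency-domain update filtering in the spirit of
# FreqFed. Each client update is flattened, transformed with a DCT, and
# only updates whose low-frequency fingerprints sit close to the cohort
# median are averaged. The 1.5x-median threshold is an illustrative
# stand-in for the paper's clustering-based aggregation.
import numpy as np
from scipy.fft import dct

def freq_filtered_average(updates, n_low=64):
    """Average only the client updates whose low-frequency DCT
    fingerprints agree with the majority. Assumes each update has
    at least n_low parameters."""
    fps = np.stack([dct(u.ravel(), norm="ortho")[:n_low] for u in updates])
    med = np.median(fps, axis=0)                 # median fingerprint
    dist = np.linalg.norm(fps - med, axis=1)     # distance to the median
    keep = dist <= 1.5 * np.median(dist)         # crude majority rule (assumption)
    kept = [u for u, k in zip(updates, keep) if k]
    return np.mean(kept, axis=0)
```

By construction at least half of the updates pass the threshold, so the aggregate is never empty; a poisoned update whose frequency fingerprint deviates from the majority is simply dropped before averaging.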