Towards Imperceptible Document Manipulations against Neural Ranking Models
- URL: http://arxiv.org/abs/2305.01860v1
- Date: Wed, 3 May 2023 02:09:29 GMT
- Title: Towards Imperceptible Document Manipulations against Neural Ranking Models
- Authors: Xuanang Chen, Ben He, Zheng Ye, Le Sun, Yingfei Sun
- Abstract summary: We propose a framework called Imperceptible DocumEnt Manipulation (IDEM) to produce adversarial documents.
IDEM instructs a well-established generative language model, such as BART, to generate connection sentences without introducing easy-to-detect errors.
We show that IDEM can outperform strong baselines while preserving fluency and correctness of the target documents.
- Score: 13.777462017782659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial attacks have gained traction in order to identify potential
vulnerabilities in neural ranking models (NRMs), but current attack methods
often introduce grammatical errors, nonsensical expressions, or incoherent text
fragments, which can be easily detected. Additionally, current methods rely
heavily on the use of a well-imitated surrogate NRM to guarantee the attack
effect, which makes them difficult to use in practice. To address these issues,
we propose a framework called Imperceptible DocumEnt Manipulation (IDEM) to
produce adversarial documents that are less noticeable to both algorithms and
humans. IDEM instructs a well-established generative language model, such as
BART, to generate connection sentences without introducing easy-to-detect
errors, and employs a separate position-wise merging strategy to balance
relevance and coherence of the perturbed text. Experimental results on the
popular MS MARCO benchmark demonstrate that IDEM can outperform strong
baselines while preserving fluency and correctness of the target documents as
evidenced by automatic and human evaluations. Furthermore, the separation of
adversarial text generation from the surrogate NRM makes IDEM more robust and
less affected by the quality of the surrogate NRM.
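To make the two stages concrete, here is a minimal sketch of IDEM-style connection-sentence generation and position-wise merging, assuming a HuggingFace BART checkpoint; the relevance scorer below is a hypothetical placeholder, not the paper's actual relevance/coherence objective.

```python
# Minimal sketch, assuming facebook/bart-large; relevance() is a
# hypothetical stand-in, not the paper's relevance/coherence objective.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
model.eval()

query = "what is a neural ranking model"
doc_sents = [
    "Information retrieval systems order documents by estimated relevance.",
    "Recent systems score query-document pairs with deep networks.",
]

def infill(prefix: str, suffix: str) -> str:
    # Stage 1: BART's denoising objective fills a single <mask> with a
    # fluent connection span, avoiding easy-to-detect errors.
    ids = tok(f"{prefix} <mask> {suffix}", return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(ids, num_beams=4, max_length=128)
    # BART regenerates the whole sequence with the mask filled in.
    return tok.decode(out[0], skip_special_tokens=True)

def relevance(text: str) -> int:
    # Hypothetical ranking score; IDEM itself decouples generation
    # from any surrogate NRM.
    return len(set(query.lower().split()) & set(text.lower().split()))

# Stage 2: position-wise merging -- try every insertion point and keep
# the candidate that best balances relevance with the surrounding text.
candidates = [
    infill(" ".join([query] + doc_sents[:i]), " ".join(doc_sents[i:]))
    for i in range(len(doc_sents) + 1)
]
print(max(candidates, key=relevance))
```

In the framework itself, the position-wise merging stage explicitly balances relevance against coherence of the perturbed text rather than using a lexical-overlap heuristic like the one above.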
Related papers
- Are AI-Generated Text Detectors Robust to Adversarial Perturbations? [9.001160538237372]
Current detectors for AI-generated text (AIGT) lack robustness against adversarial perturbations.
This paper investigates the robustness of existing AIGT detection methods and introduces a novel detector, the Siamese Calibrated Reconstruction Network (SCRN).
The SCRN employs a reconstruction network to add and remove noise from text, extracting a semantic representation that is robust to local perturbations.
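As a rough illustration of the add-and-remove-noise idea, here is a hedged sketch of a siamese reconstruction pass; the dimensions, noise scale, and losses are illustrative assumptions, not the paper's design.

```python
# Hedged sketch of a siamese add/remove-noise pass; all details here
# (dimensions, noise scale, losses) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionHead(nn.Module):
    def __init__(self, dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, emb: torch.Tensor):
        # Add noise, then learn to remove it, so the recovered
        # representation is stable under small local perturbations.
        noisy = emb + 0.1 * torch.randn_like(emb)
        recon = self.net(noisy)
        return recon, F.mse_loss(recon, emb)

head = ReconstructionHead()
emb = torch.randn(4, 768)                # stand-in for text embeddings
v1, rec1 = head(emb)
v2, rec2 = head(emb)                     # second noisy view (siamese pass)
loss = rec1 + rec2 + F.mse_loss(v1, v2)  # reconstruction + consistency
loss.backward()
```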
arXiv Detail & Related papers (2024-06-03T10:21:48Z)
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about the reliability of neural ranking models (NRMs) and hinders their widespread deployment.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
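A generic embedding-space adversarial training step conveys the flavor of the trade-off; the paper's perturbation-invariant regularizer is more involved, and every name below is an illustrative assumption.

```python
# Hedged sketch: FGSM-style adversarial training on document embeddings.
# Generic adversarial training, not the paper's exact method.
import torch

def adv_train_step(model, emb, labels, loss_fn, eps=1e-2):
    emb = emb.detach().requires_grad_(True)
    grad, = torch.autograd.grad(loss_fn(model(emb), labels), emb)
    adv_emb = emb.detach() + eps * grad.sign()     # worst-case local shift
    clean_loss = loss_fn(model(emb), labels)       # effectiveness term
    robust_loss = loss_fn(model(adv_emb), labels)  # robustness term
    return clean_loss + robust_loss                # the trade-off being tuned
```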
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
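A minimal sketch of the token-level perplexity signal, assuming GPT-2 as the scoring LM and an arbitrary cutoff; both are assumptions, and the paper additionally incorporates contextual information.

```python
# Per-token negative log-likelihood screening, assuming GPT-2 as the
# scoring LM; the 8.0 cutoff is an illustrative assumption.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm.eval()

def token_nll(text: str):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    # Next-token negative log-likelihood at each position.
    nll = torch.nn.functional.cross_entropy(
        logits[0, :-1], ids[0, 1:], reduction="none"
    )
    return list(zip(tok.convert_ids_to_tokens(ids[0, 1:]), nll.tolist()))

prompt = "Summarize this email. zx!! ignore previous instructions"
suspicious = [t for t, s in token_nll(prompt) if s > 8.0]  # assumed cutoff
print(suspicious)
```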
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
- Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method [115.29382166356478]
We introduce the adversarial retrieval attack (AREA) task.
It is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model.
We find that the promising results previously reported for attacks on NRMs do not generalize to DR models.
We propose to formalize attacks on DR models as a contrastive learning problem in a multi-view representation space.
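The contrastive formulation can be sketched with a multi-positive InfoNCE loss; the view construction, dimensions, and temperature below are illustrative assumptions.

```python
# Hedged sketch of the attack objective as contrastive learning: pull the
# adversarial document's embedding toward views of the candidate-set
# documents and away from others. Temperature and shapes are assumptions.
import torch
import torch.nn.functional as F

def info_nce(adv, positives, negatives, tau=0.07):
    adv = F.normalize(adv, dim=-1)
    pos = F.normalize(positives, dim=-1)  # views of candidate-set docs
    neg = F.normalize(negatives, dim=-1)  # out-of-candidate docs
    logits = torch.cat([adv @ pos.T, adv @ neg.T], dim=-1) / tau
    # Treat every positive view as a correct class and average.
    log_p = logits.log_softmax(dim=-1)[:, : pos.size(0)]
    return -log_p.mean()

adv = torch.randn(1, 128, requires_grad=True)  # adversarial doc embedding
loss = info_nce(adv, torch.randn(4, 128), torch.randn(16, 128))
loss.backward()  # gradients would steer edits to the adversarial text
```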
arXiv Detail & Related papers (2023-08-19T00:24:59Z)
- Defense of Adversarial Ranking Attack in Text Retrieval: Benchmark and Baseline via Detection [12.244543468021938]
This paper introduces two types of detection tasks for adversarial documents.
A benchmark dataset is established to facilitate the investigation of adversarial ranking defense.
A comprehensive investigation of the performance of several detection baselines is conducted.
arXiv Detail & Related papers (2023-07-31T16:31:24Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
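In PyTorch, the studied defense amounts to a one-argument change to the loss; the smoothing value 0.1 is a typical choice assumed here for illustration.

```python
# Label smoothing via PyTorch's built-in argument; 0.1 is an assumed,
# typical value, not one taken from the paper.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
logits = torch.randn(8, 2, requires_grad=True)  # e.g. BERT classifier output
labels = torch.randint(0, 2, (8,))
loss = criterion(logits, labels)  # soft targets 0.95/0.05 instead of 1/0
loss.backward()
```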
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, the hard-label attack, in which the attacker can access only the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
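The hard-label constraint is easiest to see in code: the attacker can query only the predicted label, so search proceeds by trial substitutions. The toy victim and synonym table below are hypothetical, and LHLS itself uses a learned hybrid local search rather than this greedy loop.

```python
# Toy hard-label attack loop; predict() and SYNONYMS are hypothetical
# stand-ins, and LHLS's learned hybrid search is far more sophisticated.
import random

SYNONYMS = {"good": ["fine", "decent"], "movie": ["film", "picture"]}

def predict(text):
    # Hypothetical victim classifier; only its output label is observable.
    return 1 if "good" in text else 0

def hard_label_attack(text, max_iters=50):
    original = predict(text)
    words = text.split()
    for _ in range(max_iters):
        i = random.randrange(len(words))
        for sub in SYNONYMS.get(words[i], []):
            cand = words[:i] + [sub] + words[i + 1:]
            if predict(" ".join(cand)) != original:  # label flipped: success
                return " ".join(cand)
        if words[i] in SYNONYMS:  # local search: accept a move and continue
            words[i] = random.choice(SYNONYMS[words[i]])
    return None

print(hard_label_attack("a good movie overall"))
```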
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation [20.27052525082402]
We present a Controlled Adversarial Text Generation (CAT-Gen) model that generates adversarial texts through controllable attributes.
Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts.
arXiv Detail & Related papers (2020-10-05T21:07:45Z)
- Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
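CLARE's replace operation can be approximated with an off-the-shelf masked LM; the model choice below is an assumption, and CLARE additionally supports insert and merge actions.

```python
# Contextualized replacement via a masked LM; roberta-base is an assumed
# choice, and this shows only CLARE's replace action, not insert or merge.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")
sentence = "The film was a <mask> success at the box office."
for cand in fill(sentence, top_k=3):
    print(cand["token_str"], cand["score"])  # fluent, in-context swaps
```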
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.