Fraud's Bargain Attack: Generating Adversarial Text Samples via Word Manipulation Process
- URL: http://arxiv.org/abs/2303.01234v2
- Date: Wed, 27 Dec 2023 17:54:38 GMT
- Title: Fraud's Bargain Attack: Generating Adversarial Text Samples via Word Manipulation Process
- Authors: Mingze Ni, Zhensu Sun and Wei Liu
- Abstract summary: This study proposes a new method called the Fraud's Bargain Attack.
It uses a randomization mechanism to expand the search space and produce high-quality adversarial examples.
It outperforms other methods in terms of success rate, imperceptibility and sentence quality.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research has revealed that natural language processing (NLP) models
are vulnerable to adversarial examples. However, the current techniques for
generating such examples rely on deterministic heuristic rules, which fail to
produce optimal adversarial examples. In response, this study proposes a new
method called the Fraud's Bargain Attack (FBA), which uses a randomization
mechanism to expand the search space and produce high-quality adversarial
examples with a higher probability of success. FBA uses the Metropolis-Hastings
sampler, a type of Markov Chain Monte Carlo sampler, to improve the selection
of adversarial examples from all candidates generated by a customized
stochastic process called the Word Manipulation Process (WMP). The WMP method
modifies individual words in a contextually-aware manner through insertion,
removal, or substitution. Through extensive experiments, this study
demonstrates that FBA outperforms other methods in terms of attack success
rate, imperceptibility and sentence quality.
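To make the abstract's two ingredients concrete, here is a minimal sketch of the FBA loop: a WMP-style proposal that inserts, removes, or substitutes one word, wrapped in a Metropolis-Hastings accept/reject step. This is not the authors' implementation; victim_loss and VOCAB are toy placeholders (a real attack queries the target classifier and draws context-aware candidates from a masked language model), and the proposal-asymmetry correction in the acceptance ratio is omitted for brevity.

```python
import math
import random

# Toy placeholders (hypothetical, not from the paper's code): a real attack
# would query the victim classifier and draw context-aware candidates from
# a masked language model.
VOCAB = ["good", "bad", "film", "plot", "truly", "rather"]

def victim_loss(tokens):
    """Stand-in for the victim classifier's loss on the true label;
    higher loss means more adversarial."""
    return (sum(len(t) for t in tokens) % 7) / 7.0  # dummy score in [0, 1)

def wmp_proposal(tokens):
    """One WMP-style stochastic edit: insert, remove, or substitute a
    single word (the paper conditions these edits on context)."""
    tokens = list(tokens)
    op = random.choice(["insert", "remove", "substitute"])
    i = random.randrange(len(tokens))
    if op == "insert":
        tokens.insert(i, random.choice(VOCAB))
    elif op == "remove" and len(tokens) > 1:
        tokens.pop(i)
    else:
        tokens[i] = random.choice(VOCAB)
    return tokens

def fba_attack(tokens, steps=200, temperature=0.05):
    """Metropolis-Hastings over WMP candidates: accept a proposal with
    probability min(1, exp((cand_loss - cur_loss) / temperature)), so
    more adversarial candidates are favored while occasional downhill
    moves let the chain escape local optima."""
    current, cur_loss = list(tokens), victim_loss(tokens)
    for _ in range(steps):
        cand = wmp_proposal(current)
        cand_loss = victim_loss(cand)
        if random.random() < min(1.0, math.exp((cand_loss - cur_loss) / temperature)):
            current, cur_loss = cand, cand_loss
    return current

print(" ".join(fba_attack("the film was truly good".split())))
```

In the paper the target distribution also rewards semantic similarity and fluency, which is what keeps accepted samples imperceptible; the dummy loss above only illustrates the sampler's mechanics.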
Related papers
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.
arXiv Detail & Related papers (2024-08-24T14:14:32Z) - A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers [10.063169009242682]
We train an encoder-decoder paraphrase model to generate adversarial examples.
We adopt a reinforcement learning algorithm and propose a constraint-enforcing reward.
We show how key design choices impact the generated examples and discuss the strengths and weaknesses of the proposed approach.
arXiv Detail & Related papers (2024-05-20T09:33:43Z) - Reversible Jump Attack to Textual Classifiers with Modification Reduction [8.247761405798874]
Reversible Jump Attack (RJA) and Metropolis-Hastings Modification Reduction (MMR) are proposed.
RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency and grammatical correctness.
arXiv Detail & Related papers (2024-03-21T04:54:31Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures
and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level (a schematic sketch of this surprisal-based flagging appears after this list).
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - Context-aware Adversarial Attack on Named Entity Recognition [15.049160192547909]
We study context-aware adversarial attack methods to examine the model's robustness.
Specifically, we propose perturbing the most informative words for recognizing entities to create adversarial examples.
Experiments and analyses show that our methods are more effective in deceiving the model into making wrong predictions than strong baselines.
arXiv Detail & Related papers (2023-09-16T14:04:23Z) - In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z) - ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and far-boundary (FB) adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
arXiv Detail & Related papers (2022-10-22T09:11:12Z) - Phrase-level Adversarial Example Generation for Neural Machine
Translation [75.01476479100569]
We propose a phrase-level adversarial example generation (PAEG) method to enhance the robustness of the model.
We verify our method on three benchmarks, including LDC Chinese-English, IWSLT14 German-English, and WMT14 English-German tasks.
arXiv Detail & Related papers (2022-01-06T11:00:49Z) - Randomized Substitution and Vote for Textual Adversarial Example
Detection [6.664295299367366]
A line of work has shown that natural language processing models are vulnerable to adversarial examples.
We propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V); a toy version of the voting idea appears after this list.
Empirical evaluations on three benchmark datasets demonstrate that RS&V could detect the textual adversarial examples more successfully than the existing detection methods.
arXiv Detail & Related papers (2021-09-13T04:17:58Z) - Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.