Efficient Combinatorial Optimization for Word-level Adversarial Textual
Attack
- URL: http://arxiv.org/abs/2109.02229v1
- Date: Mon, 6 Sep 2021 03:44:43 GMT
- Title: Efficient Combinatorial Optimization for Word-level Adversarial Textual
Attack
- Authors: Shengcai Liu, Ning Lu, Cheng Chen, Ke Tang
- Abstract summary: Various word-level textual attack approaches have been proposed to reveal the vulnerability of deep neural networks used in natural language processing.
We propose an efficient local search algorithm (LS) to solve the problem in general cases.
We show that LS can greatly reduce the number of queries needed to achieve high attack success rates, typically by an order of magnitude.
- Score: 26.91645793706187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past few years, various word-level textual attack approaches have
been proposed to reveal the vulnerability of deep neural networks used in
natural language processing. Typically, these approaches involve an important
optimization step to determine which substitute to use for each word in the
original input. However, current research on this step is still rather limited,
from the perspectives of both problem-understanding and problem-solving. In
this paper, we address these issues by uncovering the theoretical properties of
the problem and proposing an efficient local search algorithm (LS) to solve it.
We establish the first provable approximation guarantee on solving the problem
in general cases. Notably, for adversarial textual attack, it is even better
than the previous bound, which only holds in a special case. Extensive
experiments involving five NLP tasks, six datasets and eleven NLP models show
that LS can greatly reduce the number of queries needed to achieve high attack
success rates, typically by an order of magnitude. Further experiments show
that the
adversarial examples crafted by LS usually have higher quality, exhibit better
transferability, and can bring more robustness improvement to victim models by
adversarial training.
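For intuition, the optimization step above is a combinatorial search: every position in the input has a list of candidate substitutes, and the attacker must pick one substitute per position so that the victim model's prediction flips while spending as few model queries as possible. The following is a minimal, generic first-improvement local search over that space; it is not the paper's LS algorithm (which comes with the approximation guarantee mentioned above), and `score_fn` and the `substitutes` lists are hypothetical stand-ins for a victim-model query and a synonym generator.

```python
import random

def local_search_attack(orig_words, substitutes, score_fn, max_iters=1000):
    """Generic first-improvement local search over word substitutions.

    orig_words:  list of words in the original input.
    substitutes: dict mapping position -> list of candidate replacement words.
    score_fn:    callable(word_list) -> float; higher means closer to fooling
                 the victim model (e.g. probability mass on a wrong label).
                 Every call counts as one query to the victim model.
    """
    current = list(orig_words)
    best_score = score_fn(current)
    queries = 1

    for _ in range(max_iters):
        improved = False
        positions = list(substitutes.keys())
        random.shuffle(positions)          # visit positions in random order
        for pos in positions:
            for cand in substitutes[pos]:
                if cand == current[pos]:
                    continue
                trial = list(current)
                trial[pos] = cand
                score = score_fn(trial)
                queries += 1
                if score > best_score:     # keep the first improving swap
                    current, best_score = trial, score
                    improved = True
                    break
            if improved:
                break
        if not improved:                   # local optimum: no single swap helps
            break
    return current, best_score, queries
```

Every call to `score_fn` is one victim-model query, which is the budget the abstract reports being reduced by roughly an order of magnitude.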
Related papers
- Depth Gives a False Sense of Privacy: LLM Internal States Inversion [17.639108495452785]
Large Language Models (LLMs) are increasingly integrated into daily routines, yet they raise significant privacy and safety concerns.
Recent research proposes collaborative inference, which outsources the early-layer inference to ensure data locality.
We propose four inversion attacks that significantly improve the semantic similarity and token matching rate of inverted inputs.
arXiv Detail & Related papers (2025-07-22T09:15:11Z)
- ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models [67.75439511654078]
Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses.
They face the persistent challenge of hallucination, which introduces practical weaknesses and raises concerns about their reliable deployment in real-world applications.
We propose ONLY, a training-free decoding approach that requires only a single query and a one-layer intervention during decoding, enabling efficient real-time deployment.
arXiv Detail & Related papers (2025-07-01T16:01:08Z)
- PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization [13.751251342738225]
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications.
They also exhibit inherent limitations, such as outdated knowledge and susceptibility to hallucinations.
Recent efforts have focused on the security of RAG-based LLMs, yet existing attack methods face three critical challenges.
We propose coordinated Prompt-RAG attack (PR-attack), a novel optimization-driven attack that introduces a small number of poisoned texts into the knowledge database.
arXiv Detail & Related papers (2025-04-10T13:09:50Z)
- In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.
To handle these challenges, a direct solution is to generate "high-confidence" data from unsupervised downstream tasks.
We propose a novel approach, the pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
- TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven Optimization [35.8795761863398]
We propose TextGrad, a new attack generator using gradient-driven optimization, supporting high-accuracy and high-quality assessment of adversarial robustness in NLP.
We develop an effective convex relaxation method to co-optimize the continuously-relaxed site selection and perturbation variables.
As a first-order attack generation method, TextGrad can be baked into adversarial training to further improve the robustness of NLP models.
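As a rough, self-contained illustration of gradient-driven optimization over continuously-relaxed substitution variables (a simplified stand-in, not TextGrad's actual formulation, convex relaxation, or API), one can keep a softmax distribution over candidate substitutes at each position, back-propagate an attack loss through the expected embedding, and discretize at the end. The toy victim model, dimensions, and tensors below are all assumptions made for the sketch.

```python
import torch

# Toy setup: L positions, K candidate substitutes per position, D-dim embeddings.
# The linear "victim" model and random embeddings are placeholders.
L, K, D, num_classes = 8, 5, 32, 2
cand_emb = torch.randn(L, K, D)             # embeddings of candidate words
victim = torch.nn.Linear(D, num_classes)    # stand-in differentiable victim model
true_label = torch.tensor([0])

# Continuous relaxation: unconstrained logits -> softmax weights over candidates.
sub_logits = torch.zeros(L, K, requires_grad=True)
opt = torch.optim.Adam([sub_logits], lr=0.1)

for step in range(200):
    w = torch.softmax(sub_logits, dim=-1)                    # (L, K) relaxed choices
    sent_emb = (w.unsqueeze(-1) * cand_emb).sum(1).mean(0)   # expected sentence embedding
    logits = victim(sent_emb).unsqueeze(0)                   # (1, num_classes)
    # Maximize the loss on the true label, i.e. a gradient-driven attack objective.
    loss = -torch.nn.functional.cross_entropy(logits, true_label)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Discretize: pick the highest-weighted candidate at each position.
chosen_idx = torch.softmax(sub_logits, dim=-1).argmax(dim=-1)   # (L,)
```

The sketch only conveys the relax-optimize-discretize pattern; the paper additionally co-optimizes site-selection variables under a convex relaxation.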
arXiv Detail & Related papers (2022-12-19T05:55:58Z)
- MaxMatch: Semi-Supervised Learning with Worst-Case Consistency [149.03760479533855]
We propose a worst-case consistency regularization technique for semi-supervised learning (SSL).
We present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately.
Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants.
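A minimal sketch of that worst-case consistency term, assuming a PyTorch classifier and a stochastic `augment` function (both hypothetical names here): compute the divergence between the prediction on the unlabeled sample and each of its augmented variants, and penalize only the largest one. This illustrates the stated objective rather than reproducing the authors' implementation.

```python
import torch
import torch.nn.functional as F

def worst_case_consistency(model, x_unlabeled, augment, num_augs=4):
    """Penalize the largest divergence between the prediction on an unlabeled
    batch and the predictions on several augmented variants of it."""
    with torch.no_grad():
        p_orig = F.softmax(model(x_unlabeled), dim=-1)     # pseudo-targets
    divergences = []
    for _ in range(num_augs):
        log_q = F.log_softmax(model(augment(x_unlabeled)), dim=-1)
        # KL(p_orig || q_aug), computed per sample
        kl = F.kl_div(log_q, p_orig, reduction="none").sum(dim=-1)
        divergences.append(kl)
    # Worst case over the augmentations, averaged over the batch.
    return torch.stack(divergences, dim=0).max(dim=0).values.mean()
```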
arXiv Detail & Related papers (2022-09-26T12:04:49Z)
- Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z)
- AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems [102.95119281306893]
We present an early trial to explore adversarial training methods to optimize AUC.
We reformulate the AUC optimization problem as a saddle point problem, where the objective becomes an instance-wise function.
Our analysis differs from the existing studies since the algorithm is asked to generate adversarial examples by calculating the gradient of a min-max problem.
arXiv Detail & Related papers (2022-06-24T09:13:39Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
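A hedged sketch of the noise-stability idea, assuming the fine-tuned model can be split into an `encoder` and a `classifier` head (an illustrative simplification; the paper regularizes hidden representations layerwise): perturb a hidden representation with Gaussian noise and penalize how much the output distribution changes.

```python
import torch
import torch.nn.functional as F

def noise_stability_loss(encoder, classifier, inputs, sigma=0.01):
    """Regularizer: the output distribution should be stable when Gaussian
    noise is injected into the hidden representation."""
    hidden = encoder(inputs)                              # clean hidden states
    noisy = hidden + sigma * torch.randn_like(hidden)     # inject Gaussian noise
    log_p = F.log_softmax(classifier(hidden), dim=-1)
    log_q = F.log_softmax(classifier(noisy), dim=-1)
    # Symmetric KL between clean and noisy output distributions.
    return 0.5 * (F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)
                  + F.kl_div(log_p, log_q, reduction="batchmean", log_target=True))
```

The total fine-tuning objective would then be the task loss plus a small multiple of this term.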
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The vulnerability of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z)
- Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework [17.17479625646699]
We propose a unified framework to craft textual adversarial samples.
In this paper, we instantiate our framework with an attack algorithm named Textual Projected Gradient Descent (T-PGD).
arXiv Detail & Related papers (2021-10-28T17:31:51Z)
- High Dimensional Level Set Estimation with Bayesian Neural Network [58.684954492439424]
This paper proposes novel methods for solving high-dimensional Level Set Estimation problems using Bayesian Neural Networks.
For each problem, we derive a corresponding information-theoretic acquisition function to sample the data points.
Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-17T23:21:53Z)
- A Comprehensive Study of Class Incremental Learning Algorithms for Visual Tasks [11.230170401360633]
The ability of artificial agents to increment their capabilities when confronted with new data is an open challenge in artificial intelligence.
The main challenge is catastrophic forgetting, i.e., the tendency of neural networks to underfit past data when new data are ingested.
We propose a common evaluation framework which is more thorough than existing ones in terms of number of datasets, size of datasets, size of bounded memory and number of incremental states.
arXiv Detail & Related papers (2020-11-03T16:59:21Z)
- Robust Deep Learning as Optimal Control: Insights and Convergence Guarantees [19.28405674700399]
Including adversarial examples during training is a popular defense mechanism against adversarial attacks.
By interpreting the min-max problem as an optimal control problem, it has been shown that one can exploit the compositional structure of neural networks.
We provide the first convergence analysis of this adversarial training algorithm by combining techniques from robust optimal control and inexact methods in optimization.
arXiv Detail & Related papers (2020-05-01T21:26:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.