TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven
Optimization
- URL: http://arxiv.org/abs/2212.09254v1
- Date: Mon, 19 Dec 2022 05:55:58 GMT
- Title: TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven
Optimization
- Authors: Bairu Hou, Jinghan Jia, Yihua Zhang, Guanhua Zhang, Yang Zhang, Sijia
Liu, Shiyu Chang
- Abstract summary: We propose TextGrad, a new attack generator using gradient-driven optimization, supporting high-accuracy and high-quality assessment of adversarial robustness in NLP.
We develop an effective convex relaxation method to co-optimize the continuously-relaxed site selection and perturbation variables.
As a first-order attack generation method, TextGrad can be baked into adversarial training to further improve the robustness of NLP models.
- Score: 35.8795761863398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robustness evaluation against adversarial examples has become increasingly
important to unveil the trustworthiness of the prevailing deep models in
natural language processing (NLP). However, in contrast to the computer vision
domain where the first-order projected gradient descent (PGD) is used as the
benchmark approach to generate adversarial examples for robustness evaluation,
a principled first-order gradient-based robustness evaluation framework is
lacking in NLP. The emerging optimization challenges lie in 1) the discrete
nature of textual inputs together with the strong coupling between the
perturbation location and the actual content, and 2) the additional constraint
that the perturbed text should be fluent and achieve a low perplexity under a
language model. These challenges make the development of PGD-like NLP attacks
difficult. To bridge the gap, we propose TextGrad, a new attack generator using
gradient-driven optimization, supporting high-accuracy and high-quality
assessment of adversarial robustness in NLP. Specifically, we address the
aforementioned challenges in a unified optimization framework: we develop
an effective convex relaxation method to co-optimize the continuously-relaxed
site selection and perturbation variables and leverage an effective sampling
method to establish an accurate mapping from the continuous optimization
variables to the discrete textual perturbations. Moreover, as a first-order
attack generation method, TextGrad can be baked into adversarial training to
further improve the robustness of NLP models. Extensive experiments are
provided to demonstrate the effectiveness of TextGrad not only in attack
generation for robustness evaluation but also in adversarial defense.
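The following is a minimal, self-contained sketch of the kind of continuously-relaxed, first-order attack loop the abstract describes: relaxed site-selection and substitution variables optimized by gradients, then sampled back to discrete token edits. The toy classifier, sigmoid/softmax parameterizations, and soft edit budget are illustrative assumptions; the paper's exact convex relaxation, projection operations, and perplexity-based fluency constraint are not reproduced here.
```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
L, V, D = 8, 50, 16              # sentence length, substitution vocab, embed dim
embed = torch.randn(V, D)        # toy embedding table for candidate tokens
x_ids = torch.randint(0, V, (L,))
clf = torch.nn.Linear(D, 2)      # toy victim classifier on mean-pooled embeddings
y = torch.tensor([0])

# Continuous relaxation: s[i] is a relaxed site-selection variable for position
# i, and p[i] is a relaxed distribution over which candidate replaces it.
s_logit = torch.zeros(L, requires_grad=True)
p_logit = torch.zeros(L, V, requires_grad=True)
opt = torch.optim.Adam([s_logit, p_logit], lr=0.1)

for step in range(100):
    s = torch.sigmoid(s_logit)          # site selection relaxed to [0, 1]
    p = F.softmax(p_logit, dim=-1)      # per-site substitution simplex
    # Expected embedding under the relaxed joint perturbation.
    e_adv = (1 - s)[:, None] * embed[x_ids] + s[:, None] * (p @ embed)
    loss = -F.cross_entropy(clf(e_adv.mean(0, keepdim=True)), y)  # maximize victim loss
    budget = F.relu(s.sum() - 3.0)      # soft budget of at most ~3 edited sites
    opt.zero_grad()
    (loss + budget).backward()
    opt.step()

# Sampling step: map the continuous solution back to a discrete perturbation.
with torch.no_grad():
    sites = torch.topk(torch.sigmoid(s_logit), k=3).indices
    adv_ids = x_ids.clone()
    for i in sites:
        adv_ids[i] = torch.multinomial(F.softmax(p_logit[i], dim=-1), 1).item()
print(x_ids.tolist(), "->", adv_ids.tolist())
```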
Related papers
- Adversarial Text Generation with Dynamic Contextual Perturbation [0.0]
Adversarial attacks on Natural Language Processing (NLP) models expose vulnerabilities by introducing subtle perturbations to input text.
We propose a novel adversarial text attack scheme named Dynamic Contextual Perturbation (DCP).
DCP generates context-aware perturbations across sentences, paragraphs, and documents, ensuring semantic fidelity and fluency.
arXiv Detail & Related papers (2025-06-10T18:02:37Z)
- T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation [60.620408007636016]
We propose T2I-Eval-R1, a novel reinforcement learning framework that trains open-source MLLMs using only coarse-grained quality scores.
Our approach integrates Group Relative Policy Optimization into the instruction-tuning process, enabling models to generate both scalar scores and interpretable reasoning chains.
arXiv Detail & Related papers (2025-05-23T13:44:59Z)
- Positive Text Reframing under Multi-strategy Optimization [2.6345343328000856]
We propose a framework to generate fluent, diverse and task-constrained reframing text.
Our framework achieves significant improvements on unconstrained and controlled positive reframing tasks.
arXiv Detail & Related papers (2024-07-25T10:58:42Z)
- STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario [50.37501379058119]
We propose the Spatial Transform Black-box Attack (STBA) to craft formidable adversarial examples in the query-limited scenario.
We show that STBA could effectively improve the imperceptibility of the adversarial examples and remarkably boost the attack success rate under query-limited settings.
arXiv Detail & Related papers (2024-03-30T13:28:53Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
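A minimal sketch of the label-smoothing loss this entry refers to, with toy tensors and an illustrative smoothing value; the paper's BERT fine-tuning setup and attack suite are not reproduced here. (Recent PyTorch versions also accept a label_smoothing argument to F.cross_entropy directly.)
```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    """Cross-entropy against a smoothed target distribution:
    (1 - eps) on the gold class, eps spread uniformly over all classes."""
    n = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / n)
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - eps + eps / n)
    return -(smooth * log_probs).sum(dim=-1).mean()

logits = torch.randn(4, 3)            # batch of 4 examples, 3 classes
targets = torch.tensor([0, 2, 1, 0])
print(label_smoothing_ce(logits, targets))
```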
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization [60.72410937614299]
We propose a new tractable bi-level optimization problem, and design and analyze a new set of algorithms termed Fast Bi-level AT (FAST-BAT).
FAST-BAT can defend against sign-based projected gradient descent (PGD) attacks without calling any gradient sign method or explicit robust regularization.
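A hedged skeleton of the bi-level adversarial training pattern this entry describes: an inner maximization over a perturbation using raw (sign-free) gradients, and an outer minimization over model weights. This is a generic sketch only; FAST-BAT's actual lower-level solution and implicit-gradient computation are not reproduced.
```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(10, 2)   # toy classifier standing in for an NLP model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
eps = 0.1                        # perturbation radius (illustrative)

for epoch in range(5):
    # Lower level: one sign-free gradient ascent step on the perturbation.
    delta = torch.zeros_like(x, requires_grad=True)
    loss_in = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss_in, delta)[0]
    delta = (eps * grad / (grad.norm() + 1e-12)).detach()  # scale into an L2 ball
    # Upper level: update model weights on the perturbed batch.
    opt.zero_grad()
    F.cross_entropy(model(x + delta), y).backward()
    opt.step()
```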
arXiv Detail & Related papers (2021-12-23T06:25:36Z)
- On the Convergence and Robustness of Adversarial Training [134.25999006326916]
Adversarial training with Projected Gradient Descent (PGD) is among the most effective defense methods.
We propose a dynamic training strategy to increase the convergence quality of the generated adversarial examples.
Our theoretical and empirical results show the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-12-15T17:54:08Z)
- Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework [17.17479625646699]
We propose a unified framework to craft textual adversarial samples.
In this paper, we instantiate our framework with an attack algorithm named Textual Projected Gradient Descent (T-PGD).
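In that spirit, here is a hedged sketch of a PGD-style update on token embeddings; the toy classifier, step sizes, and nearest-neighbor decoding back to tokens are illustrative assumptions (T-PGD itself perturbs latent representations and recovers fluent text with a masked language model).
```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
V, D, L = 100, 32, 6            # vocab size, embedding dim, sentence length
embed = torch.randn(V, D)       # toy embedding table
clf = torch.nn.Linear(D, 2)     # toy victim classifier on mean-pooled embeddings
x_ids = torch.randint(0, V, (L,))
y = torch.tensor([0])

e = embed[x_ids]
delta = torch.zeros_like(e, requires_grad=True)
alpha, eps = 0.05, 0.5          # step size and L-inf radius (illustrative)

for _ in range(20):
    loss = F.cross_entropy(clf((e + delta).mean(0, keepdim=True)), y)
    loss.backward()
    with torch.no_grad():
        delta += alpha * delta.grad.sign()   # ascend the victim's loss
        delta.clamp_(-eps, eps)              # project back into the L-inf ball
        delta.grad.zero_()

# Map the perturbed embeddings back to discrete tokens by nearest neighbor.
with torch.no_grad():
    adv_ids = torch.cdist(e + delta, embed).argmin(dim=-1)
print(x_ids.tolist(), "->", adv_ids.tolist())
```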
arXiv Detail & Related papers (2021-10-28T17:31:51Z)
- Learning to Selectively Learn for Weakly-supervised Paraphrase Generation [81.65399115750054]
We propose a novel approach to generate high-quality paraphrases with weak supervision data.
Specifically, we tackle the weakly-supervised paraphrase generation problem by obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion.
We demonstrate that our approach achieves significant improvements over existing unsupervised approaches, and is even comparable in performance with supervised state-of-the-art methods.
arXiv Detail & Related papers (2021-09-25T23:31:13Z)
- Efficient Combinatorial Optimization for Word-level Adversarial Textual Attack [26.91645793706187]
Various word-level textual attack approaches have been proposed to reveal the vulnerability of deep neural networks used in natural language processing.
We propose an efficient local search algorithm (LS) to solve the problem in general cases.
We show that LS can largely reduce the number of queries, usually by an order of magnitude, while achieving high attack success rates.
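A minimal sketch of a query-efficient local search over word substitutions, in the spirit of this entry; the victim scoring function, synonym candidates, and first-improvement acceptance rule are toy assumptions, not the paper's exact algorithm.
```python
def victim_score(tokens):
    # Stand-in for one black-box model query: the gold-class probability we
    # want to drive down. Any deterministic toy function works here.
    return sum((len(t) * 31 + i * 7) % 13 for i, t in enumerate(tokens)) / 100.0

def local_search(tokens, synonyms, max_queries=200):
    best, best_score = list(tokens), victim_score(tokens)
    queries, improved = 1, True
    while improved and queries < max_queries:
        improved = False
        for i, tok in enumerate(best):
            for cand in synonyms.get(tok, []):
                trial = best[:i] + [cand] + best[i + 1:]
                score = victim_score(trial)
                queries += 1
                if score < best_score:   # keep any move lowering the gold prob
                    best, best_score, improved = trial, score, True
                    break
                if queries >= max_queries:
                    return best, best_score, queries
    return best, best_score, queries

sent = ["the", "movie", "was", "great"]
syns = {"great": ["fine", "superb"], "movie": ["film"]}
print(local_search(sent, syns))
```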
arXiv Detail & Related papers (2021-09-06T03:44:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.