Certified Robustness Against Natural Language Attacks by Causal
Intervention
- URL: http://arxiv.org/abs/2205.12331v2
- Date: Thu, 26 May 2022 09:30:53 GMT
- Title: Certified Robustness Against Natural Language Attacks by Causal
Intervention
- Authors: Haiteng Zhao, Chang Ma*, Xinshuai Dong, Anh Tuan Luu, Zhi-Hong Deng,
Hanwang Zhang
- Abstract summary: Causal Intervention by Semantic Smoothing (CISS) is a novel framework towards robustness against natural language attacks.
CISS is provably robust against word substitution attacks, as well as empirically robust even when perturbations are strengthened by unknown attack algorithms.
- Score: 61.62348826831147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have achieved great success in many fields, yet they are
vulnerable to adversarial examples. This paper follows a causal perspective to
look into the adversarial vulnerability and proposes Causal Intervention by
Semantic Smoothing (CISS), a novel framework towards robustness against natural
language attacks. Instead of merely fitting observational data, CISS learns
causal effects p(y|do(x)) by smoothing in the latent semantic space to make
robust predictions, which scales to deep architectures and avoids tedious
construction of noise customized for specific attacks. CISS is provably robust
against word substitution attacks, as well as empirically robust even when
perturbations are strengthened by unknown attack algorithms. For example, on
YELP, CISS surpasses the runner-up by 6.7% in terms of certified robustness
against word substitutions, and achieves 79.4% empirical robustness when
syntactic attacks are integrated.
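For intuition, here is a minimal sketch of the smoothed-prediction idea: classify by majority vote over noisy copies of the input's latent code. The `encoder`, `classifier`, and noise scale `sigma` below are hypothetical stand-ins, not the paper's actual components.

```python
# Illustrative sketch only: majority-vote prediction under Gaussian noise
# injected in a latent semantic space (hypothetical encoder/classifier).
import torch

def smoothed_predict(encoder, classifier, x, sigma=0.5, n_samples=100):
    """Return the most frequent class over noisy latent copies of x."""
    with torch.no_grad():
        z = encoder(x)                                # latent code, shape (d,)
        noise = sigma * torch.randn(n_samples, z.shape[-1])
        logits = classifier(z.unsqueeze(0) + noise)   # (n_samples, n_classes)
        votes = logits.argmax(dim=-1)
        return torch.bincount(votes).argmax().item()  # majority vote
```

A prediction of this form is what makes certification possible: if the majority class wins by a large enough margin, sufficiently small perturbations in the latent space provably cannot flip the vote.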
Related papers
- A Systematic Evaluation of Adversarial Attacks against Speech Emotion Recognition Models [6.854732863866882]
Speech emotion recognition (SER) has been gaining attention in recent years due to its potential applications in diverse fields.
Recent studies have shown that deep learning models can be vulnerable to adversarial attacks.
arXiv Detail & Related papers (2024-04-29T09:00:32Z)
- Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing [107.97160023681184]
Aligned large language models (LLMs) are vulnerable to jailbreaking attacks.
We propose SEMANTICSMOOTH, a smoothing-based defense that aggregates predictions of semantically transformed copies of a given input prompt.
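A minimal sketch of that aggregation step, assuming `model` maps a prompt string to a label and `transforms` is a list of paraphrase-style string functions (both illustrative stand-ins, not the paper's components):

```python
import random
from collections import Counter

def semantic_smooth(model, prompt, transforms, n_copies=10):
    """Majority vote over semantically transformed copies of the prompt."""
    copies = [random.choice(transforms)(prompt) for _ in range(n_copies)]
    preds = [model(c) for c in copies]
    return Counter(preds).most_common(1)[0][0]
```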
arXiv Detail & Related papers (2024-02-25T20:36:03Z)
- Fooling the Textual Fooler via Randomizing Latent Representations [13.77424820701913]
Adversarial word-level perturbations are well-studied and effective attack strategies.
We propose AdvFooler, a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example.
We empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks.
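The core trick can be sketched as injecting fresh randomness into the latent representation on every query, so the score differences an attacker uses to rank word importance become unreliable (the names below are illustrative, not the paper's code):

```python
import torch

def randomized_forward(encoder, head, x, noise_scale=0.01):
    """Perturb the latent representation with fresh noise on each query."""
    with torch.no_grad():
        z = encoder(x)
        z = z + noise_scale * torch.randn_like(z)  # new noise every call
        return head(z)
```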
arXiv Detail & Related papers (2023-10-02T06:57:25Z)
- Context-aware Adversarial Attack on Named Entity Recognition [15.049160192547909]
We study context-aware adversarial attack methods to examine the model's robustness.
Specifically, we propose perturbing the most informative words for recognizing entities to create adversarial examples.
Experiments and analyses show that our methods are more effective than strong baselines at deceiving the model into making wrong predictions.
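A common building block of such attacks, sketched here under simplifying assumptions (this is not the paper's exact procedure), is ranking words by how much masking them lowers the model's confidence on the target entity:

```python
def rank_informative_words(model, tokens, entity_idx):
    """Score each word by the confidence drop caused by masking it.
    `model` is a hypothetical callable returning per-position entity
    confidences for a token list."""
    base = model(tokens)[entity_idx]
    drops = []
    for i in range(len(tokens)):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        drops.append((base - model(masked)[entity_idx], i))
    return [i for _, i in sorted(drops, reverse=True)]  # most informative first
```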
arXiv Detail & Related papers (2023-09-16T14:04:23Z)
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks [39.51297217854375]
We propose Text-CRS, a certified robustness framework for natural language processing (NLP) based on randomized smoothing.
We show that Text-CRS can address all four word-level adversarial operations (synonym substitution, word reordering, insertion, and deletion) and achieve a significant accuracy improvement.
We also provide the first benchmark on certified accuracy and radius for these four operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.
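To illustrate the kind of guarantee involved, here is the standard randomized-smoothing certificate in the style of Cohen et al., shown for intuition only (Text-CRS generalizes beyond this basic form):

```python
from statistics import NormalDist
from scipy.stats import beta

def certified_radius(n_top, n_total, sigma, alpha=0.001):
    """Certify a smoothed classifier: n_top of n_total noisy samples voted
    for the top class; return a certified l2 radius, or None to abstain."""
    # Clopper-Pearson lower confidence bound on the top-class probability.
    p_lower = beta.ppf(alpha, n_top, n_total - n_top + 1)
    if p_lower <= 0.5:
        return None
    return sigma * NormalDist().inv_cdf(p_lower)
```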
arXiv Detail & Related papers (2023-07-31T13:08:16Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves the adversarial robustness of pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
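For reference, a standard label-smoothing cross-entropy, in one common variant (modern PyTorch also exposes this directly via the `label_smoothing` argument of `torch.nn.functional.cross_entropy`):

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.1):
    """Cross-entropy against smoothed targets: 1 - eps on the true class,
    eps spread uniformly over the remaining classes."""
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / (n_classes - 1))
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - eps)
    return -(smooth * log_probs).sum(dim=-1).mean()
```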
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversary quality.
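The generic skeleton that learning-based variants such as LHLS refine (sketched here; the paper's learned search policy is not shown) is a local search that, using only predicted labels, reverts substituted words toward the original text while keeping the label flipped:

```python
def hard_label_local_search(model, orig, adv, max_iters=50):
    """Improve an adversarial example using only hard labels: greedily
    revert substitutions that keep the prediction flipped.
    `orig` and `adv` are token lists; `model` returns a label."""
    target = model(adv)  # the (wrong) label the example must keep
    for _ in range(max_iters):
        improved = False
        for i in [j for j, (o, a) in enumerate(zip(orig, adv)) if o != a]:
            trial = adv[:i] + [orig[i]] + adv[i + 1:]
            if model(trial) == target:   # still adversarial after reverting
                adv, improved = trial, True
        if not improved:
            break
    return adv
```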
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Towards Robustness Against Natural Language Word Substitutions [87.56898475512703]
Robustness against word substitutions has a well-defined and widely accepted form: using semantically similar words as substitutions.
Previous defense methods capture word substitutions in vector space by using either an $l_2$-ball or a hyper-rectangle.
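Concretely, those two shapes can be computed from the substitutes' embedding vectors as follows (a self-contained illustration, not code from either paper):

```python
import numpy as np

def substitution_regions(sub_embeddings):
    """Over-approximate a word's substitution set in embedding space with
    an l2-ball around the centroid and an axis-aligned hyper-rectangle.
    `sub_embeddings` is an (n_substitutes, d) array of word vectors."""
    center = sub_embeddings.mean(axis=0)
    radius = np.linalg.norm(sub_embeddings - center, axis=1).max()
    lower = sub_embeddings.min(axis=0)   # coordinate-wise box bounds
    upper = sub_embeddings.max(axis=0)
    return (center, radius), (lower, upper)
```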
arXiv Detail & Related papers (2021-07-28T17:55:08Z)
- Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)