Disentangled Text Representation Learning with Information-Theoretic
Perspective for Adversarial Robustness
- URL: http://arxiv.org/abs/2210.14957v1
- Date: Wed, 26 Oct 2022 18:14:39 GMT
- Title: Disentangled Text Representation Learning with Information-Theoretic
Perspective for Adversarial Robustness
- Authors: Jiahao Zhao, Wenji Mao
- Abstract summary: Adversarial vulnerability remains a major obstacle to constructing reliable NLP systems.
Recent work argues that the adversarial vulnerability of a model is caused by non-robust features learned during supervised training.
In this paper, we tackle the adversarial challenge from the view of disentangled representation learning.
- Score: 17.5771010094384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial vulnerability remains a major obstacle to constructing reliable
NLP systems. When imperceptible perturbations are added to raw input text, the
performance of a deep learning model may drop dramatically under attacks.
Recent work argues that the adversarial vulnerability of a model is caused by
non-robust features learned during supervised training. Thus, in this paper, we tackle the
adversarial robustness challenge from the view of disentangled representation
learning, which is able to explicitly disentangle robust and non-robust
features in text. Specifically, inspired by the variation of information (VI)
in information theory, we derive a disentangled learning objective composed of
mutual information terms that represent both the semantic representativeness of the latent
embeddings and the differentiation of robust and non-robust features. On the basis
of this, we design a disentangled learning network to estimate these mutual
information terms. Experiments on text classification and entailment tasks show that
our method significantly outperforms the representative methods under
adversarial attacks, indicating that discarding non-robust features is critical
for improving adversarial robustness.
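As background for the objective described above: the variation of information (VI) is a metric on random variables that can be written in terms of entropies and mutual information,

    VI(X; Y) = H(X | Y) + H(Y | X) = H(X) + H(Y) - 2 I(X; Y).

The abstract does not spell out the exact training objective, but a VI-inspired disentangled objective of this kind could, purely as an illustrative sketch, combine mutual information terms such as

    maximize  I(z_r; y) + I([z_r, z_n]; x) - lambda * I(z_r; z_n),

where z_r and z_n are hypothetical robust and non-robust latent embeddings of the input text x, y is the task label, and lambda is a trade-off weight. The first two terms would encourage semantic representativeness of the latent embeddings, while the last term would encourage differentiation between robust and non-robust features; the paper's actual formulation and estimators may differ.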
Related papers
- Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data [38.44734564565478]
We provide a theoretical understanding of adversarial examples and adversarial training algorithms from the perspective of feature learning theory.
We show that the adversarial training method can provably strengthen the robust feature learning and suppress the non-robust feature learning.
arXiv Detail & Related papers (2024-10-11T03:59:49Z)
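For reference, the adversarial training analyzed in the entry above is typically formalized as a min-max problem (the exact setting in that paper, e.g. its structured-data assumptions, may differ):

    min_theta  E_{(x, y) ~ D} [ max_{||delta|| <= epsilon}  L(f_theta(x + delta), y) ],

i.e., the model parameters theta are optimized against the worst-case perturbation delta within an epsilon-bounded budget, which is what pushes the learned features toward robust ones.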
- Few-Shot Adversarial Prompt Learning on Vision-Language Models [62.50622628004134]
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.
Previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision.
We propose a few-shot adversarial prompt framework in which adapting input sequences with limited data yields a significant improvement in adversarial robustness.
arXiv Detail & Related papers (2024-03-21T18:28:43Z)
- Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models [0.47334880432883714]
We present an analysis of adversarial robustness exhibited by various hate-speech detection models.
We devise and execute targeted attacks on the text by leveraging the TextAttack tool.
This work paves the way for creating more robust and reliable hate-speech detection systems.
arXiv Detail & Related papers (2023-05-29T19:59:40Z)
- FRSUM: Towards Faithful Abstractive Summarization via Enhancing Factual Robustness [56.263482420177915]
We study the faithfulness of existing systems from a new perspective of factual robustness.
We propose a novel training strategy, namely FRSUM, which teaches the model to defend against both explicit adversarial samples and implicit factual adversarial perturbations.
arXiv Detail & Related papers (2022-11-01T06:09:00Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, it fails to provide the critical supervision signals needed to learn discriminative representations for segmentation tasks.
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
- Evaluating Deception Detection Model Robustness To Linguistic Variation [10.131671217810581]
We propose an analysis of model robustness against linguistic variation in the setting of deceptive news detection.
We consider two prediction tasks and compare three state-of-the-art embeddings to highlight consistent trends in model performance.
We find that character or mixed ensemble models are the most effective defenses and that character perturbation-based attack tactics are more successful.
arXiv Detail & Related papers (2021-04-23T17:25:38Z)
- Improving adversarial robustness of deep neural networks by using semantic information [17.887586209038968]
Adversarial training is the main method for improving adversarial robustness and the first line of defense against adversarial attacks.
This paper provides a new perspective on the issue of adversarial robustness, one that shifts the focus from the network as a whole to the critical part of the region close to the decision boundary corresponding to a given class.
Experimental results on the MNIST and CIFAR-10 datasets show that this approach greatly improves adversarial robustness even when using only a very small subset of the training data.
arXiv Detail & Related papers (2020-08-18T10:23:57Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
- Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization [15.087280646796527]
Training machine learning models that are robust against adversarial inputs poses seemingly insurmountable challenges.
We develop a notion of representation vulnerability that captures the maximum change of mutual information between the input and output distributions.
We propose an unsupervised learning method for obtaining intrinsically robust representations by maximizing the worst-case mutual information.
arXiv Detail & Related papers (2020-02-26T21:20:40Z)
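As a rough sketch of the notion described in the last entry (not the paper's exact definition), the representation vulnerability of an encoder g can be viewed as the worst-case drop in mutual information between inputs and representations under admissible perturbations of the input distribution,

    RV(g) = I(X; g(X)) - min_{X' in B(X)} I(X'; g(X')),

where B(X) denotes a constrained set of perturbed input distributions (e.g., within a bounded perturbation budget around X). The proposed unsupervised method then seeks robust representations by maximizing the worst-case mutual information,

    max_g  min_{X' in B(X)}  I(X'; g(X')),

which connects directly to the mutual-information view of robust features taken in the main paper above.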