TAVAT: Token-Aware Virtual Adversarial Training for Language
Understanding
- URL: http://arxiv.org/abs/2004.14543v3
- Date: Fri, 4 Dec 2020 13:08:56 GMT
- Title: TAVAT: Token-Aware Virtual Adversarial Training for Language
Understanding
- Authors: Linyang Li, Xipeng Qiu
- Abstract summary: Gradient-based adversarial training is widely used in improving the robustness of neural networks.
It cannot be easily adapted to natural language processing tasks since the embedding space is discrete.
We propose a Token-Aware Virtual Adversarial Training method to craft fine-grained perturbations.
- Score: 55.16953347580948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient-based adversarial training is widely used in improving the
robustness of neural networks, while it cannot be easily adapted to natural
language processing tasks since the embedding space is discrete. In natural
language processing fields, virtual adversarial training is introduced since
texts are discrete and cannot be perturbed by gradients directly.
Alternatively, virtual adversarial training, which generates perturbations on
the embedding space, is introduced in NLP tasks. Despite its success, existing
virtual adversarial training methods generate perturbations roughly constrained
by Frobenius normalization balls. To craft fine-grained perturbations, we
propose a Token-Aware Virtual Adversarial Training method. We introduce a
token-level accumulated perturbation vocabulary to initialize the perturbations
better and use a token-level normalization ball to constrain these
perturbations pertinently. Experiments show that our method improves the
performance of pre-trained models such as BERT and ALBERT in various tasks by a
considerable margin. The proposed method improves the score of the GLUE
benchmark from 78.3 to 80.9 using the BERT model, and it also enhances
performance on sequence labeling and text classification tasks.
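The two ideas named in the abstract, a token-level accumulated perturbation vocabulary and a token-level normalization ball, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-token L2 clipping rule, the overwrite-style vocabulary update, and the toy gradients are all assumptions made for illustration, and a real setup would take gradients from a model such as BERT rather than from hand-written lists.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a single token's perturbation vector."""
    return math.sqrt(sum(x * x for x in vec))


def project_token_ball(perturbation, epsilon):
    """Clip each token's row to an L2 ball of radius epsilon (assumed constraint)."""
    projected = []
    for row in perturbation:
        norm = l2_norm(row)
        scale = epsilon / norm if norm > epsilon else 1.0
        projected.append([x * scale for x in row])
    return projected


def init_from_vocab(token_ids, perturb_vocab, dim):
    """Start each token from its stored perturbation, or zeros if unseen."""
    return [list(perturb_vocab.get(t, [0.0] * dim)) for t in token_ids]


def ascent_step(perturbation, grads, step_size, epsilon):
    """One ascent step along each token's normalized gradient, then projection."""
    updated = []
    for row, grad in zip(perturbation, grads):
        grad_norm = l2_norm(grad) or 1.0  # avoid dividing by zero
        updated.append([x + step_size * g / grad_norm for x, g in zip(row, grad)])
    return project_token_ball(updated, epsilon)


def update_vocab(token_ids, perturbation, perturb_vocab):
    """Store the latest perturbation per token id (hypothetical update rule)."""
    for t, row in zip(token_ids, perturbation):
        perturb_vocab[t] = list(row)


# Toy run: three tokens (id 7 appears twice), 2-dimensional embeddings.
token_ids = [7, 3, 7]
vocab = {}
delta = init_from_vocab(token_ids, vocab, dim=2)
toy_grads = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]  # stand-in for model gradients
delta = ascent_step(delta, toy_grads, step_size=0.5, epsilon=0.3)
update_vocab(token_ids, delta, vocab)
```

In this sketch each token is constrained separately, so a token with a large gradient cannot use up the perturbation budget of the other tokens, which is the contrast with a single Frobenius-norm ball over the whole sequence. The stored vocabulary entry for a token id is reused as the starting point the next time that token appears.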
Related papers
- Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL).
This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features.
In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Latent Policies for Adversarial Imitation Learning [21.105328282702885]
This paper considers learning robot locomotion and manipulation tasks from expert demonstrations.
Generative adversarial imitation learning (GAIL) trains a discriminator that distinguishes expert from agent transitions, and in turn uses a reward defined by the discriminator output to optimize a policy generator for the agent.
A key insight of this work is that performing imitation learning in a suitable latent task space makes the training process stable, even in challenging high-dimensional problems.
arXiv Detail & Related papers (2022-06-22T18:06:26Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Context-based Virtual Adversarial Training for Text Classification with Noisy Labels [1.9508698179748525]
We propose context-based virtual adversarial training (ConVAT) to prevent a text classifier from overfitting to noisy labels.
Unlike the previous works, the proposed method performs the adversarial training at the context level rather than the inputs.
We conduct extensive experiments on four text classification datasets with two types of label noises.
arXiv Detail & Related papers (2022-05-29T14:19:49Z)
- On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations [0.0]
We evaluate the impact of systematic practical perturbations on the performance of deep learning based text classification models.
The perturbations are induced by the addition and removal of unwanted tokens like punctuation and stop-words.
We show that these deep learning approaches including BERT are sensitive to such legitimate input perturbations on four standard benchmark datasets.
arXiv Detail & Related papers (2022-01-02T08:33:49Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Adversarial Training with Contrastive Learning in NLP [0.0]
We propose adversarial training with contrastive learning (ATCL) to adversarially train a language processing task.
The core idea is to make linear perturbations in the embedding space of the input via fast gradient methods (FGM) and train the model to keep the original and perturbed representations close via contrastive learning.
The results show an improvement in the quantitative (perplexity and BLEU) scores over the baselines, and ATCL also achieves good qualitative results at the semantic level for both tasks.
arXiv Detail & Related papers (2021-09-19T07:23:45Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is a simple method that is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Consistency Training with Virtual Adversarial Discrete Perturbation [17.311821099484987]
We propose an effective consistency training framework that enforces a training model's predictions given original and perturbed inputs to be similar.
This virtual adversarial discrete noise obtained by replacing a small portion of tokens efficiently pushes a training model's decision boundary.
arXiv Detail & Related papers (2021-04-15T07:49:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.