Related papers: TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding

TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding

URL: http://arxiv.org/abs/2004.14543v3
Date: Fri, 4 Dec 2020 13:08:56 GMT
Title: TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding
Authors: Linyang Li, Xipeng Qiu
Abstract summary: Gradient-based adversarial training is widely used in improving the robustness of neural networks. It cannot be easily adapted to natural language processing tasks since the embedding space is discrete. We propose a Token-Aware Virtual Adrial Training method to craft fine-grained perturbations.
Score: 55.16953347580948
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Gradient-based adversarial training is widely used in improving the robustness of neural networks, while it cannot be easily adapted to natural language processing tasks since the embedding space is discrete. In natural language processing fields, virtual adversarial training is introduced since texts are discrete and cannot be perturbed by gradients directly. Alternatively, virtual adversarial training, which generates perturbations on the embedding space, is introduced in NLP tasks. Despite its success, existing virtual adversarial training methods generate perturbations roughly constrained by Frobenius normalization balls. To craft fine-grained perturbations, we propose a Token-Aware Virtual Adversarial Training method. We introduce a token-level accumulated perturbation vocabulary to initialize the perturbations better and use a token-level normalization ball to constrain these perturbations pertinently. Experiments show that our method improves the performance of pre-trained models such as BERT and ALBERT in various tasks by a considerable margin. The proposed method improves the score of the GLUE benchmark from 78.3 to 80.9 using BERT model and it also enhances the performance of sequence labeling and text classification tasks.

Related papers

ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining [53.893792844055106]
Large language model pretraining is compute-intensive, yet many tokens contribute marginally to learning, resulting in inefficiency.<n>We introduce Selective Efficient Language Modeling, a risk-aware algorithm that improves training efficiency and distributional robustness by performing online token-level batch selection.<n> Experiments on GPT-2 pretraining show that ESLM significantly reduces training FLOPs while maintaining or improving both perplexity and downstream performance compared to baselines.
arXiv Detail & Related papers (2025-05-26T12:23:26Z)
Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL) This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features. In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
Latent Policies for Adversarial Imitation Learning [21.105328282702885]
This paper considers learning robot locomotion and manipulation tasks from expert demonstrations. Generative adversarial imitation learning (GAIL) trains a discriminator that distinguishes expert from agent transitions, and in turn use a reward defined by the discriminator output to optimize a policy generator for the agent. A key insight of this work is that performing imitation learning in a suitable latent task space makes the training process stable, even in challenging high-dimensional problems.
arXiv Detail & Related papers (2022-06-22T18:06:26Z)
Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR) Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model. We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
Context-based Virtual Adversarial Training for Text Classification with Noisy Labels [1.9508698179748525]
We propose context-based virtual adversarial training (ConVAT) to prevent a text classifier from overfitting to noisy labels. Unlike the previous works, the proposed method performs the adversarial training at the context level rather than the inputs. We conduct extensive experiments on four text classification datasets with two types of label noises.
arXiv Detail & Related papers (2022-05-29T14:19:49Z)
On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations [0.0]
We evaluate the impact of systematic practical perturbations on the performance of deep learning based text classification models. The perturbations are induced by the addition and removal of unwanted tokens like punctuation and stop-words. We show that these deep learning approaches including BERT are sensitive to such legitimate input perturbations on four standard benchmark datasets.
arXiv Detail & Related papers (2022-01-02T08:33:49Z)
Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text. These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining. We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
Adversarial Training with Contrastive Learning in NLP [0.0]
We propose adversarial training with contrastive learning (ATCL) to adversarially train a language processing task. The core idea is to make linear perturbations in the embedding space of the input via fast gradient methods (FGM) and train the model to keep the original and perturbed representations close via contrastive learning. The results show not only an improvement in the quantitative (perplexity and BLEU) scores when compared to the baselines, but ATCL also achieves good qualitative results in the semantic level for both tasks.
arXiv Detail & Related papers (2021-09-19T07:23:45Z)
IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics. Behavioral cloning is a simple method that is widely used due to its simplicity of implementation and stable convergence. We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
Consistency Training with Virtual Adversarial Discrete Perturbation [17.311821099484987]
We propose an effective consistency training framework that enforces a training model's predictions given original and perturbed inputs to be similar. This virtual adversarial discrete noise obtained by replacing a small portion of tokens efficiently pushes a training model's decision boundary.
arXiv Detail & Related papers (2021-04-15T07:49:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.