HiddenCut: Simple Data Augmentation for Natural Language Understanding
with Better Generalization
- URL: http://arxiv.org/abs/2106.00149v1
- Date: Mon, 31 May 2021 23:57:43 GMT
- Title: HiddenCut: Simple Data Augmentation for Natural Language Understanding
with Better Generalization
- Authors: Jiaao Chen, Dinghan Shen, Weizhu Chen, Diyi Yang
- Abstract summary: Fine-tuning large pre-trained models with task-specific data has achieved great success in NLP.
We propose a simple yet effective data augmentation technique, HiddenCut, to better regularize the model and encourage it to learn more generalizable features.
Experiments show that our HiddenCut method outperforms the state-of-the-art augmentation methods on the GLUE benchmark.
- Score: 36.36061533271373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning large pre-trained models with task-specific data has achieved
great success in NLP. However, it has been demonstrated that the majority of
information within the self-attention networks is redundant and not utilized
effectively during the fine-tuning stage. This leads to inferior results when
generalizing the obtained models to out-of-domain distributions. To this end,
we propose a simple yet effective data augmentation technique, HiddenCut, to
better regularize the model and encourage it to learn more generalizable
features. Specifically, contiguous spans within the hidden space are
dynamically and strategically dropped during training. Experiments show that
our HiddenCut method outperforms the state-of-the-art augmentation methods on
the GLUE benchmark, and consistently exhibits superior generalization
performances on out-of-distribution and challenging counterexamples. We have
publicly released our code at https://github.com/GT-SALT/HiddenCut.
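The core operation is simple to state in code. Below is a minimal PyTorch sketch of span dropping in the hidden space: the function name, the uniform random span selection, and the cut_ratio value are illustrative assumptions, not the released implementation (the paper selects spans strategically, e.g. informed by attention; see the repository above).

```python
import torch

def hiddencut(hidden_states: torch.Tensor, cut_ratio: float = 0.1) -> torch.Tensor:
    """Zero out one contiguous span of token positions per example.

    hidden_states: (batch, seq_len, hidden_dim) output of a transformer layer.
    The span here is chosen uniformly at random; the actual method picks
    spans strategically rather than uniformly.
    """
    batch_size, seq_len, _ = hidden_states.shape
    span_len = max(1, int(seq_len * cut_ratio))
    mask = torch.ones_like(hidden_states)
    for i in range(batch_size):
        start = torch.randint(0, seq_len - span_len + 1, (1,)).item()
        mask[i, start:start + span_len, :] = 0.0
    return hidden_states * mask
```

During fine-tuning, a function like this would be applied to intermediate hidden states only in training mode, with evaluation using the unmodified network.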
Related papers
- Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning [7.448831299106425]
DISGEN is a model-agnostic framework designed to disentangle size factors from graph representations.
Our empirical results show that DISGEN outperforms the state-of-the-art models by up to 6% on real-world datasets.
arXiv Detail & Related papers (2024-06-07T03:19:24Z)
- TED: Accelerate Model Training by Internal Generalization [19.336762953352956]
Large language models have demonstrated strong performance in recent years, but the high cost of training drives the need for efficient methods to compress dataset sizes.
We propose TED pruning, a method that addresses the challenge of overfitting under high pruning ratios by quantifying the model's ability to improve performance on pruned data.
arXiv Detail & Related papers (2024-05-06T07:40:13Z)
- SMaRt: Improving GANs with Score Matching Regularity [94.81046452865583]
Generative adversarial networks (GANs) usually struggle to learn from highly diverse data whose underlying manifold is complex.
We show that score matching serves as a promising solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold.
We propose to improve the optimization of GANs with score matching regularity (SMaRt)
arXiv Detail & Related papers (2023-11-30T03:05:14Z)
- Adversarial Style Augmentation for Domain Generalization [41.72506801753435]
We introduce a novel Adversarial Style Augmentation (ASA) method, which explores broader style spaces by generating more effective perturbations of feature statistics.
To facilitate the application of ASA, we design a simple yet effective module, namely AdvStyle, which instantiates the ASA method in a plug-and-play manner.
Our method significantly outperforms its competitors on the PACS dataset under the single source generalization setting.
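As a rough illustration of what a plug-and-play style-augmentation module looks like, the sketch below perturbs per-channel feature statistics (mean and standard deviation). The class name is hypothetical and the perturbation here is plain Gaussian noise as a stand-in; ASA itself generates the perturbations adversarially.

```python
import torch
import torch.nn as nn

class StyleStatsPerturb(nn.Module):
    """Hypothetical plug-and-play module that perturbs feature statistics.

    AdvStyle-like modules re-style intermediate features by modifying their
    per-channel mean/std; ASA would produce the perturbations adversarially,
    whereas this sketch simply samples Gaussian noise.
    """

    def __init__(self, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        if not self.training:
            return x
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-6
        normalized = (x - mu) / sigma
        # Shift the style statistics, then re-apply them to the content.
        mu_new = mu + self.noise_std * torch.randn_like(mu)
        sigma_new = sigma + self.noise_std * torch.randn_like(sigma)
        return normalized * sigma_new + mu_new
```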
arXiv Detail & Related papers (2023-01-30T03:52:16Z)
- Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data.
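For intuition, a LeCam-style regularizer anchors the discriminator's real and fake outputs to exponential moving averages of each other. The sketch below is a from-memory reconstruction of that common form, so the exact pairing and any clamping should be checked against the paper.

```python
import torch

class LeCamRegularizer:
    """Sketch of a LeCam-style regularizer for a GAN discriminator.

    R = E_real[(D(x) - ema_fake)^2] + E_fake[(D(G(z)) - ema_real)^2],
    where ema_real / ema_fake track the mean discriminator outputs.
    """

    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.ema_real = 0.0
        self.ema_fake = 0.0

    def __call__(self, d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
        self.ema_real = self.decay * self.ema_real + (1 - self.decay) * d_real.mean().item()
        self.ema_fake = self.decay * self.ema_fake + (1 - self.decay) * d_fake.mean().item()
        return ((d_real - self.ema_fake) ** 2).mean() + ((d_fake - self.ema_real) ** 2).mean()
```

The resulting term would be added to the discriminator loss with a small weight, which is where the claimed robustness under limited training data comes from.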
arXiv Detail & Related papers (2021-04-07T17:59:06Z)
- Combining Label Propagation and Simple Models Out-performs Graph Neural Networks [52.121819834353865]
We show that for many standard transductive node classification benchmarks, we can exceed or match the performance of state-of-the-art GNNs.
We call this overall procedure Correct and Smooth (C&S).
Our approach exceeds or nearly matches the performance of state-of-the-art GNNs on a wide variety of benchmarks.
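The procedure is a post-processing recipe rather than a new architecture: train a simple base predictor, then propagate information over the graph twice. The sketch below is a simplified, hedged rendering of those two steps; the function names, the propagation rule, and the omission of the paper's scaling tricks are all assumptions.

```python
import torch

def propagate(signal: torch.Tensor, adj_norm: torch.Tensor,
              alpha: float = 0.8, num_iters: int = 50) -> torch.Tensor:
    """Personalized-PageRank-style smoothing: s <- alpha * A_norm s + (1 - alpha) * s0."""
    out = signal
    for _ in range(num_iters):
        out = alpha * adj_norm @ out + (1 - alpha) * signal
    return out

def correct_and_smooth(base_probs, y_onehot, train_mask, adj_norm):
    """Sketch of the Correct (propagate residual errors) and Smooth
    (propagate corrected predictions) post-processing steps."""
    # Correct: push the errors observed on training nodes to their neighbors.
    error = torch.zeros_like(base_probs)
    error[train_mask] = y_onehot[train_mask] - base_probs[train_mask]
    corrected = base_probs + propagate(error, adj_norm)
    # Smooth: reset known labels, then propagate the final predictions.
    corrected[train_mask] = y_onehot[train_mask]
    return propagate(corrected, adj_norm)
```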
arXiv Detail & Related papers (2020-10-27T02:10:52Z)
- Robust Optimization as Data Augmentation for Large-scale Graphs [117.2376815614148]
We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.
FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks.
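As a sketch of the idea (not the authors' released code), the "free" adversarial augmentation can be written as an inner ascent loop on a feature perturbation whose backward passes also accumulate the parameter gradients. The model signature, step size, and sign-based update below are simplifying assumptions.

```python
import torch

def flag_train_step(model, x, y, criterion, optimizer,
                    step_size: float = 1e-3, n_ascent: int = 3):
    """One 'free' adversarial training step on node features (sketch).

    The perturbation delta is grown by gradient ascent on the loss, while
    every backward pass also accumulates parameter gradients, so the
    augmentation adds little cost beyond the extra forward passes.
    """
    model.train()
    optimizer.zero_grad()
    delta = torch.zeros_like(x).uniform_(-step_size, step_size).requires_grad_(True)
    for _ in range(n_ascent):
        loss = criterion(model(x + delta), y) / n_ascent
        loss.backward()  # accumulates gradients for the model parameters
        # Ascent on the perturbation; sign-of-gradient is one simple choice.
        delta = (delta.detach() + step_size * delta.grad.sign()).requires_grad_(True)
    optimizer.step()
    return loss.item()
```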
arXiv Detail & Related papers (2020-10-19T21:51:47Z)
- Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
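The overall recipe is the classic teacher-student loop; a compressed sketch of the pseudo-labeling step is shown below. The confidence threshold, the ignore index of 255, and the assumption that the teacher returns per-pixel logits are illustrative choices, not details taken from the paper.

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(teacher, unlabeled_images, threshold: float = 0.9,
                           ignore_index: int = 255):
    """Run a trained teacher over unlabeled images and keep only confident pixels.

    teacher(unlabeled_images) is assumed to return class logits of shape
    (N, C, H, W); pixels below the confidence threshold are marked with
    ignore_index so the student's loss skips them.
    """
    teacher.eval()
    probs = torch.softmax(teacher(unlabeled_images), dim=1)
    confidence, labels = probs.max(dim=1)
    labels[confidence < threshold] = ignore_index
    return labels
```

A student model would then be trained on the union of human-annotated and pseudo-labeled data, e.g. with a cross-entropy loss whose ignore_index matches the value used above.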
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.