Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks
- URL: http://arxiv.org/abs/2010.02394v2
- Date: Tue, 10 Nov 2020 23:51:24 GMT
- Title: Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks
- Authors: Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip S. Yu,
Lifang He
- Abstract summary: Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
- Score: 75.69896269357005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixup is the latest data augmentation technique that linearly interpolates
input examples and the corresponding labels. It has shown strong effectiveness
in image classification by interpolating images at the pixel level. Inspired by
this line of research, in this paper, we explore i) how to apply mixup to
natural language processing tasks since text data can hardly be mixed in the
raw format; ii) if mixup is still effective in transformer-based learning
models, e.g., BERT. To achieve this goal, we incorporate mixup into a
transformer-based pre-trained architecture, named "mixup-transformer", for a
wide range of NLP tasks while keeping the whole training system end-to-end. We
evaluate the proposed framework by running extensive experiments on the GLUE
benchmark. Furthermore, we examine the performance of mixup-transformer in
low-resource scenarios by reducing the training data by certain ratios. Our
studies show that mixup is a domain-independent data augmentation technique for
pre-trained language models, resulting in significant performance improvements
for transformer-based models.
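For concreteness, mixup trains on convex combinations of paired examples: given (x_i, y_i) and (x_j, y_j), it uses (λx_i + (1-λ)x_j, λy_i + (1-λ)y_j) with λ drawn from a Beta(α, α) distribution. Because raw text cannot be interpolated directly, the abstract implies the mixing is applied to learned representations inside the transformer. The snippet below is a minimal sketch of that idea in PyTorch with Hugging Face Transformers; the choice of bert-base-uncased, mixing the final-layer [CLS] vector, the soft-label cross-entropy loss, and α = 0.5 are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of mixup on sentence-level BERT representations.
# Model choice, pooling, alpha, and the loss are illustrative assumptions,
# not the mixup-transformer authors' exact setup.
import torch
import torch.nn.functional as F
from torch.distributions import Beta
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
num_labels = 2
classifier = torch.nn.Linear(encoder.config.hidden_size, num_labels)

def mixup_step(texts, labels, alpha=0.5):
    """One training step that mixes [CLS] representations and one-hot labels."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state[:, 0]      # [CLS] vectors
    y = F.one_hot(labels, num_labels).float()

    lam = Beta(alpha, alpha).sample()                      # mixing coefficient
    perm = torch.randperm(hidden.size(0))                  # random partner examples
    mixed_h = lam * hidden + (1 - lam) * hidden[perm]      # interpolate features
    mixed_y = lam * y + (1 - lam) * y[perm]                # interpolate labels

    logits = classifier(mixed_h)
    # Cross-entropy against the soft (interpolated) label distribution.
    loss = torch.sum(-mixed_y * F.log_softmax(logits, dim=-1), dim=-1).mean()
    return loss

# Toy usage: one step on a two-example batch.
loss = mixup_step(["a great movie", "a dull plot"], torch.tensor([1, 0]))
loss.backward()
```

The same step can be run on a subsampled training set, which mirrors the paper's low-resource experiments where the training data is reduced by certain ratios.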
Related papers
- Heterogeneous Federated Learning with Splited Language Model [22.65325348176366]
Federated Split Learning (FSL) is a promising distributed learning paradigm in practice.
In this paper, we harness Pre-trained Image Transformers (PITs) as the initial model, coined FedV, to accelerate the training process and improve model robustness.
We are the first to provide a systematic evaluation of FSL methods with PITs in real-world datasets, different partial device participations, and heterogeneous data splits.
arXiv Detail & Related papers (2024-03-24T07:33:08Z)
- TransformMix: Learning Transformation and Mixing Strategies from Data [20.79680733590554]
We propose an automated approach, TransformMix, to learn better transformation and mixing augmentation strategies from data.
We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings.
arXiv Detail & Related papers (2024-03-19T04:36:41Z)
- Adversarial AutoMixup [50.1874436169571]
We propose AdAutomixup, an adversarial automatic mixup augmentation approach.
It generates challenging samples to train a robust classifier for image classification.
Our approach outperforms the state of the art in various classification scenarios.
arXiv Detail & Related papers (2023-12-19T08:55:00Z)
- TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training [42.142924806184425]
Mixed data samples for cross-modal contrastive learning implicitly serve as a regularizer for the contrastive loss.
TiMix exhibits a comparable performance on downstream tasks, even with a reduced amount of training data and shorter training time, when benchmarked against existing methods.
arXiv Detail & Related papers (2023-12-14T12:02:24Z)
- DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning [10.971246386083884]
We propose two novel data augmentation techniques specifically designed for the constraints of differentially private learning.
Our first technique, DP-Mix_Self, achieves SoTA classification performance across a range of datasets and settings by performing mixup on self-augmented data.
Our second technique, DP-Mix_Diff, further improves performance by incorporating synthetic data from a pre-trained diffusion model into the mixup process.
arXiv Detail & Related papers (2023-11-02T15:12:12Z)
- MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training example.
It then uses the perturbed data and the original data to carry out a two-step interpolation in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z)
- Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z)
- MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture [0.0]
MixUp is a computer vision data augmentation technique that uses convex interpolations of input data and their labels to enhance the model during training.
In this study, we propose MixUp methods at the Input, Manifold, and sentence embedding levels for the transformer, and apply them to finetune the BERT model for a diverse set of NLU tasks (a rough sketch of the input-embedding variant appears after this entry).
We find that MixUp can improve model performance, as well as reduce test loss and model calibration error by up to 50%.
arXiv Detail & Related papers (2021-02-22T23:12:35Z)
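The last entry above applies MixUp at the input, manifold, and sentence-embedding levels when fine-tuning BERT. As a rough sketch of the input-embedding variant, one could mix the word-embedding sequences of paired examples before they enter the encoder; bert-base-uncased, the union attention mask, α = 0.4, and the soft-label loss are assumptions for illustration, not that paper's exact recipe.

```python
# Rough sketch of input-embedding-level MixUp for BERT fine-tuning.
# Model name, alpha, the union attention mask, and the pooling/loss choices
# are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch.distributions import Beta
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)  # binary task for illustration

def input_mixup_step(texts, labels, alpha=0.4):
    """Mix word-embedding sequences of paired examples, then encode and classify."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    embeds = encoder.get_input_embeddings()(batch["input_ids"])  # word embeddings only
    mask = batch["attention_mask"]
    y = F.one_hot(labels, 2).float()

    lam = Beta(alpha, alpha).sample()
    perm = torch.randperm(embeds.size(0))                        # random partners
    mixed_embeds = lam * embeds + (1 - lam) * embeds[perm]       # mix at the input level
    mixed_mask = mask | mask[perm]                               # attend to either sequence
    mixed_y = lam * y + (1 - lam) * y[perm]

    out = encoder(inputs_embeds=mixed_embeds, attention_mask=mixed_mask)
    logits = classifier(out.last_hidden_state[:, 0])             # [CLS] position
    loss = torch.sum(-mixed_y * F.log_softmax(logits, dim=-1), dim=-1).mean()
    return loss
```

Mixing at the manifold level would instead interpolate hidden states from an intermediate encoder layer, and the sentence-embedding level corresponds to mixing the pooled [CLS] vector, as in the sketch after the main abstract above.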