Related papers: AMPLIFY: Attention-based Mixup for Performance Improvement and Label Smoothing in Transformer

URL: http://arxiv.org/abs/2309.12689v3
Date: Wed, 8 May 2024 11:14:41 GMT
Title: AMPLIFY: Attention-based Mixup for Performance Improvement and Label Smoothing in Transformer
Authors: Leixin Yang, Yu Xiang
Abstract summary: AMPLIFY uses the Transformer's own Attention mechanism to reduce the influence of noise and aberrant values in the original samples on the prediction results. The experimental results show that AMPLIFY outperforms other Mixup methods in text classification tasks at a lower computational cost.
Score: 2.3072402651280517
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mixup is an effective data augmentation method that generates new augmented samples by aggregating linear combinations of different original samples. However, if there are noises or aberrant features in the original samples, Mixup may propagate them to the augmented samples, leading to over-sensitivity of the model to these outliers. To solve this problem, this paper proposes a new Mixup method called AMPLIFY. This method uses the Attention mechanism of Transformer itself to reduce the influence of noises and aberrant values in the original samples on the prediction results, without increasing additional trainable parameters, and the computational cost is very low, thereby avoiding the problem of high resource consumption in common Mixup methods such as Sentence Mixup. The experimental results show that, under a smaller computational resource cost, AMPLIFY outperforms other Mixup methods in text classification tasks on 7 benchmark datasets, providing new ideas and new ways to further improve the performance of pre-trained models based on the Attention mechanism, such as BERT, ALBERT, RoBERTa, and GPT. Our code can be obtained at https://github.com/kiwi-lilo/AMPLIFY.
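
Illustration: the "linear combinations of different original samples" mentioned in the abstract can be sketched as below. This is only the standard input-level Mixup that the abstract takes as its starting point, not AMPLIFY itself, whose attention-based mixing is not specified in this listing. The function name `mixup`, the toy vectors, and the Beta parameter 0.4 are assumptions chosen for the example.

```python
import numpy as np

def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard Mixup: the same convex combination is applied to the
    two inputs and to their one-hot labels."""
    x_mix = lam * x_a + (1.0 - lam) * x_b
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix

# Two toy feature vectors with one-hot labels (hypothetical data).
x_a, y_a = np.array([1.0, 0.0, 2.0]), np.array([1.0, 0.0])
x_b, y_b = np.array([0.0, 4.0, 2.0]), np.array([0.0, 1.0])

# The mixing ratio is commonly drawn from Beta(alpha, alpha);
# alpha = 0.4 is an assumed value, not one taken from the paper.
rng = np.random.default_rng(0)
lam = rng.beta(0.4, 0.4)

x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, lam)
print(lam, x_mix, y_mix)
```

Because the mixed label is a weighted blend of the two one-hot labels, it acts as a soft target. Any noisy feature in either input is carried into the mixed sample in proportion to its weight, which is the propagation problem the abstract describes.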

Related papers

TransformMix: Learning Transformation and Mixing Strategies from Data [20.79680733590554]
We propose an automated approach, TransformMix, to learn better transformation and mixing augmentation strategies from data. We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings.
arXiv Detail & Related papers (2024-03-19T04:36:41Z)
Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex model for semi/library-based unmixing. We demonstrate the efficacy of Alternating Methods of sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z)
DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix. DoubleMix first generates several perturbed samples for each training sample. It then uses the perturbed data and original data to carry out a two-step interpolation in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z)
Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data. In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM). DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z)
Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample. We present a novel, yet simple Mixup variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
An Empirical Study of the Effects of Sample-Mixing Methods for Efficient Training of Generative Adversarial Networks [0.0]
It is well known that training generative adversarial networks (GANs) requires a huge number of iterations before the generator provides good-quality samples. We investigated the effect of sample-mixing methods, namely Mixup, CutMix, and SRMix, to alleviate this problem.
arXiv Detail & Related papers (2021-04-08T06:40:23Z)
ReMix: Towards Image-to-Image Translation with Limited Data [154.71724970593036]
We propose a data augmentation method (ReMix) to tackle this issue. We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples. The proposed approach effectively reduces the ambiguity of generation and renders content-preserving results.
arXiv Detail & Related papers (2021-03-31T06:24:10Z)
Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity [15.780905917870427]
We propose a new perspective on batch mixup and formulate the optimal construction of a batch of mixup data. We also propose an efficient iterative submodular computation algorithm, based on a modular approximation, for efficient mixup on each minibatch. Our experiments show the proposed method achieves state-of-the-art generalization, calibration, and weakly supervised localization results.
arXiv Detail & Related papers (2021-02-05T09:12:02Z)
Recursive Inference for Variational Autoencoders [34.552283758419506]
Inference networks of traditional Variational Autoencoders (VAEs) are typically amortized. Recent semi-amortized approaches were proposed to address this drawback. We introduce an accurate amortized inference algorithm.
arXiv Detail & Related papers (2020-11-17T10:22:12Z)
Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels. In this paper, we explore how to apply mixup to natural language processing tasks. We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.