MixUp Training Leads to Reduced Overfitting and Improved Calibration for
the Transformer Architecture
- URL: http://arxiv.org/abs/2102.11402v1
- Date: Mon, 22 Feb 2021 23:12:35 GMT
- Title: MixUp Training Leads to Reduced Overfitting and Improved Calibration for
the Transformer Architecture
- Authors: Wancong Zhang, Ieshan Vaidya
- Abstract summary: MixUp is a computer vision data augmentation technique that uses convex interpolations of input data and their labels to enhance model generalization during training.
In this study, we propose MixUp methods at the Input, Manifold, and sentence embedding levels for the transformer, and apply them to finetune the BERT model for a diverse set of NLU tasks.
We find that MixUp can improve model performance, as well as reduce test loss and model calibration error by up to 50%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: MixUp is a computer vision data augmentation technique that uses convex
interpolations of input data and their labels to enhance model generalization
during training. However, the application of MixUp to the natural language
understanding (NLU) domain has been limited, due to the difficulty of
interpolating text directly in the input space. In this study, we propose MixUp
methods at the Input, Manifold, and sentence embedding levels for the
transformer architecture, and apply them to finetune the BERT model for a
diverse set of NLU tasks. We find that MixUp can improve model performance, as
well as reduce test loss and model calibration error by up to 50%.
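A minimal sketch of the core interpolation step, assuming a PyTorch setup with pooled sentence embeddings and one-hot labels; the function name, tensor shapes, and alpha value are illustrative assumptions, not the authors' released code. Input-level and Manifold MixUp apply the same interpolation to token embeddings or to intermediate hidden states instead.

```python
import torch

def mixup_embeddings(emb, y_soft, alpha=0.4):
    """Convexly interpolate a batch of sentence embeddings and their
    (one-hot or soft) labels with a shuffled copy of the same batch.

    emb:    (batch, hidden)      e.g. pooled BERT [CLS] embeddings
    y_soft: (batch, num_classes) one-hot or soft labels
    alpha:  Beta(alpha, alpha) parameter controlling mixing strength (assumed value)
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(emb.size(0))
    emb_mix = lam * emb + (1 - lam) * emb[perm]
    y_mix = lam * y_soft + (1 - lam) * y_soft[perm]
    return emb_mix, y_mix
```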
Related papers
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study how model performance can be predicted as a function of the mixture proportions.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Tailoring Mixup to Data for Calibration [12.050401897136501]
Mixup is a technique for improving calibration and predictive uncertainty.
In this work, we argue that the likelihood of manifold intrusion increases with the distance between the data points being mixed.
We propose to dynamically change the underlying distribution of the mixing coefficients depending on the similarity between the samples to be mixed (see the illustrative sketch below).
arXiv Detail & Related papers (2023-11-02T17:48:28Z)
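The sketch referenced above is one illustrative reading of that idea, not the paper's exact procedure: the Beta concentration is scaled by pairwise similarity, so dissimilar samples draw a coefficient near 0 or 1 and are barely mixed. The cosine-similarity measure and the alpha scaling are assumptions.

```python
import torch
import torch.nn.functional as F

def similarity_scaled_lambda(x1, x2, alpha=1.0):
    # Cosine similarity of the two flattened samples, mapped to [0, 1].
    sim = F.cosine_similarity(x1.flatten(), x2.flatten(), dim=0)
    sim = (sim + 1) / 2
    # Low concentration for dissimilar pairs -> lambda near 0 or 1,
    # i.e. little effective mixing between distant samples.
    conc = alpha * sim + 1e-3
    return torch.distributions.Beta(conc, conc).sample()
```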
- MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z)
- DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training example.
It then uses the perturbed data and original data to carry out a two-step interpolation in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z)
- On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency [47.90235939359225]
We propose a novel mixup strategy for pre-trained language models that improves model calibration further.
Our method achieves the lowest expected calibration error compared to strong baselines on both in-domain and out-of-domain test samples.
arXiv Detail & Related papers (2022-03-14T23:45:08Z)
- Preventing Manifold Intrusion with Locality: Local Mixup [10.358087436626391]
Mixup is a data-dependent regularization technique that linearly interpolates input samples and their associated outputs.
In this paper, we introduce Local Mixup, in which distant input samples are weighted down when computing the loss (see the illustrative sketch below).
arXiv Detail & Related papers (2022-01-12T09:05:53Z)
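The sketch referenced above is a rough illustration of the stated mechanism, assuming a PyTorch classifier and a per-sample loss with reduction="none"; the exponential weighting and the bandwidth tau are our assumptions, not necessarily the authors' formulation.

```python
import torch

def local_mixup_step(model, criterion, x, y_soft, alpha=0.4, tau=1.0):
    # Standard MixUp pairing of the batch with a shuffled copy of itself.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_soft + (1 - lam) * y_soft[perm]
    # Down-weight mixed pairs whose original inputs are far apart.
    dist = (x - x[perm]).flatten(1).norm(dim=1)
    weight = torch.exp(-dist / tau)
    per_sample_loss = criterion(model(x_mix), y_mix)  # expects reduction="none"
    return (weight * per_sample_loss).mean()
```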
- ReMix: Towards Image-to-Image Translation with Limited Data [154.71724970593036]
We propose a data augmentation method (ReMix) to tackle the limited-data issue in image-to-image translation.
We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples.
The proposed approach effectively reduces the ambiguity of generation and renders content-preserving results.
arXiv Detail & Related papers (2021-03-31T06:24:10Z)
- When and How Mixup Improves Calibration [19.11486078732542]
In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty.
In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating two natural data models.
While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration.
arXiv Detail & Related papers (2021-02-11T22:24:54Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.