Tailoring Mixup to Data for Calibration
- URL: http://arxiv.org/abs/2311.01434v3
- Date: Tue, 18 Mar 2025 21:28:33 GMT
- Title: Tailoring Mixup to Data for Calibration
- Authors: Quentin Bouniot, Pavlo Mozharovskyi, Florence d'Alché-Buc
- Abstract summary: We show that the likelihood of assigning a wrong label with mixup increases with the distance between the samples being mixed. We propose to dynamically change the underlying distributions of interpolation coefficients depending on the similarity between the samples being mixed.
- Score: 12.050401897136501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Among all data augmentation techniques proposed so far, linear interpolation of training samples, also called Mixup, has been found to be effective across a wide range of applications. Along with improved predictive performance, Mixup is also a good technique for improving calibration. However, mixing data carelessly can lead to manifold mismatch, i.e., synthetic data lying outside the original class manifolds, which can deteriorate calibration. In this work, we show that the likelihood of assigning a wrong label with mixup increases with the distance between the data points being mixed. Based on this observation, we propose to dynamically change the underlying distributions of the interpolation coefficients depending on the similarity between the samples being mixed, and we define a flexible framework to do so without sacrificing diversity. We provide extensive experiments on classification and regression tasks, showing that our proposed method improves both the predictive performance and the calibration of models, while being substantially more efficient.
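Since the abstract only sketches the mechanism at a high level, the following is a minimal illustration of the general idea rather than the authors' exact method: the concentration of the Beta distribution that the interpolation coefficient is drawn from is warped by a similarity kernel, so that distant pairs receive coefficients close to 0 or 1 and are only mildly mixed. The Gaussian kernel, the bandwidth `tau`, and all function names below are assumptions made for this sketch.

```python
# Minimal sketch of similarity-adaptive mixup (illustrative, not the paper's
# exact parameterization). Distant pairs get a small Beta concentration, so the
# sampled coefficient stays near 0 or 1 and the mix stays close to one of the
# original samples; similar pairs are mixed more freely.
import torch

def similarity_mixup(x1, y1, x2, y2, base_alpha=1.0, tau=1.0):
    # Gaussian similarity in input space: ~1 for near-identical pairs, ~0 for distant ones.
    dist = torch.norm((x1 - x2).flatten(start_dim=1), dim=1)        # (batch,)
    sim = torch.exp(-dist ** 2 / (2 * tau ** 2))                    # (batch,)

    # Scale the Beta concentration by similarity (epsilon keeps it strictly positive).
    alpha = base_alpha * sim + 1e-3

    lam = torch.distributions.Beta(alpha, alpha).sample()           # one lambda per pair
    lam_x = lam.view(-1, *([1] * (x1.dim() - 1)))                   # broadcast over feature dims

    x_mix = lam_x * x1 + (1.0 - lam_x) * x2
    y_mix = lam.unsqueeze(-1) * y1 + (1.0 - lam.unsqueeze(-1)) * y2  # y1, y2 assumed one-hot / soft
    return x_mix, y_mix
```

As in standard mixup training, `x2, y2` can simply be a permuted copy of the current batch.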
Related papers
- Beyond One-Hot Labels: Semantic Mixing for Model Calibration [22.39558434131574]
We introduce calibration-aware data augmentation to create synthetic datasets of diverse samples and their ground-truth uncertainty.
We propose calibrated reannotation to tackle the misalignment between the annotated confidence score and the mixing ratio.
Experimental results demonstrate that CSM achieves superior calibration compared to the state-of-the-art calibration approaches.
arXiv Detail & Related papers (2025-04-18T08:26:18Z) - Aioli: A Unified Optimization Framework for Language Model Data Mixing [74.50480703834508]
We show that no existing method consistently outperforms a simple stratified sampling baseline in terms of average test perplexity.
We derive a new online method named Aioli, which directly estimates the mixing law parameters throughout training and uses them to dynamically adjust proportions.
arXiv Detail & Related papers (2024-11-08T17:50:24Z) - SUMix: Mixup with Semantic and Uncertain Information [41.99721365685618]
Mixup data augmentation approaches have been applied to various deep learning tasks.
We propose a novel approach named SUMix to learn the mixing ratio as well as the uncertainty for the mixed samples during the training process.
arXiv Detail & Related papers (2024-07-10T16:25:26Z) - Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study whether model performance can be predicted as a function of the data mixture proportions.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B-parameter model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z) - Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks [75.42002070547267]
We propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification.
We introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for mixup.
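As a rough illustration of the instance-specific label smoothing summarized above (a sketch under assumptions, not the paper's exact procedure), the soft target for each sample can be formed by interpolating its one-hot label with the model's own prediction; the smoothing weight `smooth` and the function name are hypothetical.

```python
# Sketch of instance-specific label smoothing: each sample's soft target mixes
# its one-hot label with the model's current prediction for that sample.
import torch
import torch.nn.functional as F

def soft_targets(logits, labels, num_classes, smooth=0.1):
    one_hot = F.one_hot(labels, num_classes).float()
    probs = logits.softmax(dim=-1).detach()            # model's belief, gradient detached
    return (1.0 - smooth) * one_hot + smooth * probs   # per-instance soft label for mixup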
```
arXiv Detail & Related papers (2023-05-22T23:43:23Z) - DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning [14.194817677415065]
We show that augmented samples with lower correlation to the original data are more effective in preventing forgetting.
We propose the Enhanced Mixup (EnMix) method that mixes the augmented samples and their labels simultaneously.
To solve the class imbalance problem, we design an Adaptive Mixup (AdpMix) method to calibrate the decision boundaries.
arXiv Detail & Related papers (2023-03-14T12:55:42Z) - MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z) - ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z) - C-Mixup: Improving Generalization in Regression [71.10418219781575]
The Mixup algorithm improves generalization by linearly interpolating pairs of examples and their corresponding labels.
We propose C-Mixup, which adjusts the sampling probability based on the similarity of the labels.
C-Mixup achieves 6.56%, 4.76%, 5.82% improvements in in-distribution generalization, task generalization, and out-of-distribution robustness, respectively.
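As a hedged sketch of the label-similarity idea summarized above (the kernel choice, bandwidth, and function name are assumptions for illustration, not the paper's reported settings), the mixing partner for each regression example can be drawn with probability proportional to a Gaussian kernel over the label distance:

```python
# Sketch of label-distance-based partner sampling for regression mixup:
# pairs with close labels are sampled (and thus mixed) more often.
import torch

def sample_partners(y, bandwidth=1.0):
    # y: (n, d) tensor of regression targets
    dist2 = torch.cdist(y, y) ** 2                     # pairwise squared label distances
    weights = torch.exp(-dist2 / (2 * bandwidth ** 2))
    weights.fill_diagonal_(0.0)                        # never mix a sample with itself
    probs = weights / weights.sum(dim=1, keepdim=True)
    return torch.multinomial(probs, num_samples=1).squeeze(1)  # partner index per sample
```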
arXiv Detail & Related papers (2022-10-11T20:39:38Z) - Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z) - MixRL: Data Mixing Augmentation for Regression using Reinforcement Learning [2.1345682889327837]
Existing techniques for data augmentation largely focus on classification tasks and do not readily apply to regression tasks.
We show that mixing examples with a large data or label distance can have an increasingly negative effect on model performance.
We propose MixRL, a data augmentation meta-learning framework for regression that learns, for each example, how many nearest neighbors it should be mixed with for the best model performance.
arXiv Detail & Related papers (2021-06-07T07:01:39Z) - When and How Mixup Improves Calibration [19.11486078732542]
In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty.
In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating two natural data models.
While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration.
arXiv Detail & Related papers (2021-02-11T22:24:54Z) - Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity [15.780905917870427]
We propose a new perspective on batch mixup and formulate the optimal construction of a batch of mixup data.
We also propose an efficient iterative submodular optimization algorithm, based on a modular approximation, to compute the mixup for each minibatch.
Our experiments show the proposed method achieves state-of-the-art generalization, calibration, and weakly supervised localization results.
arXiv Detail & Related papers (2021-02-05T09:12:02Z) - Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
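A minimal sketch of prediction-time batch normalization as summarized above, written for PyTorch BatchNorm layers; the helper name is an assumption. At inference, the normalization statistics are recomputed from the incoming batch instead of using the running averages accumulated during training.

```python
# Sketch: switch BatchNorm layers to use current-batch statistics at test time,
# without updating the stored running averages.
import torch.nn as nn

def enable_prediction_time_bn(model: nn.Module) -> nn.Module:
    model.eval()  # keep dropout and other layers in evaluation mode
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.train()                       # forward pass normalizes with batch statistics
            m.track_running_stats = False   # do not overwrite the stored running stats
    return model
```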
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.