Global Mixup: Eliminating Ambiguity with Clustering
- URL: http://arxiv.org/abs/2206.02734v1
- Date: Mon, 6 Jun 2022 16:42:22 GMT
- Title: Global Mixup: Eliminating Ambiguity with Clustering
- Authors: Xiangjin Xie and Yangning Li and Wang Chen and Kai Ouyang and Li Jiang
and Haitao Zheng
- Abstract summary: We propose a novel augmentation method based on global clustering relationships named \textbf{Global Mixup}.
Experiments show that Global Mixup significantly outperforms previous state-of-the-art baselines.
- Score: 18.876583942942144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation with \textbf{Mixup} has been proven an effective method to
regularize the current deep neural networks. Mixup generates virtual samples
and corresponding labels at once through linear interpolation. However, this
one-stage generation paradigm and the use of linear interpolation have the
following two defects: (1) The label of the generated sample is directly
combined from the labels of the original sample pairs without reasonable
judgment, which makes the labels likely to be ambiguous. (2) Linear combination
significantly limits the sampling space for generating samples. To tackle these
problems, we propose a novel and effective augmentation method based on global
clustering relationships named \textbf{Global Mixup}. Specifically, we
transform the previous one-stage augmentation process into a two-stage one,
decoupling the generation of virtual samples from their labeling. The labels
of the generated samples are then reassigned based on clustering, by
calculating the global relationships of the generated samples. In addition,
we are no longer limited to linear relationships but can generate more
reliable virtual samples in a larger sampling space. Extensive experiments for
\textbf{CNN}, \textbf{LSTM}, and \textbf{BERT} on five tasks show that Global
Mixup significantly outperforms previous state-of-the-art baselines. Further
experiments also demonstrate the advantage of Global Mixup in low-resource
scenarios.
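The two-stage recipe above is straightforward to prototype. Below is a minimal NumPy/scikit-learn sketch, not the authors' implementation: the virtual samples come from ordinary mixup interpolation, and the relabeling rule (softmax over distances to k-means centroids, each centroid carrying the mean label of its cluster) is one assumed reading of "calculating the global relationships of the generated samples".

```python
import numpy as np
from sklearn.cluster import KMeans

def global_mixup(x, y, n_virtual=100, n_clusters=10, alpha=1.0, seed=0):
    """Two-stage sketch: (1) generate virtual samples, (2) relabel them
    from global clustering relationships rather than the pair's labels.
    x: (n, d) features; y: (n, c) one-hot (or soft) labels."""
    rng = np.random.default_rng(seed)
    n = len(x)

    # Stage 1: generate virtual samples (plain mixup interpolation here;
    # the paper argues for a larger sampling space than the linear one).
    i, j = rng.integers(0, n, n_virtual), rng.integers(0, n, n_virtual)
    lam = rng.beta(alpha, alpha, size=(n_virtual, 1))
    x_virtual = lam * x[i] + (1 - lam) * x[j]

    # Stage 2: relabel via global cluster relationships (assumed rule).
    # Cluster the real data, give each cluster the mean label of its
    # members, then label each virtual sample by softmax-weighted
    # distances to all centroids.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(x)
    cluster_labels = np.stack([y[km.labels_ == c].mean(axis=0)
                               for c in range(n_clusters)])
    dist = np.linalg.norm(x_virtual[:, None] - km.cluster_centers_[None],
                          axis=-1)
    w = np.exp(-dist)
    y_virtual = (w / w.sum(axis=1, keepdims=True)) @ cluster_labels
    return x_virtual, y_virtual
```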
Related papers
- Mixup Augmentation with Multiple Interpolations [26.46413903248954]
We propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pair.
With an ordered sequence of generated samples, multi-mix can better guide the training process than standard mixup (a toy sketch follows this entry).
arXiv Detail & Related papers (2024-06-03T15:16:09Z)
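A toy, hypothetical sketch of the multi-mix idea above, assuming plain Beta-distributed mixing ratios; the function name and defaults are illustrative:

```python
import numpy as np

def multi_mix(x1, y1, x2, y2, k=4, alpha=1.0, seed=0):
    """Draw k mixing ratios for one sample pair and return the k
    interpolations sorted by ratio, i.e. an ordered sequence of
    virtual samples along the segment from x2 to x1."""
    rng = np.random.default_rng(seed)
    lams = np.sort(rng.beta(alpha, alpha, size=k))
    xs = np.stack([lam * x1 + (1 - lam) * x2 for lam in lams])
    ys = np.stack([lam * y1 + (1 - lam) * y2 for lam in lams])
    return xs, ys
```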
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- On the Equivalence of Graph Convolution and Mixup [70.0121263465133]
This paper investigates the relationship between graph convolution and Mixup techniques.
Under two mild conditions, graph convolution can be viewed as a specialized form of Mixup.
We establish this equivalence mathematically by demonstrating that graph convolutional networks (GCN) and simplified graph convolution (SGC) can be expressed as a form of Mixup.
arXiv Detail & Related papers (2023-09-29T23:09:54Z)
- Weighted Sparse Partial Least Squares for Joint Sample and Feature Selection [7.219077740523681]
We propose an $\ell_\infty/\ell_0$-norm constrained weighted sparse PLS ($\ell_\infty/\ell_0$-wsPLS) method for joint sample and feature selection.
We develop an efficient iterative algorithm for each multi-view wsPLS model and show its convergence property.
arXiv Detail & Related papers (2023-08-13T10:09:25Z)
- DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training sample.
It then uses the perturbed data and original data to carry out a two-step interpolation in the hidden space of neural models (a rough sketch follows below).
arXiv Detail & Related papers (2022-09-12T15:01:04Z)
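One way to picture DoubleMix's two-step mixing, as a hedged PyTorch sketch; the Dirichlet/Beta sampling choices and tensor shapes are my assumptions, not the paper's exact formulation:

```python
import torch

def double_mix(h_orig, h_perturbed, alpha=1.0, beta=0.5):
    """Hypothetical two-step interpolation in hidden space.
    h_perturbed: (n_views, batch, dim) hidden states of perturbed copies;
    h_orig: (batch, dim) hidden state of the original input."""
    # Step 1: combine the perturbed views with random convex weights.
    w = torch.distributions.Dirichlet(
        torch.full((h_perturbed.size(0),), alpha)).sample()
    h_mix = (w.view(-1, 1, 1) * h_perturbed).sum(dim=0)
    # Step 2: interpolate the mixed perturbation back into the original.
    lam = torch.distributions.Beta(beta, beta).sample()
    return lam * h_orig + (1 - lam) * h_mix
```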
- Implicit Sample Extension for Unsupervised Person Re-Identification [97.46045935897608]
Clustering sometimes mixes different true identities together or splits the same identity into two or more sub-clusters.
We propose an Implicit Sample Extension (ISE) method to generate what we call support samples around the cluster boundaries (sketched below).
Experiments demonstrate that the proposed method is effective and achieves state-of-the-art performance for unsupervised person Re-ID.
arXiv Detail & Related papers (2022-04-14T11:41:48Z)
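A guess at what "support samples around the cluster boundaries" could look like in feature space; the boundary-ward interpolation rule and the step size mu are assumptions, not the paper's mechanism:

```python
import numpy as np

def support_samples(x, labels, centers, mu=0.5):
    """Hypothetical sketch: push each sample part-way toward the nearest
    *other* cluster centroid, yielding samples near cluster boundaries.
    x: (n, d) features; labels: (n,) cluster ids; centers: (k, d)."""
    d = np.linalg.norm(x[:, None] - centers[None], axis=-1)  # (n, k)
    d[np.arange(len(x)), labels] = np.inf  # mask each sample's own cluster
    nearest = d.argmin(axis=1)
    return x + mu * (centers[nearest] - x)  # mu in (0, 1)
```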
- Multi-Sample $\zeta$-mixup: Richer, More Realistic Synthetic Samples from a $p$-Series Interpolant [16.65329510916639]
We propose $\zeta$-mixup, a generalization of mixup with provably and demonstrably desirable properties (a toy weighting sketch follows this entry).
We show that our implementation of $\zeta$-mixup is faster than mixup, and extensive evaluation on controlled synthetic and 24 real-world natural and medical image classification datasets shows that $\zeta$-mixup outperforms mixup and traditional data augmentation techniques.
arXiv Detail & Related papers (2022-04-07T09:41:09Z)
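A toy reading of the $p$-series weighting in $\zeta$-mixup, assuming normalized weights $w_i \propto i^{-\gamma}$ applied over a random per-output ordering of all inputs; the normalization and ordering details are my assumptions, not the paper's exact scheme:

```python
import numpy as np

def zeta_mixup(x, y, gamma=2.0, seed=0):
    """Sketch: each output is a convex combination of ALL k inputs, with
    normalized p-series weights w_i ∝ i^(-gamma) applied in a random
    per-output order, so one sample dominates and the rest fade.
    Works for x of shape (k, ...) and y of shape (k, c)."""
    rng = np.random.default_rng(seed)
    k = len(x)
    w = np.arange(1, k + 1, dtype=float) ** -gamma
    w /= w.sum()                              # normalized p-series weights
    out_x, out_y = [], []
    for _ in range(k):
        perm = rng.permutation(k)             # random dominance order
        out_x.append(np.tensordot(w, x[perm], axes=1))
        out_y.append(np.tensordot(w, y[perm], axes=1))
    return np.stack(out_x), np.stack(out_y)
```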
- Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup (a hypothetical sketch follows below).
arXiv Detail & Related papers (2022-03-21T07:12:18Z)
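The decoupling idea, as I read the summary above, might be prototyped as below; the ratio-free term and the weight eta are illustrative assumptions, not the paper's exact regularizer:

```python
import torch
import torch.nn.functional as F

def decoupled_mixup_loss(logits, y_a, y_b, lam, eta=0.1):
    """Hypothetical sketch: mixup cross-entropy plus a decoupled term that
    rewards probability mass on BOTH source classes independently of the
    mixing ratio, so hard mixed samples keep a discriminative signal.
    logits: (B, C); y_a, y_b: (B,) class indices; lam: scalar in [0, 1]."""
    log_p = F.log_softmax(logits, dim=-1)
    log_pa = log_p.gather(1, y_a[:, None]).squeeze(1)
    log_pb = log_p.gather(1, y_b[:, None]).squeeze(1)
    mixup_ce = -(lam * log_pa + (1 - lam) * log_pb).mean()      # coupled
    decoupled = -torch.log(log_pa.exp() + log_pb.exp()).mean()  # ratio-free
    return mixup_ce + eta * decoupled
```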
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel yet simple Mixup variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.