Global Mixup: Eliminating Ambiguity with Clustering
- URL: http://arxiv.org/abs/2206.02734v1
- Date: Mon, 6 Jun 2022 16:42:22 GMT
- Title: Global Mixup: Eliminating Ambiguity with Clustering
- Authors: Xiangjin Xie and Yangning Li and Wang Chen and Kai Ouyang and Li Jiang
and Haitao Zheng
- Abstract summary: We propose a novel augmentation method based on global clustering relationships named \textbf{Global Mixup}.
Experiments show that Global Mixup significantly outperforms previous state-of-the-art baselines.
- Score: 18.876583942942144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation with \textbf{Mixup} has been proven an effective method to
regularize the current deep neural networks. Mixup generates virtual samples
and corresponding labels at once through linear interpolation. However, this
one-stage generation paradigm and the use of linear interpolation have the
following two defects: (1) The label of the generated sample is directly
combined from the labels of the original sample pairs without reasonable
judgment, which makes the labels likely to be ambiguous. (2) Linear combination
significantly limits the sampling space for generating samples. To tackle these
problems, we propose a novel and effective augmentation method based on global
clustering relationships named \textbf{Global Mixup}. Specifically, we
transform the previous one-stage augmentation process into a two-stage one,
decoupling the generation of virtual samples from their labeling. The labels
of the generated samples are then reassigned based on clustering, by
calculating the global relationships of the generated samples. In addition,
we are no longer limited to linear relationships but can generate more
reliable virtual samples in a larger sampling space. Extensive experiments for
\textbf{CNN}, \textbf{LSTM}, and \textbf{BERT} on five tasks show that Global
Mixup significantly outperforms previous state-of-the-art baselines. Further
experiments also demonstrate the advantage of Global Mixup in low-resource
scenarios.
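The two-stage recipe above is straightforward to prototype. Below is a minimal NumPy/scikit-learn sketch, not the authors' implementation: the virtual samples come from ordinary mixup interpolation, and the relabeling rule (softmax over distances to k-means centroids, each centroid carrying the mean label of its cluster) is one assumed reading of "calculating the global relationships of the generated samples".

```python
import numpy as np
from sklearn.cluster import KMeans

def global_mixup(x, y, n_virtual=100, n_clusters=10, alpha=1.0, seed=0):
    """Two-stage sketch: (1) generate virtual samples, (2) relabel them
    from global clustering relationships rather than the pair's labels.
    x: (n, d) features; y: (n, c) one-hot (or soft) labels."""
    rng = np.random.default_rng(seed)
    n = len(x)

    # Stage 1: generate virtual samples (plain mixup interpolation here;
    # the paper argues for a larger sampling space than the linear one).
    i, j = rng.integers(0, n, n_virtual), rng.integers(0, n, n_virtual)
    lam = rng.beta(alpha, alpha, size=(n_virtual, 1))
    x_virtual = lam * x[i] + (1 - lam) * x[j]

    # Stage 2: relabel via global cluster relationships (assumed rule).
    # Cluster the real data, give each cluster the mean label of its
    # members, then label each virtual sample by softmax-weighted
    # distances to all centroids.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(x)
    cluster_labels = np.stack([y[km.labels_ == c].mean(axis=0)
                               for c in range(n_clusters)])
    dist = np.linalg.norm(x_virtual[:, None] - km.cluster_centers_[None],
                          axis=-1)
    w = np.exp(-dist)
    y_virtual = (w / w.sum(axis=1, keepdims=True)) @ cluster_labels
    return x_virtual, y_virtual
```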
Related papers
- Mixup Augmentation with Multiple Interpolations [26.46413903248954]
We propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pair.
With an ordered sequence of generated samples, multi-mix can better guide the training process than standard mixup (a toy sketch follows this entry).
arXiv Detail & Related papers (2024-06-03T15:16:09Z)
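A toy, hypothetical sketch of the multi-mix idea above, assuming plain Beta-distributed mixing ratios; the function name and defaults are illustrative:

```python
import numpy as np

def multi_mix(x1, y1, x2, y2, k=4, alpha=1.0, seed=0):
    """Draw k mixing ratios for one sample pair and return the k
    interpolations sorted by ratio, i.e. an ordered sequence of
    virtual samples along the segment from x2 to x1."""
    rng = np.random.default_rng(seed)
    lams = np.sort(rng.beta(alpha, alpha, size=k))
    xs = np.stack([lam * x1 + (1 - lam) * x2 for lam in lams])
    ys = np.stack([lam * y1 + (1 - lam) * y2 for lam in lams])
    return xs, ys
```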
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- On the Equivalence of Graph Convolution and Mixup [70.0121263465133]
This paper investigates the relationship between graph convolution and Mixup techniques.
Under two mild conditions, graph convolution can be viewed as a specialized form of Mixup.
We establish this equivalence mathematically by demonstrating that graph convolutional networks (GCN) and simplified graph convolution (SGC) can be expressed as a form of Mixup.
arXiv Detail & Related papers (2023-09-29T23:09:54Z)
- Weighted Sparse Partial Least Squares for Joint Sample and Feature Selection [7.219077740523681]
We propose an $\ell_\infty/\ell_0$-norm constrained weighted sparse PLS ($\ell_\infty/\ell_0$-wsPLS) method for joint sample and feature selection.
We develop an efficient iterative algorithm for each multi-view wsPLS model and show its convergence property.
arXiv Detail & Related papers (2023-08-13T10:09:25Z)
- DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training sample.
It then uses the perturbed data and original data to carry out a two-step interpolation in the hidden space of neural models (a rough sketch follows below).
arXiv Detail & Related papers (2022-09-12T15:01:04Z)
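One way to picture DoubleMix's two-step mixing, as a hedged PyTorch sketch; the Dirichlet/Beta sampling choices and tensor shapes are my assumptions, not the paper's exact formulation:

```python
import torch

def double_mix(h_orig, h_perturbed, alpha=1.0, beta=0.5):
    """Hypothetical two-step interpolation in hidden space.
    h_perturbed: (n_views, batch, dim) hidden states of perturbed copies;
    h_orig: (batch, dim) hidden state of the original input."""
    # Step 1: combine the perturbed views with random convex weights.
    w = torch.distributions.Dirichlet(
        torch.full((h_perturbed.size(0),), alpha)).sample()
    h_mix = (w.view(-1, 1, 1) * h_perturbed).sum(dim=0)
    # Step 2: interpolate the mixed perturbation back into the original.
    lam = torch.distributions.Beta(beta, beta).sample()
    return lam * h_orig + (1 - lam) * h_mix
```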
- Implicit Sample Extension for Unsupervised Person Re-Identification [97.46045935897608]
Clustering sometimes mixes different true identities together or splits the same identity into two or more sub-clusters.
We propose an Implicit Sample Extension (ISE) method to generate what we call support samples around the cluster boundaries (sketched below).
Experiments demonstrate that the proposed method is effective and achieves state-of-the-art performance for unsupervised person Re-ID.
arXiv Detail & Related papers (2022-04-14T11:41:48Z)
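A guess at what "support samples around the cluster boundaries" could look like in feature space; the boundary-ward interpolation rule and the step size mu are assumptions, not the paper's mechanism:

```python
import numpy as np

def support_samples(x, labels, centers, mu=0.5):
    """Hypothetical sketch: push each sample part-way toward the nearest
    *other* cluster centroid, yielding samples near cluster boundaries.
    x: (n, d) features; labels: (n,) cluster ids; centers: (k, d)."""
    d = np.linalg.norm(x[:, None] - centers[None], axis=-1)  # (n, k)
    d[np.arange(len(x)), labels] = np.inf  # mask each sample's own cluster
    nearest = d.argmin(axis=1)
    return x + mu * (centers[nearest] - x)  # mu in (0, 1)
```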
- Multi-Sample $\zeta$-mixup: Richer, More Realistic Synthetic Samples from a $p$-Series Interpolant [16.65329510916639]
We propose $\zeta$-mixup, a generalization of mixup with provably and demonstrably desirable properties (a toy weighting sketch follows this entry).
We show that our implementation of $\zeta$-mixup is faster than mixup, and extensive evaluation on controlled synthetic and 24 real-world natural and medical image classification datasets shows that $\zeta$-mixup outperforms mixup and traditional data augmentation techniques.
arXiv Detail & Related papers (2022-04-07T09:41:09Z)
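A toy reading of the $p$-series weighting in $\zeta$-mixup, assuming normalized weights $w_i \propto i^{-\gamma}$ applied over a random per-output ordering of all inputs; the normalization and ordering details are my assumptions, not the paper's exact scheme:

```python
import numpy as np

def zeta_mixup(x, y, gamma=2.0, seed=0):
    """Sketch: each output is a convex combination of ALL k inputs, with
    normalized p-series weights w_i ∝ i^(-gamma) applied in a random
    per-output order, so one sample dominates and the rest fade.
    Works for x of shape (k, ...) and y of shape (k, c)."""
    rng = np.random.default_rng(seed)
    k = len(x)
    w = np.arange(1, k + 1, dtype=float) ** -gamma
    w /= w.sum()                              # normalized p-series weights
    out_x, out_y = [], []
    for _ in range(k):
        perm = rng.permutation(k)             # random dominance order
        out_x.append(np.tensordot(w, x[perm], axes=1))
        out_y.append(np.tensordot(w, y[perm], axes=1))
    return np.stack(out_x), np.stack(out_y)
```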
- Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup (a hypothetical sketch follows below).
arXiv Detail & Related papers (2022-03-21T07:12:18Z)
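The decoupling idea, as I read the summary above, might be prototyped as below; the ratio-free term and the weight eta are illustrative assumptions, not the paper's exact regularizer:

```python
import torch
import torch.nn.functional as F

def decoupled_mixup_loss(logits, y_a, y_b, lam, eta=0.1):
    """Hypothetical sketch: mixup cross-entropy plus a decoupled term that
    rewards probability mass on BOTH source classes independently of the
    mixing ratio, so hard mixed samples keep a discriminative signal.
    logits: (B, C); y_a, y_b: (B,) class indices; lam: scalar in [0, 1]."""
    log_p = F.log_softmax(logits, dim=-1)
    log_pa = log_p.gather(1, y_a[:, None]).squeeze(1)
    log_pb = log_p.gather(1, y_b[:, None]).squeeze(1)
    mixup_ce = -(lam * log_pa + (1 - lam) * log_pb).mean()      # coupled
    decoupled = -torch.log(log_pa.exp() + log_pb.exp()).mean()  # ratio-free
    return mixup_ce + eta * decoupled
```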
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel yet simple Mixup variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.