When does mixup promote local linearity in learned representations?
- URL: http://arxiv.org/abs/2210.16413v1
- Date: Fri, 28 Oct 2022 21:27:33 GMT
- Title: When does mixup promote local linearity in learned representations?
- Authors: Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep
Jayasumana, Srikumar Ramalingam, Sanjiv Kumar
- Abstract summary: We study the role of Mixup in promoting linearity in the learned network representations.
We investigate these properties of Mixup on vision datasets such as CIFAR-10, CIFAR-100 and SVHN.
- Score: 61.079020647847024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixup is a regularization technique that artificially produces new samples
using convex combinations of original training points. This simple technique
has shown strong empirical performance, and has been heavily used as part of
semi-supervised learning techniques such as
mixmatch~\citep{berthelot2019mixmatch} and interpolation consistent training
(ICT)~\citep{verma2019interpolation}. In this paper, we look at Mixup through a
\emph{representation learning} lens in a semi-supervised learning setup. In
particular, we study the role of Mixup in promoting linearity in the learned
network representations. Towards this, we study two questions: (1) how does the
Mixup loss that enforces linearity in the \emph{last} network layer propagate
the linearity to the \emph{earlier} layers?; and (2) how does the enforcement
of stronger Mixup loss on more than two data points affect the convergence of
training? We empirically investigate these properties of Mixup on vision
datasets such as CIFAR-10, CIFAR-100 and SVHN. Our results show that supervised
Mixup training does not make \emph{all} the network layers linear; in fact the
\emph{intermediate layers} become more non-linear during Mixup training
compared to a network that is trained \emph{without} Mixup. However, when Mixup
is used as an unsupervised loss, we observe that all the network layers become
more linear resulting in faster training convergence.
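To make the quantities above concrete, the sketch below is a minimal, self-contained PyTorch illustration written for this summary (not the authors' implementation). It forms Mixup samples as convex combinations of inputs and one-hot labels with a Beta-sampled coefficient, computes the standard supervised Mixup loss, a simplified ICT-style unsupervised consistency loss (ICT additionally uses a mean-teacher target), and a per-layer "linearity gap", one possible way to quantify how far a layer's representation of a mixed input lies from the convex combination of the original representations. The toy network, the Beta(1, 1) prior, and the helper-function names are illustrative assumptions, not the paper's API.

```python
# Minimal sketch (not the authors' code) of Mixup-related quantities, in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class ToyNet(nn.Module):
    """A toy two-layer network whose intermediate activations we can inspect."""
    def __init__(self, in_dim=32, hidden=64, num_classes=10):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_classes)

    def features(self, x):
        # Return the representation after each layer.
        h1 = self.layer1(x)
        h2 = self.layer2(h1)
        return [h1, h2]

    def forward(self, x):
        return self.head(self.features(x)[-1])

def mixup(x1, y1, x2, y2, alpha=1.0):
    """Convex combination of two samples and their one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix, y_mix, lam

def supervised_mixup_loss(model, x_mix, y_mix):
    """Supervised Mixup loss: cross-entropy of the mixed input against the mixed label."""
    log_probs = F.log_softmax(model(x_mix), dim=-1)
    return -(y_mix * log_probs).sum(dim=-1).mean()

def unsupervised_mixup_loss(model, x1, x2, lam):
    """Simplified ICT-style consistency loss: the prediction on the mixed input should
    match the same convex combination of the individual predictions (no labels used).
    ICT itself uses a mean-teacher target; here the model provides the target for brevity."""
    with torch.no_grad():
        target = lam * F.softmax(model(x1), dim=-1) + (1 - lam) * F.softmax(model(x2), dim=-1)
    pred = F.softmax(model(lam * x1 + (1 - lam) * x2), dim=-1)
    return F.mse_loss(pred, target)

def layer_linearity_gap(model, x1, x2, lam):
    """Per-layer distance between the representation of the mixed input and the convex
    combination of the original representations; smaller means the layer acts more linearly."""
    feats_mix = model.features(lam * x1 + (1 - lam) * x2)
    feats_1, feats_2 = model.features(x1), model.features(x2)
    gaps = []
    for h_mix, h1, h2 in zip(feats_mix, feats_1, feats_2):
        gaps.append((h_mix - (lam * h1 + (1 - lam) * h2)).norm(dim=-1).mean().item())
    return gaps

if __name__ == "__main__":
    model = ToyNet()
    x1, x2 = torch.randn(8, 32), torch.randn(8, 32)
    y1 = F.one_hot(torch.randint(0, 10, (8,)), 10).float()
    y2 = F.one_hot(torch.randint(0, 10, (8,)), 10).float()

    x_mix, y_mix, lam = mixup(x1, y1, x2, y2)
    print("supervised mixup loss:", supervised_mixup_loss(model, x_mix, y_mix).item())
    print("unsupervised mixup loss:", unsupervised_mixup_loss(model, x1, x2, lam).item())
    print("per-layer linearity gaps:", layer_linearity_gap(model, x1, x2, lam))
```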
Related papers
- The Benefits of Mixup for Feature Learning [117.93273337740442]
We first show that Mixup using different linear parameters for features and labels can still achieve similar performance to standard Mixup.
We consider a feature-noise data model and show that Mixup training can effectively learn the rare features from its mixture with the common features.
In contrast, standard training can only learn the common features but fails to learn the rare features, thus suffering from bad performance.
arXiv Detail & Related papers (2023-03-15T08:11:47Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup [54.09898347820941]
We propose Scenario-Agnostic Mixup (SAMix) for both Self-supervised Learning (SSL) and supervised learning (SL) scenarios.
Specifically, we hypothesize and verify that the objective of mixup generation is to optimize local smoothness between the two mixed classes.
A label-free generation sub-network is designed, which effectively provides non-trivial mixup samples and improves transferability.
arXiv Detail & Related papers (2021-11-30T14:49:59Z)
- Towards Understanding the Data Dependency of Mixup-style Training [14.803285140800542]
In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels.
Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk.
For a large class of linear models and linearly separable datasets, Mixup training leads to learning the same classifier as standard training.
arXiv Detail & Related papers (2021-10-14T18:13:57Z)
- SMILE: Self-Distilled MIxup for Efficient Transfer LEarning [42.59451803498095]
In this work, we propose SMILE - Self-Distilled Mixup for EffIcient Transfer LEarning.
With mixed images as inputs, SMILE regularizes the outputs of CNN feature extractors to learn from the mixed feature vectors of inputs.
The triple regularizer balances the mixup effects in both feature and label spaces while bounding the linearity in-between samples for pre-training tasks.
arXiv Detail & Related papers (2021-03-25T16:02:21Z)
- AutoMix: Unveiling the Power of Mixup [34.623943038648164]
We present a flexible, general Automatic Mixup framework which utilizes discriminative features to learn a sample mixing policy adaptively.
We regard mixup as a pretext task and split it into two sub-problems: mixed samples generation and mixup classification.
Experiments on six popular classification benchmarks show that AutoMix consistently outperforms other leading mixup methods.
arXiv Detail & Related papers (2021-03-24T07:21:53Z)
- Mixup Without Hesitation [38.801366276601414]
We propose mixup Without hesitation (mWh), a concise, effective, and easy-to-use training algorithm.
mWh strikes a good balance between exploration and exploitation by gradually replacing mixup with basic data augmentation.
Our code is open-source and available at https://github.com/yuhao318318/mWh.
arXiv Detail & Related papers (2021-01-12T08:11:08Z)
- DivideMix: Learning with Noisy Labels as Semi-supervised Learning [111.03364864022261]
We propose DivideMix, a framework for learning with noisy labels.
Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods.
arXiv Detail & Related papers (2020-02-18T06:20:06Z)
- Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy [77.34280933613226]
We propose a general regularizer called Patch-level Neighborhood Interpolation (Pani) that builds non-local representations in the computation of networks.
Our proposal explicitly constructs patch-level graphs in different layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy.
arXiv Detail & Related papers (2019-11-21T06:31:59Z)