Pushing Boundaries: Mixup's Influence on Neural Collapse
- URL: http://arxiv.org/abs/2402.06171v1
- Date: Fri, 9 Feb 2024 04:01:25 GMT
- Title: Pushing Boundaries: Mixup's Influence on Neural Collapse
- Authors: Quinn Fisher, Haoming Meng, Vardan Papyan
- Abstract summary: Mixup is a data augmentation strategy that employs convex combinations of training instances and their respective labels to augment the robustness and calibration of deep neural networks.
This study investigates the last-layer activations of training data for deep networks subjected to mixup.
We show that mixup's last-layer activations predominantly converge to a distinctive configuration different than one might expect.
- Score: 3.6919724596215615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixup is a data augmentation strategy that employs convex combinations of
training instances and their respective labels to augment the robustness and
calibration of deep neural networks. Despite its widespread adoption, the
nuanced mechanisms that underpin its success are not entirely understood. The
observed phenomenon of Neural Collapse, where the last-layer activations and
classifier of deep networks converge to a simplex equiangular tight frame
(ETF), provides a compelling motivation to explore whether mixup induces
alternative geometric configurations and whether those could explain its
success. In this study, we delve into the last-layer activations of training
data for deep networks subjected to mixup, aiming to uncover insights into its
operational efficacy. Our investigation, spanning various architectures and
dataset pairs, reveals that mixup's last-layer activations predominantly
converge to a distinctive configuration different than one might expect. In
this configuration, activations from mixed-up examples of identical classes
align with the classifier, while those from different classes delineate
channels along the decision boundary. Moreover, activations in earlier layers
exhibit patterns, as if trained with manifold mixup. These findings are
unexpected, as mixed-up features are not simple convex combinations of feature
class means (as one might get, for example, by training mixup with the mean
squared error loss). By analyzing this distinctive geometric configuration, we
elucidate the mechanisms by which mixup enhances model calibration. To further
validate our empirical observations, we conduct a theoretical analysis under
the assumption of an unconstrained features model, utilizing the mixup loss.
Through this, we characterize and derive the optimal last-layer features under
the assumption that the classifier forms a simplex ETF.
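To make the two objects named in the abstract concrete, here is a minimal sketch of (a) a mixup example as a convex combination of two inputs and their one-hot labels, and (b) a simplex ETF of the kind the classifier is assumed to form. It is an illustration only, not the authors' code. The function names are hypothetical, and drawing the mixing weight from a Beta(alpha, alpha) distribution is an assumption taken from the commonly used mixup recipe; the abstract does not say how the weight is sampled.

```python
import math
import random


def mixup_pair(x1, y1, x2, y2, lam):
    """Convex combination of two examples and of their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_lambda(alpha, rng=random):
    """Mixing weight in [0, 1]; Beta(alpha, alpha) is an assumption (see lead-in)."""
    return rng.betavariate(alpha, alpha)


def simplex_etf(num_classes):
    """Rows are the K vertices of a simplex ETF embedded in R^K.

    Built as sqrt(K / (K - 1)) * (I - (1/K) * 1 1^T), which gives unit-norm
    rows whose pairwise inner products all equal -1 / (K - 1).
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))
```

With `lam = 0.25`, mixing a class-0 example with a class-1 example yields the soft label `[0.25, 0.75, 0.0]`. The abstract's point is that the last-layer features of such mixed examples do not simply sit at the matching convex combination of the class-mean features.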
Related papers
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Defensive Tensorization [113.96183766922393]
We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network.
We empirically demonstrate the effectiveness of our approach on standard image classification benchmarks.
We validate the versatility of our approach across domains and low-precision architectures by considering an audio task and binary networks.
arXiv Detail & Related papers (2021-10-26T17:00:16Z)
- Ex uno plures: Splitting One Model into an Ensemble of Subnetworks [18.814965334083425]
We propose a strategy to compute an ensemble of subnetworks, each corresponding to a non-overlapping dropout mask computed via a pruning strategy and trained independently.
We show that the proposed subnetwork ensembling method can perform as well as standard deep ensembles in both accuracy and uncertainty estimates.
We experimentally demonstrate that subnetwork ensembling also consistently outperforms recently proposed approaches that efficiently ensemble neural networks.
arXiv Detail & Related papers (2021-06-09T01:49:49Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
- Initialization and Regularization of Factorized Neural Layers [23.875225732697142]
We show how to initialize and regularize factorized layers in deep nets.
We show how these schemes lead to improved performance on both translation and unsupervised pre-training.
arXiv Detail & Related papers (2021-05-03T17:28:07Z)
- Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
- Analyzing Overfitting under Class Imbalance in Neural Networks for Image Segmentation [19.259574003403998]
In image segmentation, neural networks may overfit to the foreground samples from small structures.
In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior.
arXiv Detail & Related papers (2021-02-20T14:57:58Z)
- Improving Adversarial Robustness by Enforcing Local and Global Compactness [19.8818435601131]
Adversarial training is the most successful method that consistently resists a wide range of attacks.
We propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption.
The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network.
arXiv Detail & Related papers (2020-07-10T00:43:06Z)
- DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths [45.947140164621096]
We propose a new approach based on differential inclusions of inverse scale spaces.
We show that DessiLBI unveils "winning tickets" in early epochs.
arXiv Detail & Related papers (2020-07-04T04:40:16Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.