Pushing Boundaries: Mixup's Influence on Neural Collapse
- URL: http://arxiv.org/abs/2402.06171v1
- Date: Fri, 9 Feb 2024 04:01:25 GMT
- Title: Pushing Boundaries: Mixup's Influence on Neural Collapse
- Authors: Quinn Fisher, Haoming Meng, Vardan Papyan
- Abstract summary: Mixup is a data augmentation strategy that employs convex combinations of training instances and their respective labels to augment the robustness and calibration of deep neural networks.
This study investigates the last-layer activations of training data for deep networks subjected to mixup.
We show that mixup's last-layer activations predominantly converge to a distinctive configuration different from what one might expect.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixup is a data augmentation strategy that employs convex combinations of
training instances and their respective labels to augment the robustness and
calibration of deep neural networks. Despite its widespread adoption, the
nuanced mechanisms that underpin its success are not entirely understood. The
observed phenomenon of Neural Collapse, where the last-layer activations and
classifier of deep networks converge to a simplex equiangular tight frame
(ETF), provides a compelling motivation to explore whether mixup induces
alternative geometric configurations and whether those could explain its
success. In this study, we delve into the last-layer activations of training
data for deep networks subjected to mixup, aiming to uncover insights into its
operational efficacy. Our investigation, spanning various architectures and
dataset pairs, reveals that mixup's last-layer activations predominantly
converge to a distinctive configuration different from what one might expect. In
this configuration, activations from mixed-up examples of identical classes
align with the classifier, while those from different classes delineate
channels along the decision boundary. Moreover, activations in earlier layers
exhibit patterns, as if trained with manifold mixup. These findings are
unexpected, as mixed-up features are not simple convex combinations of feature
class means (as one might get, for example, by training mixup with the mean
squared error loss). By analyzing this distinctive geometric configuration, we
elucidate the mechanisms by which mixup enhances model calibration. To further
validate our empirical observations, we conduct a theoretical analysis under
the assumption of an unconstrained features model, utilizing the mixup loss.
Through this, we characterize and derive the optimal last-layer features under
the assumption that the classifier forms a simplex ETF.
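The two technical ingredients of the abstract can be sketched concretely: mixup's convex combination of inputs and one-hot labels, and the simplex equiangular tight frame (ETF) that Neural Collapse converges to. Below is a minimal NumPy illustration, not the paper's code; the Beta-distribution parameter `alpha=1.0` and the function names are illustrative assumptions.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Mix a batch of inputs and one-hot labels via convex combinations.

    x: (batch, ...) array of inputs; y: (batch, classes) one-hot labels.
    alpha parameterizes the Beta distribution that the mixing
    coefficient lambda is drawn from (alpha=1.0 is an illustrative choice).
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))      # random pairing of examples
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed, lam

def simplex_etf(k):
    """Return a (k, k) matrix whose columns form a simplex ETF:
    unit-norm vectors with pairwise inner product -1/(k-1)."""
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)
```

Because the labels are mixed with the same coefficient as the inputs, each mixed label row remains a valid probability vector; and the Gram matrix of `simplex_etf(k)` has ones on the diagonal and -1/(k-1) off-diagonal, which is the maximally separated configuration the Neural Collapse literature refers to.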
Related papers
- Uncertainty estimation via ensembles of deep learning models and dropout layers for seismic traces [27.619194576741673]
We develop Convolutional Neural Networks (CNNs) to classify seismic waveforms based on first-motion polarity.
We constructed ensembles of networks to estimate uncertainty.
We observe that the uncertainty estimation ability of the ensembles of networks can be enhanced using dropout layers.
arXiv Detail & Related papers (2024-10-08T15:22:15Z) - Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z) - Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP) [0.0]
CLOP is a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of linear subspaces among class embeddings.
We show that CLOP enhances performance, providing greater stability across different learning rates and batch sizes.
arXiv Detail & Related papers (2024-03-27T15:48:16Z) - Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Ex uno plures: Splitting One Model into an Ensemble of Subnetworks [18.814965334083425]
We propose a strategy to compute an ensemble of subnetworks, each corresponding to a non-overlapping dropout mask computed via a pruning strategy and trained independently.
We show that the proposed subnetwork ensembling method can perform as well as standard deep ensembles in both accuracy and uncertainty estimates.
We experimentally demonstrate that subnetwork ensembling also consistently outperforms recently proposed approaches that efficiently ensemble neural networks.
arXiv Detail & Related papers (2021-06-09T01:49:49Z) - Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z) - Analyzing Overfitting under Class Imbalance in Neural Networks for Image Segmentation [19.259574003403998]
In image segmentation, neural networks may overfit to the foreground samples from small structures.
In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior.
arXiv Detail & Related papers (2021-02-20T14:57:58Z) - DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths [45.947140164621096]
We propose a new approach based on differential inclusions of inverse scale spaces.
We show that DessiLBI unveils "winning tickets" in early epochs.
arXiv Detail & Related papers (2020-07-04T04:40:16Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.