Related papers: Pushing Boundaries: Mixup's Influence on Neural Collapse

Pushing Boundaries: Mixup's Influence on Neural Collapse

URL: http://arxiv.org/abs/2402.06171v1
Date: Fri, 9 Feb 2024 04:01:25 GMT
Title: Pushing Boundaries: Mixup's Influence on Neural Collapse
Authors: Quinn Fisher, Haoming Meng, Vardan Papyan
Abstract summary: Mixup is a data augmentation strategy that employs convex combinations of training instances and their respective labels to augment the robustness and calibration of deep neural networks. This study investigates the last-layer activations of training data for deep networks subjected to mixup. We show that mixup's last-layer activations predominantly converge to a distinctive configuration different than one might expect.
Score: 3.6919724596215615
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mixup is a data augmentation strategy that employs convex combinations of training instances and their respective labels to augment the robustness and calibration of deep neural networks. Despite its widespread adoption, the nuanced mechanisms that underpin its success are not entirely understood. The observed phenomenon of Neural Collapse, where the last-layer activations and classifier of deep networks converge to a simplex equiangular tight frame (ETF), provides a compelling motivation to explore whether mixup induces alternative geometric configurations and whether those could explain its success. In this study, we delve into the last-layer activations of training data for deep networks subjected to mixup, aiming to uncover insights into its operational efficacy. Our investigation, spanning various architectures and dataset pairs, reveals that mixup's last-layer activations predominantly converge to a distinctive configuration different than one might expect. In this configuration, activations from mixed-up examples of identical classes align with the classifier, while those from different classes delineate channels along the decision boundary. Moreover, activations in earlier layers exhibit patterns, as if trained with manifold mixup. These findings are unexpected, as mixed-up features are not simple convex combinations of feature class means (as one might get, for example, by training mixup with the mean squared error loss). By analyzing this distinctive geometric configuration, we elucidate the mechanisms by which mixup enhances model calibration. To further validate our empirical observations, we conduct a theoretical analysis under the assumption of an unconstrained features model, utilizing the mixup loss. Through this, we characterize and derive the optimal last-layer features under the assumption that the classifier forms a simplex ETF.

Related papers

Gaussian mixture layers for neural networks [14.707634995360591]
Mean-field theory for two-layer neural networks considers infinitely wide networks that are linearly parameterized by a probability measure over the parameter space.<n>This nonparametric perspective has significantly advanced both the theoretical and conceptual understanding of neural networks.<n>In this work, we investigate whether dynamics can be directly implemented over probability measures.
arXiv Detail & Related papers (2025-08-06T21:16:17Z)
Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning [54.26807397329468]
This work explores a previously overlooked vulnerability in distributed deep learning systems.<n>An adversary who intercepts the intermediate features transmitted between them can still pose a serious threat.<n>We propose an exploitation strategy specifically designed for distributed settings.
arXiv Detail & Related papers (2025-07-09T20:09:00Z)
ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization [0.32885740436059047]
Over- parameterized neural network models often lead to significant performance discrepancies between training and test sets. We introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets. We propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set.
arXiv Detail & Related papers (2024-12-02T13:21:31Z)
Uncertainty estimation via ensembles of deep learning models and dropout layers for seismic traces [27.619194576741673]
We develop Convolutional Neural Networks (CNNs) to classify seismic waveforms based on first-motion polarity. We constructed ensembles of networks to estimate uncertainty. We observe that the uncertainty estimation ability of the ensembles of networks can be enhanced using dropout layers.
arXiv Detail & Related papers (2024-10-08T15:22:15Z)
Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions. We demonstrate this approach lower bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP) [0.0]
CLOP is a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of linear subspaces among class embeddings. We show that CLOP enhances performance, providing greater stability across different learning rates and batch sizes.
arXiv Detail & Related papers (2024-03-27T15:48:16Z)
Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions. A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems. It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
Ex uno plures: Splitting One Model into an Ensemble of Subnetworks [18.814965334083425]
We propose a strategy to compute an ensemble ofworks, each corresponding to a non-overlapping dropout mask computed via a pruning strategy and trained independently. We show that the proposed subnetwork ensembling method can perform as well as standard deep ensembles in both accuracy and uncertainty estimates. We experimentally demonstrate that subnetwork ensembling also consistently outperforms recently proposed approaches that efficiently ensemble neural networks.
arXiv Detail & Related papers (2021-06-09T01:49:49Z)
Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images. We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks. Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair. A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
Analyzing Overfitting under Class Imbalance in Neural Networks for Image Segmentation [19.259574003403998]
In image segmentation neural networks may overfit to the foreground samples from small structures. In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior.
arXiv Detail & Related papers (2021-02-20T14:57:58Z)
DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths [45.947140164621096]
We propose a new approach based on differential inclusions of inverse scale spaces. We show that DessiLBI unveils "winning tickets" in early epochs.
arXiv Detail & Related papers (2020-07-04T04:40:16Z)
Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product Belief propagation, and add what is necessary to make it a proper component of a deep learning model. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs) The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.