Memorization-Dilation: Modeling Neural Collapse Under Label Noise
- URL: http://arxiv.org/abs/2206.05530v3
- Date: Tue, 4 Apr 2023 12:52:44 GMT
- Title: Memorization-Dilation: Modeling Neural Collapse Under Label Noise
- Authors: Duc Anh Nguyen, Ron Levie, Julian Lienen, Gitta Kutyniok, Eyke Hüllermeier
- Abstract summary: During the terminal phase of training a deep neural network, the feature embeddings of all examples of the same class tend to collapse to a single representation.
Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of the neural collapse.
Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.
- Score: 10.134749691813344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The notion of neural collapse refers to several emergent phenomena that have
been empirically observed across various canonical classification problems.
During the terminal phase of training a deep neural network, the feature
embeddings of all examples of the same class tend to collapse to a single
representation, and the features of different classes tend to separate as much
as possible. Neural collapse is often studied through a simplified model,
called the unconstrained feature representation, in which the model is assumed
to have "infinite expressivity" and can map each data point to any arbitrary
representation. In this work, we propose a more realistic variant of the
unconstrained feature representation that takes the limited expressivity of the
network into account. Empirical evidence suggests that the memorization of
noisy data points leads to a degradation (dilation) of the neural collapse.
Using a model of the memorization-dilation (M-D) phenomenon, we show one
mechanism by which different losses lead to different performances of the
trained network on noisy data. Our proofs reveal why label smoothing, a
modification of cross-entropy empirically observed to produce a regularization
effect, leads to improved generalization in classification tasks.
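To make the abstract's two central ingredients concrete, the sketch below computes (i) a simple collapse/dilation measure, namely the ratio of within-class to between-class feature variation, and (ii) a label-smoothed cross-entropy loss. This is a minimal NumPy illustration, not the authors' code: the specific collapse metric and the smoothing convention (1 - eps on the true class, eps/(K - 1) on the rest) are common choices assumed here purely to make the terms concrete.

```python
import numpy as np

def collapse_dilation(features, labels):
    """Ratio of within-class to between-class feature variation.

    Smaller values indicate stronger neural collapse; memorizing noisy
    labels is expected to inflate ("dilate") the within-class term.
    features: (n_samples, dim) array, labels: (n_samples,) int array.
    """
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        within += ((class_feats - class_mean) ** 2).sum()
        between += len(class_feats) * ((class_mean - global_mean) ** 2).sum()
    return within / between

def label_smoothed_cross_entropy(logits, labels, eps=0.1):
    """Cross-entropy against smoothed targets: 1 - eps on the true class,
    eps / (K - 1) spread uniformly over the remaining classes."""
    n, k = logits.shape
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    targets = np.full((n, k), eps / (k - 1))
    targets[np.arange(n), labels] = 1.0 - eps
    return -(targets * log_probs).sum(axis=1).mean()

# Toy usage: tightly clustered vs. "dilated" (noisier) embeddings.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
class_means = 5.0 * rng.normal(size=(3, 8))
clean = class_means[labels] + 0.1 * rng.normal(size=(300, 8))
noisy = class_means[labels] + 2.0 * rng.normal(size=(300, 8))
print(collapse_dilation(clean, labels), collapse_dilation(noisy, labels))
print(label_smoothed_cross_entropy(rng.normal(size=(300, 3)), labels))
```

On the synthetic embeddings above, the ratio reported by `collapse_dilation` grows with the per-class scatter, which is the qualitative dilation effect the paper attributes to memorized noisy labels.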
Related papers
- Unleashing the power of Neural Collapse for Transferability Estimation [42.09673383041276]
Well-trained models exhibit the phenomenon of Neural Collapse.
We propose a novel method termed Fair Collapse (FaCe) for transferability estimation.
FaCe yields state-of-the-art performance on different tasks including image classification, semantic segmentation, and text classification.
arXiv Detail & Related papers (2023-10-09T14:30:10Z) - Generalized Neural Collapse for a Large Number of Classes [33.46269920297418]
We provide an empirical study to verify the occurrence of generalized neural collapse in practical deep neural networks.
We provide a theoretical study to show that generalized neural collapse provably occurs under the unconstrained feature model with a spherical constraint.
arXiv Detail & Related papers (2023-10-09T02:27:04Z) - Unsupervised Learning of Invariance Transformations [105.54048699217668]
We develop an algorithmic framework for finding approximate graph automorphisms.
We discuss how this framework can be used to find approximate automorphisms in weighted graphs in general.
arXiv Detail & Related papers (2023-07-24T17:03:28Z) - Neural Dependencies Emerging from Learning Massive Categories [94.77992221690742]
This work presents two astonishing findings on neural networks learned for large-scale image classification.
1) Given a well-trained model, the logits predicted for some category can be directly obtained by linearly combining the predictions of a few other categories (a toy check of this claim is sketched after this list).
2) Neural dependencies exist not only within a single model, but even between two independently learned models.
arXiv Detail & Related papers (2022-11-21T09:42:15Z) - Learning Debiased and Disentangled Representations for Semantic
Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z) - Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize the seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z) - Slope and generalization properties of neural networks [0.0]
We show that the distribution of the slope of a well-trained neural network classifier is generally independent of the width of the layers in a fully connected network.
The slope is of similar size throughout the relevant volume, and varies smoothly. It also behaves as predicted in rescaling examples.
We discuss possible applications of the slope concept, such as using it as a part of the loss function or stopping criterion during network training, or ranking data sets in terms of their complexity.
arXiv Detail & Related papers (2021-07-03T17:54:27Z) - The Causal Neural Connection: Expressiveness, Learnability, and
Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z) - Uniform Convergence, Adversarial Spheres and a Simple Remedy [40.44709296304123]
Previous work has cast doubt on the general framework of uniform convergence and its ability to explain generalization in neural networks.
We provide an extensive theoretical investigation of the previously studied data setting through the lens of infinitely-wide models.
We prove that the Neural Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its origin.
arXiv Detail & Related papers (2021-05-07T20:23:01Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)