Neural Collapse in the Intermediate Hidden Layers of Classification Neural Networks
- URL: http://arxiv.org/abs/2308.02760v1
- Date: Sat, 5 Aug 2023 01:19:38 GMT
- Title: Neural Collapse in the Intermediate Hidden Layers of Classification Neural Networks
- Authors: Liam Parker, Emre Onal, Anton Stengel, Jake Intrater
- Abstract summary: Neural Collapse (NC) gives a precise description of the representations of classes in the final hidden layer of classification neural networks.
In the present paper, we provide the first comprehensive empirical analysis of the emergence of NC in the intermediate hidden layers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Collapse (NC) gives a precise description of the representations of classes in the final hidden layer of classification neural networks. This description provides insights into how these networks learn features and generalize well when trained past zero training error. However, to date, NC has only been studied in the final layer of these networks. In the present paper, we provide the first comprehensive empirical analysis of the emergence of NC in the intermediate hidden layers of these classifiers. We examine a variety of network architectures, activations, and datasets, and demonstrate that some degree of NC emerges in most of the intermediate hidden layers of the network, where the degree of collapse in any given layer is typically positively correlated with the depth of that layer in the neural network. Moreover, we remark that: (1) almost all of the reduction in intra-class variance in the samples occurs in the shallower layers of the networks, (2) the angular separation between class means increases consistently with hidden layer depth, and (3) simple datasets require only the shallower layers of the networks to fully learn them, whereas more difficult ones require the entire network. Ultimately, these results provide granular insights into the structural propagation of features through classification neural networks.
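The per-layer measurements described in the abstract can be reproduced in spirit with a few lines of PyTorch. The sketch below is illustrative only and is not the authors' code: it uses a randomly initialized stand-in MLP and synthetic labeled data (the network, its depth, and all sizes are assumptions), and reports two collapse signals for each hidden layer, the within-class to between-class variance ratio and the mean pairwise angle between centered class means.

```python
# Minimal sketch (assumed setup, not the paper's code): track two neural-collapse
# signals layer by layer -- within-/between-class variance ratio and the mean
# pairwise angle between centered class means.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, dim, per_class = 5, 32, 100

# Stand-in for a trained classifier; in practice, hook the hidden layers of the
# actual trained network instead.
hidden_layers = [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(6)]

x = torch.randn(num_classes * per_class, dim)                # synthetic inputs
y = torch.arange(num_classes).repeat_interleave(per_class)   # synthetic labels

def collapse_metrics(feats, labels):
    """Return (within/between variance ratio, mean angle in degrees between class means)."""
    means = torch.stack([feats[labels == c].mean(0) for c in range(num_classes)])
    global_mean = feats.mean(0)
    within = torch.cat([feats[labels == c] - means[c] for c in range(num_classes)]).pow(2).sum(1).mean()
    between = (means - global_mean).pow(2).sum(1).mean()
    directions = F.normalize(means - global_mean, dim=1)
    cosines = (directions @ directions.T).clamp(-1.0, 1.0)
    angles = torch.rad2deg(torch.arccos(cosines))
    off_diag = angles[~torch.eye(num_classes, dtype=torch.bool)]
    return (within / between).item(), off_diag.mean().item()

with torch.no_grad():
    h = x
    for depth, layer in enumerate(hidden_layers, start=1):
        h = layer(h)
        ratio, angle = collapse_metrics(h, y)
        print(f"layer {depth}: within/between variance = {ratio:.3f}, "
              f"mean class-mean angle = {angle:.1f} deg")
```

For a trained classifier, the paper reports that the variance ratio falls mostly in the shallower layers while the angular separation grows steadily with depth; the random stand-in above only illustrates how such measurements are taken.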
Related papers
- Understanding Deep Representation Learning via Layerwise Feature
Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
arXiv Detail & Related papers (2023-11-06T09:00:38Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Hidden Classification Layers: Enhancing linear separability between classes in neural networks layers [0.0]
We investigate the impact of a training approach on deep network performance.
We propose a neural network architecture that induces an error function involving the outputs of all the network layers.
arXiv Detail & Related papers (2023-06-09T10:52:49Z)
- Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z)
- Deep Residual Compensation Convolutional Network without Backpropagation [0.0]
We introduce a residual compensation convolutional network, which is the first PCANet-like network trained with hundreds of layers.
To correct the classification errors, we train each layer with new labels derived from the residual information of all its preceding layers.
Our experiments show that our deep network outperforms all existing PCANet-like networks and is competitive with several traditional gradient-based models.
arXiv Detail & Related papers (2023-01-27T11:45:09Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains poorly understood (see the per-layer rank sketch after this list).
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Exploiting Heterogeneity in Operational Neural Networks by Synaptic Plasticity [87.32169414230822]
The recently proposed Operational Neural Networks (ONNs) generalize conventional Convolutional Neural Networks (CNNs).
In this study, the focus is on searching for the best-possible operator set(s) for the hidden neurons of the network based on the Synaptic Plasticity paradigm, which constitutes the essential learning theory in biological neurons.
Experimental results over highly challenging problems demonstrate that elite ONNs, even with few neurons and layers, achieve superior learning performance compared to GIS-based ONNs.
arXiv Detail & Related papers (2020-08-21T19:03:23Z)
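As a companion to the "Rank Diminishing in Deep Neural Networks" entry above, the following sketch estimates the numerical rank of each layer's feature matrix, the quantity treated there as a measure of information flowing across layers. It is an assumed setup with a random stand-in MLP and synthetic inputs, not code from that paper.

```python
# Minimal sketch (assumed setup): numerical rank of per-layer feature matrices.
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, dim = 256, 64
hidden_layers = [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(8)]

with torch.no_grad():
    h = torch.randn(batch, dim)  # synthetic inputs; use real data/features in practice
    for depth, layer in enumerate(hidden_layers, start=1):
        h = layer(h)
        s = torch.linalg.svdvals(h)          # singular values, largest first
        rank = int((s > 1e-3 * s[0]).sum())  # count those above a relative tolerance
        print(f"layer {depth}: numerical rank = {rank} / {min(batch, dim)}")
```

With a trained network substituted for the random stand-in, the printed ranks would show whether and where the feature matrices become effectively low-rank with depth.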