Nearest Class-Center Simplification through Intermediate Layers
- URL: http://arxiv.org/abs/2201.08924v1
- Date: Fri, 21 Jan 2022 23:21:26 GMT
- Title: Nearest Class-Center Simplification through Intermediate Layers
- Authors: Ido Ben-Shaul, Shai Dekel
- Abstract summary: Recent advances in theoretical Deep Learning have introduced geometric properties that occur during training, past the Interpolation Threshold.
We inquire into the phenomenon coined Neural Collapse in the intermediate layers of the networks, and emphasize the inner workings of Nearest Class-Center Mismatch inside the deep network.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in theoretical Deep Learning have introduced geometric
properties that occur during training, past the Interpolation Threshold --
where the training error reaches zero. We inquire into the phenomenon coined
Neural Collapse in the intermediate layers of the networks, and emphasize the
inner workings of Nearest Class-Center Mismatch inside the deep network. We further
show that these processes occur both in vision and language model
architectures. Lastly, we propose a Stochastic Variability-Simplification Loss
(SVSL) that encourages better geometrical features in intermediate layers, and
improves both train metrics and generalization.
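The abstract names the Nearest Class-Center (NCC) mismatch measurement and the SVSL regularizer but does not spell out their formulas, so the following is only a minimal sketch of one plausible reading: at a chosen intermediate layer, classify each sample by its nearest class center and count disagreements with the network's final prediction, and penalize within-class variability relative to the spread of class means. The function names and the exact form of the penalty are illustrative assumptions, not the paper's definitions.

```python
import torch

def class_means(feats, labels, num_classes):
    # Per-class mean of layer features; assumes every class appears in the batch.
    return torch.stack([feats[labels == c].mean(dim=0) for c in range(num_classes)])

def ncc_mismatch(feats, labels, net_preds, num_classes):
    # Fraction of samples whose nearest class center at this layer disagrees with
    # the network's final prediction (one plausible reading of "NCC mismatch").
    mus = class_means(feats, labels, num_classes)   # (C, d)
    dists = torch.cdist(feats, mus)                 # (N, C) Euclidean distances
    return (dists.argmin(dim=1) != net_preds).float().mean().item()

def variability_simplification_penalty(feats, labels, num_classes, eps=1e-8):
    # Hypothetical SVSL-style term: within-class scatter divided by the spread of
    # class means at an intermediate layer, to be added to the task loss.
    mus = class_means(feats, labels, num_classes)
    within = torch.stack([feats[labels == c].var(dim=0, unbiased=False).sum()
                          for c in range(num_classes)]).mean()
    between = ((mus - mus.mean(dim=0)) ** 2).sum(dim=1).mean()
    return within / (between + eps)
```

In training, `feats` would come from a forward hook on the chosen layer and the penalty would be added to the cross-entropy loss with a small weight; the stochastic aspect of SVSL (e.g., which layers or samples are penalized on a given step) is not described in the abstract and is omitted here.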
Related papers
- Hypernym Bias: Unraveling Deep Classifier Training Dynamics through the Lens of Class Hierarchy [44.99833362998488]
We argue that the learning process in classification problems can be understood through the lens of label clustering.
Specifically, we observe that networks tend to distinguish higher-level (hypernym) categories in the early stages of training.
We introduce a novel framework to track the evolution of the feature manifold during training, revealing how the hierarchy of class relations emerges.
arXiv Detail & Related papers (2025-02-17T18:47:01Z)
- Progressive Feedforward Collapse of ResNet Training [7.824226954174748]
We study the relationship of the last-layer features to the data and intermediate layers during training.
We derive a model for the well-trained ResNet, showing that ResNet with weight decay approximates the geodesic curve in Wasserstein space at the terminal phase.
This study extends NC to PFC to model the collapse phenomenon of intermediate layers and its dependence on the input data, shedding light on the theoretical understanding of ResNet in classification problems.
arXiv Detail & Related papers (2024-05-02T03:48:08Z)
- Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse [14.817633094318253]
We study how architectural choices and the structure of the data affect gradient rank bounds in deep neural networks (DNNs).
Our theoretical analysis provides these bounds for training fully-connected, recurrent, and convolutional neural networks.
We also demonstrate, both theoretically and empirically, how design choices like activation function linearity, bottleneck layer introduction, convolutional stride, and sequence truncation influence these bounds.
arXiv Detail & Related papers (2024-02-09T19:28:02Z)
- Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
arXiv Detail & Related papers (2023-11-06T09:00:38Z)
- Neural Collapse in the Intermediate Hidden Layers of Classification Neural Networks [0.0]
Neural Collapse (NC) gives a precise description of the representations of classes in the final hidden layer of classification neural networks.
In the present paper, we provide the first comprehensive empirical analysis of the emergence of (NC) in the intermediate hidden layers.
arXiv Detail & Related papers (2023-08-05T01:19:38Z)
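The two preceding entries track how class structure evolves across layers. As a rough, generic illustration (not the metrics used in either paper), the sketch below gathers intermediate-layer features with forward hooks and summarizes each layer by a within-/between-class variability ratio; the helper names and layer choice are hypothetical.

```python
import torch
import torch.nn as nn

def collect_layer_features(model, layers, x):
    # Capture the (flattened) output of each chosen layer during one forward pass;
    # assumes each layer returns a single tensor and `layers` is in execution order.
    captured = []
    hooks = [layer.register_forward_hook(
                 lambda mod, inp, out: captured.append(out.detach().flatten(1)))
             for layer in layers]
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return captured

def within_between_ratio(feats, labels, num_classes):
    # NC1-style proxy: average within-class variance over the variance of class means.
    mus = torch.stack([feats[labels == c].mean(dim=0) for c in range(num_classes)])
    within = torch.stack([feats[labels == c].var(dim=0, unbiased=False).sum()
                          for c in range(num_classes)]).mean()
    between = ((mus - mus.mean(dim=0)) ** 2).sum(dim=1).mean()
    return (within / between).item()

# Example (hypothetical model, layer list, and batch): the per-layer ratios are
# expected to shrink with depth in a well-trained classifier.
# ratios = [within_between_ratio(f, y, num_classes)
#           for f in collect_layer_features(model, chosen_layers, x)]
```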
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- Understanding Imbalanced Semantic Segmentation Through Neural Collapse [81.89121711426951]
We show that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes.
We introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure.
Our method ranks 1st and sets a new record on the ScanNet200 test leaderboard.
arXiv Detail & Related papers (2023-01-03T13:51:51Z)
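The entry above mentions a regularizer that pulls feature centers toward the "appealing structure", which in the Neural Collapse literature is the simplex equiangular tight frame (ETF). The sketch below is a generic ETF-center penalty for illustration only, operating on batch features and labels; it is not the authors' formulation.

```python
import torch

def etf_center_regularizer(feats, labels, num_classes, eps=1e-8):
    # Penalize deviation of the (centered, normalized) class centers from a simplex
    # ETF: pairwise cosine of -1/(C-1) between distinct classes, 1 on the diagonal.
    C = num_classes
    mus = torch.stack([feats[labels == c].mean(dim=0) for c in range(C)])
    mus = mus - mus.mean(dim=0, keepdim=True)              # center globally
    mus = mus / (mus.norm(dim=1, keepdim=True) + eps)      # unit-normalize rows
    gram = mus @ mus.t()                                   # (C, C) cosine matrix
    target = (C / (C - 1)) * (torch.eye(C, device=feats.device)
                              - torch.ones(C, C, device=feats.device) / C)
    return ((gram - target) ** 2).mean()
```

A weighted version of such a term would be added to the segmentation loss; how the ScanNet200 method actually defines and schedules its center regularizer cannot be recovered from this summary.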
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Defensive Tensorization [113.96183766922393]
We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network.
We empirically demonstrate the effectiveness of our approach on standard image classification benchmarks.
We validate the versatility of our approach across domains and low-precision architectures by considering an audio task and binary networks.
arXiv Detail & Related papers (2021-10-26T17:00:16Z)
- LaplaceNet: A Hybrid Energy-Neural Model for Deep Semi-Supervised Classification [0.0]
Recent developments in deep semi-supervised classification have reached unprecedented performance.
We propose a new framework, LaplaceNet, for deep semi-supervised classification that has a greatly reduced model complexity.
Our model outperforms state-of-the-art methods for deep semi-supervised classification, over several benchmark datasets.
arXiv Detail & Related papers (2021-06-08T17:09:28Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.