Understanding Deep Representation Learning via Layerwise Feature
Compression and Discrimination
- URL: http://arxiv.org/abs/2311.02960v2
- Date: Tue, 9 Jan 2024 16:16:34 GMT
- Title: Understanding Deep Representation Learning via Layerwise Feature
Compression and Discrimination
- Authors: Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, and
Qing Qu
- Abstract summary: We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
- Score: 33.273226655730326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past decade, deep learning has proven to be a highly effective tool
for learning meaningful features from raw data. However, it remains an open
question how deep networks perform hierarchical feature learning across layers.
In this work, we attempt to unveil this mystery by investigating the structures
of intermediate features. Motivated by our empirical findings that linear
layers mimic the roles of deep layers in nonlinear networks for feature
learning, we explore how deep linear networks transform input data into output
by investigating the output (i.e., features) of each layer after training in
the context of multi-class classification problems. Toward this goal, we first
define metrics to measure within-class compression and between-class
discrimination of intermediate features, respectively. Through theoretical
analysis of these two metrics, we show that the evolution of features follows a
simple and quantitative pattern from shallow to deep layers when the input data
is nearly orthogonal and the network weights are minimum-norm, balanced, and
approximately low-rank: Each layer of the linear network progressively compresses
within-class features at a geometric rate and discriminates between-class
features at a linear rate with respect to the number of layers that data have
passed through. To the best of our knowledge, this is the first quantitative
characterization of feature evolution in hierarchical representations of deep
linear networks. Empirically, our extensive experiments not only validate our
theoretical results numerically but also reveal a similar pattern in deep
nonlinear networks, which aligns well with recent empirical studies. Moreover,
we demonstrate the practical implications of our results in transfer learning.
Our code is available at https://github.com/Heimine/PNC_DLN.
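The abstract's central quantitative claim is that, under the stated assumptions (nearly orthogonal inputs; minimum-norm, balanced, approximately low-rank weights), a within-class compression metric shrinks geometrically with depth, roughly like rho^l for some rho in (0,1), while a between-class discrimination metric grows linearly in the layer index l. The sketch below is a minimal harness for tracking such layerwise metrics; the metric forms (within-class vs. between-class scatter traces), the toy data, and the random stand-in weights are assumptions of this sketch, not the paper's exact definitions or its trained solutions.

```python
# Minimal sketch: track per-layer within-class compression and between-class
# discrimination in a deep linear network. The scatter-trace metrics and the
# random-weight setup are assumptions of this sketch, not the paper's
# definitions; plug in trained weights to probe the predicted trends.
import numpy as np

rng = np.random.default_rng(0)

def layer_features(X, weights):
    """Return the feature matrix produced after each linear layer."""
    feats, Z = [], X
    for W in weights:
        Z = Z @ W.T          # purely linear layer, no activation
        feats.append(Z)
    return feats

def scatter_traces(Z, y):
    """Traces of within-class and between-class scatter (illustrative metrics)."""
    global_mean = Z.mean(axis=0)
    sw = sb = 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        mu_c = Zc.mean(axis=0)
        sw += ((Zc - mu_c) ** 2).sum()                     # within-class spread
        sb += len(Zc) * ((mu_c - global_mean) ** 2).sum()  # class-mean separation
    return sw, sb

# Toy data: K classes with nearly orthogonal class means (matching the
# "nearly orthogonal input" assumption), plus small within-class noise.
K, n_per_class, d, L = 4, 50, 32, 6
means = np.linalg.qr(rng.standard_normal((d, K)))[0].T   # K orthonormal rows in R^d
X = np.vstack([m + 0.1 * rng.standard_normal((n_per_class, d)) for m in means])
y = np.repeat(np.arange(K), n_per_class)

# Stand-in weights; replace with weights of a trained deep linear network
# (e.g., from the linked repository) to test the geometric/linear prediction.
weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)]

for l, Z in enumerate(layer_features(X, weights), start=1):
    sw, sb = scatter_traces(Z, y)
    print(f"layer {l}: within/between ratio = {sw / sb:.4f}, "
          f"between-class trace = {sb:.2f}")
```

With trained weights satisfying the paper's conditions, the printed within/between ratio would be expected to decay roughly geometrically in the layer index, while a between-class measure would grow roughly linearly; with the random stand-in weights used here, the script only illustrates the measurement, not the phenomenon.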
Related papers
- Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks [13.983863226803336]
We argue that "Feature Averaging" is one of the principal factors contributing to non-robustness of deep neural networks.
We provide a detailed theoretical analysis of the training dynamics of gradient descent in a two-layer ReLU network for a binary classification task.
We prove that, with the provision of more granular supervised information, a two-layer multi-class neural network is capable of learning individual features.
arXiv Detail & Related papers (2024-10-14T09:28:32Z) - Neural Collapse in the Intermediate Hidden Layers of Classification
Neural Networks [0.0]
Neural Collapse (NC) gives a precise description of the representations of classes in the final hidden layer of classification neural networks.
In the present paper, we provide the first comprehensive empirical analysis of the emergence of NC in the intermediate hidden layers.
arXiv Detail & Related papers (2023-08-05T01:19:38Z) - Understanding Deep Neural Networks via Linear Separability of Hidden
Layers [68.23950220548417]
We first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets.
We demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance.
arXiv Detail & Related papers (2023-07-26T05:29:29Z) - Hidden Classification Layers: Enhancing linear separability between
classes in neural networks layers [0.0]
We investigate the impact of a training approach on deep network performance.
We propose a neural network architecture that induces an error function involving the outputs of all the network layers.
arXiv Detail & Related papers (2023-06-09T10:52:49Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - A Law of Data Separation in Deep Learning [41.58856318262069]
We study the fundamental question of how deep neural networks process data in the intermediate layers.
Our finding is a simple and quantitative law that governs how deep neural networks separate data according to class membership.
arXiv Detail & Related papers (2022-10-31T02:25:38Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image
Classification [49.87503122462432]
We introduce a novel neural network termed Relation-and-Margin learning Network (ReMarNet).
Our method assembles two networks with different backbones so as to learn features that perform well under both of the aforementioned classification mechanisms.
Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples.
arXiv Detail & Related papers (2020-06-27T13:50:20Z)