Exploring the Common Principal Subspace of Deep Features in Neural
Networks
- URL: http://arxiv.org/abs/2110.02863v1
- Date: Wed, 6 Oct 2021 15:48:32 GMT
- Title: Exploring the Common Principal Subspace of Deep Features in Neural
Networks
- Authors: Haoran Liu, Haoyi Xiong, Yaqing Wang, Haozhe An, Dongrui Wu, and
Dejing Dou
- Abstract summary: We find that different Deep Neural Networks (DNNs) trained with the same dataset share a common principal subspace in their latent spaces.
Specifically, we design a new metric, the $\mathcal{P}$-vector, to represent the principal subspace of deep features learned in a DNN.
Small angles (with cosines close to $1.0$) are found in comparisons between any two DNNs trained with different algorithms/architectures.
- Score: 50.37178960258464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We find that different Deep Neural Networks (DNNs) trained with the same
dataset share a common principal subspace in their latent spaces, regardless of
the architecture in which the DNNs were built (e.g., Convolutional Neural
Networks (CNNs), Multi-Layer Perceptrons (MLPs), and Autoencoders (AEs)) and
regardless of whether labels were used in training (e.g., supervised,
unsupervised, and self-supervised learning). Specifically, we design a new
metric, the $\mathcal{P}$-vector, to represent the principal subspace of deep
features learned in a DNN, and propose to measure angles between principal subspaces
using $\mathcal{P}$-vectors. Small angles (with cosines close to $1.0$) are
found between any two DNNs trained with different algorithms/architectures.
Furthermore, during training from random initialization, the angle decreases
from a larger value (usually $70^\circ$-$80^\circ$) to a small one, coinciding
with the progress of feature-space learning from scratch to convergence. Then,
we carry out case studies to
measure the angle between the $\mathcal{P}$-vector and the principal subspace
of the training dataset, and connect this angle to generalization performance.
Extensive experiments with practical Multi-Layer Perceptrons (MLPs), AEs, and
CNNs for classification, image reconstruction, and self-supervised learning
tasks on the MNIST, CIFAR-10, and CIFAR-100 datasets support our claims with
solid evidence.
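For readers who want to experiment with the idea, the sketch below shows one plausible way to form a $\mathcal{P}$-vector (the top principal direction of a model's deep features) and to compare two models by the cosine of the angle between their $\mathcal{P}$-vectors. This is an assumption-laden illustration rather than the authors' implementation: the function names are invented, the random feature matrices are stand-ins, and it assumes the two feature sets are compared in the same (or an aligned) dimension.

```python
# Minimal sketch (not the authors' released code): compute a "P-vector" as the
# top principal direction of a model's deep features on a dataset, then compare
# two models by the cosine of the angle between their P-vectors.
import numpy as np

def p_vector(features: np.ndarray) -> np.ndarray:
    """Top principal direction of an (n_samples, n_dims) feature matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    # First right singular vector = leading principal component of the features.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def cosine_between(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two directions (sign-invariant)."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return abs(cos)  # a principal direction is only defined up to sign

# Stand-ins for deep features of two different models on the same inputs,
# e.g. penultimate-layer activations of a CNN and of an MLP.
feats_cnn = np.random.randn(1000, 128)
feats_mlp = np.random.randn(1000, 128)
print(cosine_between(p_vector(feats_cnn), p_vector(feats_mlp)))
```

Tracking this cosine across training checkpoints, or against the principal direction of the raw training data, would mirror the angle-versus-convergence and generalization case studies described in the abstract.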
Interpretability of Deep Learning, Feature Learning, and Subspaces of Deep
Features
Related papers
- Visualising Feature Learning in Deep Neural Networks by Diagonalizing the Forward Feature Map [4.776836972093627]
We present a method for analysing feature learning by decomposing deep neural networks (DNNs).
We find that DNNs converge to a minimal feature (MF) regime dominated by a number of eigenfunctions equal to the number of classes.
We recast the phenomenon of neural collapse into a kernel picture which can be extended to broader tasks such as regression.
arXiv Detail & Related papers (2024-10-05T18:53:48Z)
- Half-Space Feature Learning in Neural Networks [2.3249139042158853]
There currently exist two extreme viewpoints for neural network feature learning.
We argue that neither interpretation is likely to be correct, based on a novel viewpoint.
We use this alternate interpretation to motivate a model called the Deep Linearly Gated Network (DLGN).
arXiv Detail & Related papers (2024-04-05T12:03:19Z) - Trainable Weight Averaging: A General Approach for Subspace Training [20.58652836107849]
Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better performance.
We propose Trainable Weight Averaging (TWA), a general approach for subspace training.
TWA is efficient in terms of subspace extraction and easy to generalize.
arXiv Detail & Related papers (2022-05-26T01:54:48Z)
- A singular Riemannian geometry approach to Deep Neural Networks II. Reconstruction of 1-D equivalence classes [78.120734120667]
We build the preimage of a point in the output manifold in the input space.
We focus for simplicity on the case of neural networks mapping from n-dimensional real spaces to (n-1)-dimensional real spaces.
arXiv Detail & Related papers (2021-12-17T11:47:45Z)
- Embedded Knowledge Distillation in Depth-level Dynamic Neural Network [8.207403859762044]
We propose an elegant Depth-level Dynamic Neural Network (DDNN) that integrates different-depth sub-nets of similar architectures.
In this article, we design the Embedded-Knowledge-Distillation (EKD) training mechanism for the DDNN to implement semantic knowledge transfer from the teacher (full) net to multiple sub-nets.
Experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that sub-nets in DDNN with EKD training achieve better performance than depth-level pruning or individual training.
arXiv Detail & Related papers (2021-03-01T06:35:31Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Neural Contextual Bandits with Deep Representation and Shallow Exploration [105.8099566651448]
We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network.
Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient since it only needs to explore in the last layer of the deep neural network.
arXiv Detail & Related papers (2020-12-03T09:17:55Z)
- Multi-Subspace Neural Network for Image Recognition [33.61205842747625]
In image classification tasks, feature extraction is always a big issue. Intra-class variability increases the difficulty of designing the extractors.
Recently, deep learning has drawn lots of attention on automatically learning features from data.
In this study, we proposed a multi-subspace neural network (MSNN) that integrates a key component of convolutional neural networks (CNNs), the receptive field, with the subspace concept.
arXiv Detail & Related papers (2020-06-17T02:55:34Z)
- Architecture Disentanglement for Deep Neural Networks [174.16176919145377]
We introduce neural architecture disentanglement (NAD) to explain the inner workings of deep neural networks (DNNs).
NAD learns to disentangle a pre-trained DNN into sub-architectures according to independent tasks, forming information flows that describe the inference processes.
Results show that misclassified images have a high probability of being assigned to task sub-architectures similar to the correct ones.
arXiv Detail & Related papers (2020-03-30T08:34:33Z)
- Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)