Related papers: Exploring Learned Representations of Neural Networks with Principal Component Analysis

URL: http://arxiv.org/abs/2309.15328v1
Date: Wed, 27 Sep 2023 00:18:25 GMT
Title: Exploring Learned Representations of Neural Networks with Principal Component Analysis
Authors: Amit Harlev, Andrew Engel, Panos Stinis, Tony Chiang
Abstract summary: In certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification. We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse.
Score: 1.0923877073891446
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding feature representation for deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors classifier (k-NN), nearest class-centers classifier (NCC), and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We show that in certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification and that across all layers, the first ~100 PCs completely determine the performance of the k-NN and NCC classifiers. We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse. Our preliminary work provides three distinct yet interpretable surrogate models for feature representation, with an affine linear model performing best. We also show that leveraging several surrogate models affords us a clever method to estimate where neural collapse may initially occur within the DNN.
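
Illustrative sketch (not the authors' code): the abstract describes projecting layer-wise features onto their leading principal components and then scoring a simple classifier such as NCC on the projection. The NumPy example below shows that pipeline on synthetic Gaussian features, which stand in for the ResNet-18 activations as an assumption of this sketch. All function names (`pca_fit`, `pca_transform`, `ncc_fit`, `ncc_predict`, `make_toy_features`) are hypothetical, and the toy data says nothing about the paper's actual numbers.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the feature mean, top principal directions, and their variance ratios."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return mean, Vt[:n_components], var_ratio[:n_components]

def pca_transform(X, mean, components):
    """Project features onto the retained principal directions."""
    return (X - mean) @ components.T

def ncc_fit(Z, y):
    """Nearest class-center classifier: store one mean vector per class."""
    classes = np.unique(y)
    centers = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centers

def ncc_predict(Z, classes, centers):
    """Assign each sample to the class whose center is closest in Euclidean distance."""
    dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def make_toy_features(n_per_class=100, n_classes=3, dim=50, seed=0):
    """Synthetic stand-in for layer activations (an assumption, not CIFAR-10 data)."""
    rng = np.random.default_rng(seed)
    means = rng.normal(scale=5.0, size=(n_classes, dim))
    X = np.concatenate([m + rng.normal(size=(n_per_class, dim)) for m in means])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_features()
    # Simple deterministic split: even rows for fitting, odd rows for evaluation.
    X_tr, y_tr, X_te, y_te = X[0::2], y[0::2], X[1::2], y[1::2]
    for k in (1, 2, 5, 20):
        mean, comps, ratio = pca_fit(X_tr, k)
        classes, centers = ncc_fit(pca_transform(X_tr, mean, comps), y_tr)
        pred = ncc_predict(pca_transform(X_te, mean, comps), classes, centers)
        acc = np.mean(pred == y_te)
        print(f"k={k:2d}  variance kept={ratio.sum():.2f}  held-out accuracy={acc:.2f}")
```

Sweeping `k` and comparing the retained variance with the held-out accuracy is the kind of curve the abstract summarizes; repeating the loop per layer and swapping NCC for k-NN or an SVM would follow the same shape.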

Related papers

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks [13.983863226803336]
We argue that "Feature Averaging" is one of the principal factors contributing to non-robustness of deep neural networks. We provide a detailed theoretical analysis of the training dynamics of gradient descent in a two-layer ReLU network for a binary classification task. We prove that, with the provision of more granular supervised information, a two-layer multi-class neural network is capable of learning individual features.
arXiv Detail & Related papers (2024-10-14T09:28:32Z)
Informed deep hierarchical classification: a non-standard analysis inspired approach [0.0]
The proposed approach consists of a multi-output deep neural network equipped with specific projection operators placed before each output layer. The design of such an architecture, called lexicographic hybrid deep neural network (LH-DNN), was made possible by combining tools from different and quite distant research fields. To assess the efficacy of the approach, the resulting network is compared against the B-CNN, a convolutional neural network tailored for hierarchical classification tasks.
arXiv Detail & Related papers (2024-09-25T14:12:50Z)
Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series. We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF). We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training. Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues. We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z)
Towards Disentangling Information Paths with Coded ResNeXt [11.884259630414515]
We take a novel approach to enhance the transparency of the function of the whole network. We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
arXiv Detail & Related papers (2022-02-10T21:45:49Z)
Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distance between the test image and top-k neighbors in a training set. We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps. Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z)
Structure and Performance of Fully Connected Neural Networks: Emerging Complex Network Properties [0.8484871864277639]
Complex Network (CN) techniques are proposed to analyze the structure and performance of fully connected neural networks. We build a dataset with 4 thousand models and their respective CN properties. Our findings suggest that CN properties play a critical role in the performance of fully connected neural networks.
arXiv Detail & Related papers (2021-07-29T14:53:52Z)
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs). In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit. We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.