Learning Representation from Neural Fisher Kernel with Low-rank Approximation
- URL: http://arxiv.org/abs/2202.01944v1
- Date: Fri, 4 Feb 2022 02:28:02 GMT
- Title: Learning Representation from Neural Fisher Kernel with Low-rank Approximation
- Authors: Ruixiang Zhang, Shuangfei Zhai, Etai Littwin, Josh Susskind
- Abstract summary: We first define the Neural Fisher Kernel (NFK), which is the Fisher Kernel applied to neural networks.
We show that NFK can be computed for both supervised and unsupervised learning models.
We then propose an efficient algorithm that computes a low rank approximation of NFK, which scales to large datasets and networks.
- Score: 16.14794818755318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the representation of neural networks from the view
of kernels. We first define the Neural Fisher Kernel (NFK), which is the Fisher
Kernel applied to neural networks. We show that NFK can be computed for both
supervised and unsupervised learning models, which can serve as a unified tool
for representation extraction. Furthermore, we show that practical NFKs exhibit
low-rank structures. We then propose an efficient algorithm that computes a low
rank approximation of NFK, which scales to large datasets and networks. We show
that the low-rank approximation of NFKs derived from unsupervised generative
models and supervised learning models gives rise to high-quality compact
representations of data, achieving competitive results on a variety of machine
learning tasks.
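To make the abstract concrete, here is a minimal sketch (not the paper's algorithm) of the Fisher-kernel idea and its low-rank approximation, using a toy logistic-regression model in place of a neural network and omitting the inverse-Fisher normalization. The names, shapes, and use of an exact SVD are simplifying assumptions; the paper's contribution is computing such a low-rank approximation efficiently at scale.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": logistic regression p(y=1 | x) = sigmoid(w . x).
X = rng.normal(size=(200, 10))
w = rng.normal(size=10)

def fisher_vectors(X, w, rng):
    """Per-example score vectors d/dw log p(y | x, w), with y sampled from the model.
    (A full Fisher kernel would also whiten by the inverse Fisher information.)"""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    y = (rng.random(len(p)) < p).astype(float)  # y ~ p(y | x, w)
    return (y - p)[:, None] * X                 # Bernoulli log-likelihood gradient

V = fisher_vectors(X, w, rng)   # shape (n_examples, n_params)
K = V @ V.T                     # (unnormalized) Fisher kernel Gram matrix

# Low-rank approximation: a rank-k truncated SVD of V yields a k-dimensional
# embedding whose inner products approximate K.
k = 4
U, S, _ = np.linalg.svd(V, full_matrices=False)
Z = U[:, :k] * S[:k]            # compact representation, shape (n_examples, k)

rel_err = np.linalg.norm(K - Z @ Z.T) / np.linalg.norm(K)
print(f"rank-{k} approximation, relative error: {rel_err:.3f}")
```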
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Kernel vs. Kernel: Exploring How the Data Structure Affects Neural Collapse [9.975341265604577]
"Neural Collapse" is the decrease in the within class variability of the network's deepest features, dubbed as NC1.
We provide a kernel-based analysis that does not suffer from this limitation.
We show that the NTK does not represent more collapsed features than the NNGP for prototypical data models.
arXiv Detail & Related papers (2024-06-04T08:33:56Z)
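For orientation only, the sketch below quantifies the "within-class variability" notion behind NC1 with a simple within-/between-class scatter ratio on synthetic features; the metric, names, and data are illustrative assumptions rather than the measure used in that paper.
```python
import numpy as np

def within_between_ratio(features, labels):
    """Rough NC1-style statistic: within-class vs. between-class feature scatter.
    Smaller values indicate more 'collapsed' (less variable) class features."""
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        within += ((fc - mu_c) ** 2).sum()
        between += len(fc) * ((mu_c - global_mean) ** 2).sum()
    return within / between

# Toy usage with synthetic "deep features" for 3 classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
centers = rng.normal(size=(3, 16)) * 5.0
features = centers[labels] + rng.normal(size=(300, 16))  # tight clusters -> small ratio
print(f"within/between scatter ratio: {within_between_ratio(features, labels):.3f}")
```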
- Exploring Learned Representations of Neural Networks with Principal Component Analysis [1.0923877073891446]
In certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification.
We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse.
arXiv Detail & Related papers (2023-09-27T00:18:25Z)
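As an illustration of the kind of projection behind the 20%-variance finding (assumptions throughout, not that paper's code), the sketch below keeps the fewest principal components of intermediate features that retain a chosen fraction of the variance.
```python
import numpy as np

def pca_project(features, variance_fraction=0.2):
    """Project intermediate-layer features onto the fewest principal components
    that retain the requested fraction of total variance (illustrative sketch)."""
    centered = features - features.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()
    k = int(np.searchsorted(np.cumsum(explained), variance_fraction) + 1)
    return centered @ Vt[:k].T, k

# Toy usage: 1000 "intermediate activations" of width 64 with correlated dimensions.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 64)) @ rng.normal(size=(64, 64))
projected, k = pca_project(features, variance_fraction=0.2)
print(f"{k} components retain ~20% of the variance; projected shape {projected.shape}")
# The reported finding would then be checked by training a classifier on `projected`
# and comparing its accuracy against one trained on the full features.
```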
- Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
arXiv Detail & Related papers (2023-06-24T10:21:11Z)
- Spiking neural networks with Hebbian plasticity for unsupervised representation learning [0.0]
We introduce a novel spiking neural network model for learning distributed internal representations from data in an unsupervised manner.
We incorporate an online, correlation-based Hebbian-Bayesian learning and rewiring mechanism, previously shown to perform representation learning, into a spiking neural network.
We show performance close to the non-spiking BCPNN, and competitive with other Hebbian-based spiking networks, when trained on the MNIST and F-MNIST machine learning benchmarks.
arXiv Detail & Related papers (2023-05-05T22:34:54Z)
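For readers unfamiliar with correlation-based Hebbian updates, here is a generic online outer-product rule with weight decay; it is a toy under stated assumptions, not the paper's spiking BCPNN learning-and-rewiring mechanism.
```python
import numpy as np

def hebbian_step(W, x, y, lr=0.01, decay=0.001):
    """One online Hebbian update: strengthen weights between co-active units.
    A generic correlation-based rule for illustration only -- the paper uses a
    Bayesian (BCPNN) variant with spiking units and structural rewiring."""
    return W + lr * np.outer(y, x) - decay * W

# Toy usage: 20 "presynaptic" units and 5 "postsynaptic" units with random binary activity.
rng = np.random.default_rng(0)
W = np.zeros((5, 20))
for _ in range(100):
    x = (rng.random(20) < 0.2).astype(float)  # presynaptic activity pattern
    y = (rng.random(5) < 0.3).astype(float)   # postsynaptic activity pattern
    W = hebbian_step(W, x, y)
print(f"mean weight after 100 online updates: {W.mean():.4f}")
```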
- Neighborhood Convolutional Network: A New Paradigm of Graph Neural Networks for Node Classification [12.062421384484812]
Decoupled variants of the Graph Convolutional Network (GCN) separate neighborhood aggregation from feature transformation in each convolutional layer.
In this paper, we propose a new paradigm of GCN, termed the Neighborhood Convolutional Network (NCN).
In this way, the model inherits the merit of decoupled GCN for aggregating neighborhood information while developing much more powerful feature learning modules.
arXiv Detail & Related papers (2022-11-15T02:02:51Z)
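To picture the decoupling referred to above, the sketch below runs parameter-free neighborhood aggregation first and a separate feature-transformation MLP afterwards; the function names, hop count, and toy graph are assumptions, and this is not the NCN architecture itself.
```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def decoupled_forward(A, X, W1, W2, hops=3):
    """Decoupled-GCN-style forward pass: aggregate neighborhoods first (no weights),
    then transform features with a small MLP (illustrative sketch only)."""
    S = normalize_adjacency(A)
    H = X
    for _ in range(hops):
        H = S @ H                        # parameter-free neighborhood aggregation
    return np.maximum(H @ W1, 0.0) @ W2  # separate feature transformation (MLP)

# Toy usage: a 4-node path graph with 8-dimensional node features and 3 classes.
rng = np.random.default_rng(0)
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
X = rng.normal(size=(4, 8))
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 3))
print(decoupled_forward(A, X, W1, W2).shape)   # (4, 3) class logits per node
```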
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs, since models must be trained before their performance can be predicted.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- End-to-End Learning of Deep Kernel Acquisition Functions for Bayesian Optimization [39.56814839510978]
We propose a meta-learning method for Bayesian optimization with neural network-based kernels.
Our model is trained with a reinforcement learning framework on multiple tasks.
In experiments using three text document datasets, we demonstrate that the proposed method achieves better BO performance than the existing methods.
arXiv Detail & Related papers (2021-11-01T00:42:31Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
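For intuition about the arc-cosine kernels mentioned above (this is not the paper's sketching algorithm), the snippet below compares the closed-form degree-1 arc-cosine kernel with a plain ReLU random-feature estimate; the dimensions and feature count are arbitrary assumptions.
```python
import numpy as np

def arccos1_kernel(x, y):
    """Closed-form arc-cosine kernel of degree 1 (Cho & Saul), which appears in
    NTK expansions for ReLU networks."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    theta = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
    return (nx * ny / np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

def relu_random_features(x, W):
    """Random features whose inner products approximate the degree-1 arc-cosine
    kernel: phi(x) = sqrt(2/m) * relu(W x), with rows of W drawn i.i.d. N(0, I)."""
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.maximum(W @ x, 0.0)

# Toy check of the approximation quality.
rng = np.random.default_rng(0)
d, m = 16, 20000
x, y = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(m, d))
approx = relu_random_features(x, W) @ relu_random_features(y, W)
print(f"exact {arccos1_kernel(x, y):.3f}  vs  random-feature estimate {approx:.3f}")
```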
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.