NFT-K: Non-Fungible Tangent Kernels
- URL: http://arxiv.org/abs/2110.04945v1
- Date: Mon, 11 Oct 2021 00:35:47 GMT
- Title: NFT-K: Non-Fungible Tangent Kernels
- Authors: Sina Alemohammad, Hossein Babaei, CJ Barberan, Naiming Liu, Lorenzo
Luzi, Blake Mason, Richard G. Baraniuk
- Abstract summary: We develop a new network as a combination of multiple neural tangent kernels, one to model each layer of the deep neural network individually.
We demonstrate the interpretability of this model on two datasets, showing that the multiple kernels model elucidates the interplay between the layers and predictions.
- Score: 23.93508901712177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have become essential for numerous applications due to
their strong empirical performance such as vision, RL, and classification.
Unfortunately, these networks are quite difficult to interpret, and this limits
their applicability in settings where interpretability is important for safety,
such as medical imaging. One type of deep neural network is neural tangent
kernel that is similar to a kernel machine that provides some aspect of
interpretability. To further contribute interpretability with respect to
classification and the layers, we develop a new network as a combination of
multiple neural tangent kernels, one to model each layer of the deep neural
network individually as opposed to past work which attempts to represent the
entire network via a single neural tangent kernel. We demonstrate the
interpretability of this model on two datasets, showing that the multiple
kernels model elucidates the interplay between the layers and predictions.
Related papers
- Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks [13.983863226803336]
We argue that "Feature Averaging" is one of the principal factors contributing to non-robustness of deep neural networks.
We provide a detailed theoretical analysis of the training dynamics of gradient descent in a two-layer ReLU network for a binary classification task.
We prove that, with the provision of more granular supervised information, a two-layer multi-class neural network is capable of learning individual features.
arXiv Detail & Related papers (2024-10-14T09:28:32Z) - Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK)
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Neural Network Pruning as Spectrum Preserving Process [7.386663473785839]
We identify the close connection between matrix spectrum learning and neural network training for dense and convolutional layers.
We propose a matrix sparsification algorithm tailored for neural network pruning that yields better pruning result.
arXiv Detail & Related papers (2023-07-18T05:39:32Z) - Why Quantization Improves Generalization: NTK of Binary Weight Neural
Networks [33.08636537654596]
We take the binary weights in a neural network as random variables under rounding, and study the distribution propagation over different layers in the neural network.
We propose a quasi neural network to approximate the distribution propagation, which is a neural network with continuous parameters and smooth activation function.
arXiv Detail & Related papers (2022-06-13T06:11:21Z) - Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student
Settings and its Superiority to Kernel Methods [58.44819696433327]
We investigate the risk of two-layer ReLU neural networks in a teacher regression model.
We find that the student network provably outperforms any solution methods.
arXiv Detail & Related papers (2022-05-30T02:51:36Z) - Interpretable part-whole hierarchies and conceptual-semantic
relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues.
We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z) - Provably Training Neural Network Classifiers under Fairness Constraints [70.64045590577318]
We show that overparametrized neural networks could meet the constraints.
Key ingredient of building a fair neural network classifier is establishing no-regret analysis for neural networks.
arXiv Detail & Related papers (2020-12-30T18:46:50Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - On the Empirical Neural Tangent Kernel of Standard Finite-Width
Convolutional Neural Network Architectures [3.4698840925433765]
It remains an open question how well NTK theory models standard neural network architectures of widths common in practice.
We study this question empirically for two well-known convolutional neural network architectures, namely AlexNet and LeNet.
For wider versions of these networks, where the number of channels and widths of fully-connected layers are increased, the deviation decreases.
arXiv Detail & Related papers (2020-06-24T11:40:36Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.