Faithful and Efficient Explanations for Neural Networks via Neural
Tangent Kernel Surrogate Models
- URL: http://arxiv.org/abs/2305.14585v5
- Date: Mon, 11 Mar 2024 21:45:30 GMT
- Title: Faithful and Efficient Explanations for Neural Networks via Neural
Tangent Kernel Surrogate Models
- Authors: Andrew Engel, Zhichao Wang, Natalie S. Frank, Ioana Dumitriu, Sutanay
Choudhury, Anand Sarwate, Tony Chiang
- Abstract summary: We analyze approximate empirical neural tangent kernels (eNTK) for data attribution.
We introduce two new random projection variants of approximate eNTK which allow users to tune the time and memory complexity of their calculation.
We conclude that kernel machines using approximate neural tangent kernel as the kernel function are effective surrogate models.
- Score: 7.608408123113268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent trend in explainable AI research has focused on surrogate modeling,
where neural networks are approximated as simpler ML algorithms such as kernel
machines. A second trend has been to utilize kernel functions in various
explain-by-example or data attribution tasks. In this work, we combine these
two trends to analyze approximate empirical neural tangent kernels (eNTK) for
data attribution. Approximation is critical for eNTK analysis due to the high
computational cost of computing the eNTK. We define new approximate eNTKs and
perform a novel analysis of how well the resulting kernel machine surrogate
models correlate with the underlying neural network. We introduce two new
random projection variants of approximate eNTK which allow users to tune the
time and memory complexity of their calculation. We conclude that kernel
machines using an approximate neural tangent kernel as the kernel function are
effective surrogate models, with the introduced trace NTK the most consistent
performer. Open-source software allowing users to efficiently calculate kernel
functions in the PyTorch framework is available at
https://github.com/pnnl/projection_ntk.
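As a rough sketch of the idea, not the authors' pnnl/projection_ntk implementation, the PyTorch snippet below builds a trace-NTK Gram matrix from per-example parameter gradients, plus a random-projection variant that sketches those gradients first; the toy MLP, the projection dimension k, and all helper names are assumptions made for the example.

```python
# Sketch only: a toy stand-in for the ideas in pnnl/projection_ntk, not its API.
import torch

def grad_features(model, x, num_classes):
    """Per-class parameter gradients at one input, flattened and stacked."""
    out = model(x.unsqueeze(0))  # (1, num_classes)
    grads_per_class = []
    for c in range(num_classes):
        grads = torch.autograd.grad(out[0, c], tuple(model.parameters()),
                                    retain_graph=True)
        grads_per_class.append(torch.cat([g.reshape(-1) for g in grads]))
    return torch.stack(grads_per_class)  # (num_classes, num_params)

def trace_ntk(model, X, num_classes):
    """Trace NTK Gram matrix: sum_c grad f_c(x_i) . grad f_c(x_j)."""
    J = torch.stack([grad_features(model, x, num_classes) for x in X])
    return torch.einsum('icp,jcp->ij', J, J)

def projected_trace_ntk(model, X, num_classes, k=512, seed=0):
    """Same quantity with gradients sketched by a k-dim Gaussian projection,
    so k tunes the time/memory vs. accuracy trade-off."""
    p = sum(param.numel() for param in model.parameters())
    gen = torch.Generator().manual_seed(seed)
    P = torch.randn(p, k, generator=gen) / k ** 0.5
    J = torch.stack([grad_features(model, x, num_classes) @ P for x in X])
    return torch.einsum('ick,jck->ij', J, J)

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 3))
X = torch.randn(6, 10)
print((trace_ntk(model, X, 3) - projected_trace_ntk(model, X, 3)).abs().max())
```

Since E[P P^T] = I, the sketched Gram matrix is an unbiased estimate of the exact one, with error shrinking as k grows.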
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to networks of arbitrary width, depth, and topology.
We also present a novel, exact representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z)
- An Exact Kernel Equivalence for Finite Classification Models [1.4777718769290527]
We compare our exact representation to the well-known Neural Tangent Kernel (NTK) and discuss approximation error relative to the NTK.
We use this exact kernel to show that our theoretical contribution can provide useful insights into the predictions made by neural networks.
arXiv Detail & Related papers (2023-08-01T20:22:53Z)
- On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains [10.360517127652185]
We provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain.
This class of kernel functions includes, but is not limited to, the neural tangent kernels associated with neural networks of different depths and various activation functions.
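As a purely numerical illustration of what an EDR is (the paper's strategy is analytic), the sketch below eigendecomposes the Gram matrix of a Laplace kernel on uniform samples of [0, 1], an assumed example kernel whose decay is known to be roughly i^-2, and fits a power law to the leading eigenvalues.

```python
# Assumed example: Laplace kernel on [0, 1], whose EDR is known to be ~ i^-2.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = np.sort(rng.uniform(0.0, 1.0, n))
K = np.exp(-np.abs(x[:, None] - x[None, :]))  # Laplace kernel Gram matrix
eigvals = np.linalg.eigvalsh(K / n)[::-1]     # approximate Mercer eigenvalues
i = np.arange(1, 201)
slope, _ = np.polyfit(np.log(i), np.log(eigvals[:200]), 1)
print(f"fitted decay: lambda_i ~ i^({slope:.2f})")  # should come out near -2
```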
arXiv Detail & Related papers (2023-05-04T08:54:40Z)
- Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
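The hedged sketch below shows only the coreset contract, a small weighted subset whose weighted loss tracks the full-data loss of an RBF network; uniform sampling with weights n/m stands in for the paper's provable sensitivity-based construction, and every name and size here is an illustrative assumption.

```python
# Illustrative only: uniform sampling, not the paper's sensitivity construction.
import numpy as np

rng = np.random.default_rng(0)
n, d, n_centers = 10_000, 5, 16
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
C, a = rng.normal(size=(n_centers, d)), rng.normal(size=n_centers)

def rbfnn(Z):
    """Toy RBF network: f(z) = sum_j a_j * exp(-||z - c_j||^2)."""
    sq = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq) @ a

def sq_losses(Z, t):
    return (rbfnn(Z) - t) ** 2

m = 500
idx = rng.choice(n, size=m, replace=False)
w = np.full(m, n / m)  # reweight so the subset estimate is unbiased
print(sq_losses(X, y).sum())                   # full-data loss
print((w * sq_losses(X[idx], y[idx])).sum())   # weighted coreset estimate
```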
arXiv Detail & Related papers (2023-03-09T10:08:34Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
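A minimal sketch of the random-feature idea behind such approximations (not the authors' RFAD code): for a single hidden ReLU layer, the NNGP kernel is the degree-1 arc-cosine kernel, and D random ReLU features approximate it; the feature width D and the toy inputs are assumptions.

```python
# Feature width D and toy inputs are assumptions; the closed form is the
# degree-1 arc-cosine kernel (the NNGP kernel of one ReLU layer).
import numpy as np

rng = np.random.default_rng(0)
d, D = 8, 20_000
x, xp = rng.normal(size=d), rng.normal(size=d)

# E_w[relu(w.x) relu(w.x')] = ||x|| ||x'|| (sin t + (pi - t) cos t) / (2 pi)
nx, nxp = np.linalg.norm(x), np.linalg.norm(xp)
t = np.arccos(np.clip(x @ xp / (nx * nxp), -1.0, 1.0))
k_exact = nx * nxp * (np.sin(t) + (np.pi - t) * np.cos(t)) / (2 * np.pi)

W = rng.normal(size=(D, d))                          # random first-layer weights
phi = lambda v: np.maximum(W @ v, 0.0) / np.sqrt(D)  # random ReLU features
print(k_exact, phi(x) @ phi(xp))  # agree up to O(1/sqrt(D)) Monte Carlo error
```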
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for the NTK by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
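A short sketch of why a linear regressor on explicit kernel features can match exact kernel regression: ridge regression on a feature matrix Phi is algebraically identical to kernel ridge regression with K = Phi Phi^T. Gaussian features stand in for the paper's sketched CNTK features, and lam and the shapes are assumptions.

```python
# Gaussian features stand in for sketched CNTK features; lam is an assumption.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, D, lam = 100, 10, 300, 1e-2
Phi, Phi_t = rng.normal(size=(n, D)), rng.normal(size=(n_test, D))
y = rng.normal(size=n)

# Primal: ridge regression directly on the explicit features
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)
pred_primal = Phi_t @ w

# Dual: kernel ridge regression with the induced kernel K = Phi Phi^T
K, K_t = Phi @ Phi.T, Phi_t @ Phi.T
pred_dual = K_t @ np.linalg.solve(K + lam * np.eye(n), y)

print(np.allclose(pred_primal, pred_dual))  # True: identical predictions
```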
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
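As an illustrative sketch rather than the paper's construction: for a one-hidden-layer ReLU network the infinite-width NTK has a closed form, and the finite-width gradient map grad_theta f(x) is an explicit (if high-dimensional) NTK feature map that approximates it; the width m is an assumption.

```python
# Width m is an assumption; f(x) = sqrt(2/m) * sum_i a_i relu(w_i . x).
import numpy as np

rng = np.random.default_rng(0)
d, m = 6, 200_000
x, xp = rng.normal(size=d), rng.normal(size=d)

# Closed-form infinite-width NTK: Theta = k1 + (x . x') k0
nx, nxp = np.linalg.norm(x), np.linalg.norm(xp)
t = np.arccos(np.clip(x @ xp / (nx * nxp), -1.0, 1.0))
k1 = nx * nxp * (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi
k0 = (np.pi - t) / np.pi
theta_exact = k1 + (x @ xp) * k0

W, a = rng.normal(size=(m, d)), rng.normal(size=m)

def ntk_features(v):
    """Gradient of f w.r.t. (a, W): an explicit finite-width NTK feature map."""
    pre = W @ v
    da = np.sqrt(2 / m) * np.maximum(pre, 0.0)          # df/da_i
    dW = (np.sqrt(2 / m) * a * (pre > 0))[:, None] * v  # df/dw_i
    return np.concatenate([da, dW.ravel()])

print(theta_exact, ntk_features(x) @ ntk_features(xp))  # agree as m grows
```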
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed [27.38015169185521]
We show theoretically that two-layer neural networks (2LNN) with only a few hidden neurons can beat the performance of kernel learning.
We show how over-parametrising the neural network leads to faster convergence, but does not improve its final performance.
arXiv Detail & Related papers (2021-02-23T15:10:15Z)
- Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks [12.692279981822011]
We derive the covariance functions of multi-layer perceptrons with exponential linear units (ELU) and Gaussian error linear units (GELU).
We analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions.
We find that unlike some previously studied neural network kernels, these new kernels exhibit non-trivial fixed-point dynamics.
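A Monte Carlo sketch of the iterated-kernel analysis described above (the paper derives closed-form covariance functions instead): iterate the normalized kernel map rho -> E[phi(u)phi(v)] / E[phi(u)^2] across layers and watch where it settles; the sample size and starting correlation are assumptions.

```python
# Sample size and the starting correlation rho are assumptions.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=1_000_000), rng.normal(size=1_000_000)

def gelu(x):
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))  # exact GELU

def kernel_map(rho, phi):
    """One layer of the iterated kernel: E[phi(u) phi(v)] / E[phi(u)^2]
    for (u, v) standard bivariate normal with correlation rho."""
    u, v = z1, rho * z1 + np.sqrt(1.0 - rho ** 2) * z2
    return np.mean(phi(u) * phi(v)) / np.mean(phi(u) ** 2)

rho = 0.5
for layer in range(1, 11):
    rho = float(np.clip(kernel_map(rho, gelu), -1.0, 1.0))
    print(f"layer {layer:2d}: rho = {rho:.4f}")
# swap in an ELU, e.g. lambda x: np.where(x > 0, x, np.exp(x) - 1.0),
# to compare fixed-point behaviour across activations
```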
arXiv Detail & Related papers (2020-02-20T01:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.