Explicitising The Implicit Interpretability of Deep Neural Networks Via
Duality
- URL: http://arxiv.org/abs/2203.16455v1
- Date: Tue, 1 Mar 2022 03:08:21 GMT
- Title: Explicitising The Implicit Interpretability of Deep Neural Networks Via
Duality
- Authors: Chandrashekar Lakshminarayanan, Amit Vikram Singh, Arun Rajkumar
- Abstract summary: Recent work by Lakshminarayanan and Singh provided a dual view for fully connected deep neural networks (DNNs) with rectified linear units (ReLU).
- Score: 5.672223170618133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work by Lakshminarayanan and Singh [2020] provided a dual view for
fully connected deep neural networks (DNNs) with rectified linear units (ReLU).
It was shown that (i) the information in the gates is analytically
characterised by a kernel called the neural path kernel (NPK) and (ii) most
critical information is learnt in the gates, in that, given the learnt gates,
the weights can be retrained from scratch without significant loss in
performance. Using the dual view, in this paper, we rethink the conventional
interpretations of DNNs, thereby explicitising the implicit interpretability of
DNNs. Towards this, we first show new theoretical properties, namely rotational
invariance and ensemble structure of the NPK in the presence of convolutional
layers and skip connections, respectively. Our theory leads to two surprising
empirical results that challenge conventional wisdom: (i) the weights can be
trained even with a constant 1 input, (ii) the gating masks can be shuffled,
without any significant loss in performance. These results motivate a novel
class of networks which we call deep linearly gated networks (DLGNs). Using
the phenomenon of dual lifting, DLGNs pave the way to a more direct and simpler
interpretation of DNNs than the conventional interpretations. We show via
extensive experiments on CIFAR-10 and CIFAR-100 that these DLGNs lead to a much
better interpretability-accuracy tradeoff.
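To make the gate/value decoupling in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of a deep linearly gated network: a purely linear gating path produces (soft) gates from the input, while a separate linear value path, which can even be fed a constant 1 input as in surprising result (i), is modulated elementwise by those gates. The class name, layer sizes, the sigmoid soft gate, and the overall layout are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch of the DLGN-style gate/value decoupling described above.
# The gating path is purely linear; only its soft activation signs are used as
# gates. The value path is linear and modulated elementwise by those gates.
import torch
import torch.nn as nn

class DLGNSketch(nn.Module):
    def __init__(self, in_dim, width, out_dim, depth=3, beta=10.0):
        super().__init__()
        dims = [in_dim] + [width] * depth
        self.gate_layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(depth)])
        self.value_layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(depth)])
        self.head = nn.Linear(width, out_dim)
        self.beta = beta  # sharpness of the soft gate

    def forward(self, x):
        g = x                   # gating path: sees the actual input, no ReLU
        v = torch.ones_like(x)  # value path: can start from a constant 1 input
        for gate_layer, value_layer in zip(self.gate_layers, self.value_layers):
            g = gate_layer(g)
            gate = torch.sigmoid(self.beta * g)  # soft version of 1[g > 0]
            v = value_layer(v) * gate            # gates modulate the linear value path
        return self.head(v)
```

Under these assumptions, DLGNSketch(in_dim=3072, width=128, out_dim=10) would stand in for a small fully connected CIFAR-10 model; the point is only that the input reaches the network solely through the gating path, while the value path carries the weights that, per the dual view, can be retrained from scratch once the gates are fixed.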
Related papers
- Infinite Width Limits of Self Supervised Neural Networks [6.178817969919849]
We bridge the gap between the NTK and self-supervised learning, focusing on two-layer neural networks trained under the Barlow Twins loss.
We prove that the NTK of Barlow Twins indeed becomes constant as the width of the network approaches infinity.
arXiv Detail & Related papers (2024-11-17T21:13:57Z) - GINN-KAN: Interpretability pipelining with applications in Physics Informed Neural Networks [5.2969467015867915]
We introduce the concept of interpretability pipelining, which combines multiple interpretability techniques to outperform each individual technique.
We evaluate two recent models selected for their potential to incorporate interpretability into standard neural network architectures.
We introduce a novel interpretable neural network GINN-KAN that synthesizes the advantages of both models.
arXiv Detail & Related papers (2024-08-27T04:57:53Z) - Information-Theoretic Generalization Bounds for Deep Neural Networks [22.87479366196215]
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications.
This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds.
arXiv Detail & Related papers (2024-04-04T03:20:35Z) - Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Graph Neural Networks are Inherently Good Generalizers: Insights by
Bridging GNNs and MLPs [71.93227401463199]
This paper pinpoints the major source of GNNs' performance gain to their intrinsic capability by introducing an intermediate model class dubbed P(ropagational)MLP (a hedged sketch of this idea appears after this list).
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
arXiv Detail & Related papers (2022-12-18T08:17:32Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
Existing NTK studies have focused on typical neural network architectures and are incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Disentangling deep neural networks with rectified linear units using
duality [4.683806391173103]
We propose a novel interpretable counterpart of deep neural networks (DNNs) with rectified linear units (ReLUs).
We show that convolution with global pooling and skip connections provide, respectively, rotational invariance and ensemble structure to the neural path kernel (NPK).
arXiv Detail & Related papers (2021-10-06T16:51:59Z) - A self consistent theory of Gaussian Processes captures feature learning
effects in finite CNNs [2.28438857884398]
Deep neural networks (DNNs) in the infinite width/channel limit have received much attention recently.
Despite their theoretical appeal, this viewpoint lacks a crucial ingredient of deep learning in finite DNNs that lies at the heart of their success: feature learning.
Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects.
arXiv Detail & Related papers (2021-06-08T05:20:00Z)
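As a reading aid for the P(ropagational)MLP entry above: PMLP-style models are often described as being trained exactly like plain MLPs on node features, with message passing inserted only at inference time. The sketch below illustrates that reading; the class name, the two-layer depth, the normalised-adjacency propagation step, and its placement are all assumptions for illustration, not the paper's reference implementation.

```python
# Hypothetical sketch: train as a plain MLP, add message passing only at inference.
import torch
import torch.nn as nn

class PMLPSketch(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden_dim)
        self.lin2 = nn.Linear(hidden_dim, out_dim)

    def propagate(self, h, adj_norm):
        # one step of message passing: average features over neighbours
        return adj_norm @ h

    def forward(self, x, adj_norm=None):
        # adj_norm is None during training (pure MLP);
        # at inference it is a normalised adjacency matrix (GNN-like behaviour).
        h = self.lin1(x)
        if adj_norm is not None:
            h = self.propagate(h, adj_norm)
        h = torch.relu(h)
        h = self.lin2(h)
        if adj_norm is not None:
            h = self.propagate(h, adj_norm)
        return h
```

Under this reading, model(x) is used during training (no graph structure), while model(x, adj_norm) applies neighbour averaging with a normalised adjacency matrix at evaluation time.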