Disentangling deep neural networks with rectified linear units using
duality
- URL: http://arxiv.org/abs/2110.03403v1
- Date: Wed, 6 Oct 2021 16:51:59 GMT
- Title: Disentangling deep neural networks with rectified linear units using
duality
- Authors: Chandrashekar Lakshminarayanan and Amit Vikram Singh
- Abstract summary: We propose a novel interpretable counterpart of deep neural networks (DNNs) with rectified linear units (ReLUs)
We show that convolution with global pooling and skip connection provide respectively rotational invariance and ensemble structure to the neural path kernel (NPK)
- Score: 4.683806391173103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their success, deep neural networks (DNNs) are still largely considered black boxes. The main issue is that the linear and non-linear operations are entangled in every layer, making it hard to interpret the hidden layer outputs. In this paper, we look at DNNs with rectified linear units (ReLUs) and focus on the gating property (`on/off' states) of the ReLUs. We extend the recently developed dual view, in which the computation is broken down path-wise, to show that learning in the gates is the more crucial part, and that learning the weights given the gates is characterised analytically via the so-called neural path kernel (NPK), which depends on the inputs and the gates. We present novel results showing that convolution with global pooling and skip connections provide, respectively, rotational invariance and an ensemble structure to the NPK. To address the `black box'-ness, we propose a novel interpretable counterpart of DNNs with ReLUs, namely the deep linearly gated network (DLGN): the pre-activations to the gates are generated by a deep linear network, and the gates are then applied as external masks to learn the weights in a different network. The DLGN is not an alternative architecture per se, but a disentanglement and an interpretable re-arrangement of the computations in a DNN with ReLUs. The DLGN disentangles the computations into two `mathematically' interpretable linearities: (i) the `primal' linearity between the input and the pre-activations in the gating network, and (ii) the `dual' linearity in the path space of the weights network, characterised by the NPK. We compare the performance of the DNN, the deep gated network (DGN) and the DLGN on CIFAR-10 and CIFAR-100 to show that the DLGN recovers more than $83.5\%$ of the performance of state-of-the-art DNNs. This brings us to an interesting question: `Is the DLGN a universal spectral approximator?'
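To make the two linearities concrete, here is a minimal NumPy sketch of a DLGN-style forward pass, reconstructed only from the description above; the fully connected layers, absence of biases, layer widths, and scalar readout are illustrative assumptions rather than the authors' exact architecture. In the path view that the abstract builds on (taken from the authors' earlier dual-view work, not restated here), the output can roughly be written as a sum over input-to-output paths, $\hat{y}(x) \approx \sum_{p} x_{\iota(p)}\, A(x,p)\, w(p)$, where $\iota(p)$ is the input node of path $p$, $A(x,p) \in \{0,1\}$ is the product of the gates along the path and $w(p)$ the product of its weights; the NPK is then the Gram matrix of the path features $x_{\iota(p)} A(x,p)$.

```python
import numpy as np

def dlgn_forward(x, gating_weights, value_weights):
    """Minimal sketch of a DLGN forward pass (illustrative, not the authors' exact model).

    `gating_weights` parameterise a deep *linear* network whose pre-activations
    supply the `on/off' gates (the `primal' linearity); `value_weights`
    parameterise a separate weights network in which those gates are applied as
    external multiplicative masks (the network whose learning is characterised
    by the NPK, the `dual' linearity in path space).
    """
    g, v = x, x
    for G, W in zip(gating_weights, value_weights):
        g = G @ g                       # gating network: purely linear, no activation
        gate = (g > 0).astype(x.dtype)  # gates read off from the signs of the pre-activations
        v = gate * (W @ v)              # gates applied as external masks in the weights network
    return v.sum()                      # scalar readout, purely for illustration

# Hypothetical usage: three hidden layers of width 16 on an 8-dimensional input.
rng = np.random.default_rng(0)
widths = [8, 16, 16, 16]
G_list = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(3)]
W_list = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(3)]
print(dlgn_forward(rng.standard_normal(8), G_list, W_list))
```

An ordinary ReLU DNN corresponds to computing the gates from the weights network's own pre-activations (i.e. `gate = (W @ v > 0)`) instead of from a separate network; the DLGN replaces that entangled gating with a purely linear gating path.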
Related papers
- Forward Learning of Graph Neural Networks [17.79590285482424]
Backpropagation (BP) is the de facto standard for training deep neural networks (NNs)
BP imposes several constraints, which are not only biologically implausible, but also limit the scalability, parallelism, and flexibility in learning NNs.
We propose ForwardGNN, which avoids the constraints imposed by BP via an effective layer-wise local forward training.
arXiv Detail & Related papers (2024-03-16T19:40:35Z)
- NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks.
We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT.
We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
- Deep Networks Always Grok and Here is Why [15.327649172531606]
Grokking, or delayed generalization, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near zero training error.
We demonstrate that grokking is actually much more widespread and materializes in a wide range of practical settings.
arXiv Detail & Related papers (2024-02-23T18:59:31Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set (an illustrative sketch of such a fixed-mask gated ReLU model appears after this list).
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Explicitising The Implicit Intrepretability of Deep Neural Networks Via Duality [5.672223170618133]
Recent work by Lakshminarayanan and Singh provided a dual view for fully connected deep neural networks (DNNs) with rectified linear units (ReLU)
arXiv Detail & Related papers (2022-03-01T03:08:21Z)
- Wide and Deep Graph Neural Network with Distributed Online Learning [174.8221510182559]
Graph neural networks (GNNs) are naturally distributed architectures for learning representations from network data.
Online learning can be leveraged to retrain GNNs at testing time to overcome this issue.
This paper develops the Wide and Deep GNN (WD-GNN), a novel architecture that can be updated with distributed online learning mechanisms.
arXiv Detail & Related papers (2021-07-19T23:56:48Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthening continual learning in graph neural networks (GNNs).
At the heart of our approach is a generic module, termed topology-aware weight preserving (TWP).
arXiv Detail & Related papers (2020-12-10T22:30:25Z)
- Nonlinear State-Space Generalizations of Graph Convolutional Neural Networks [172.18295279061607]
Graph convolutional neural networks (GCNNs) learn compositional representations from network data by nesting linear graph convolutions into nonlinearities.
In this work, we approach GCNNs from a state-space perspective revealing that the graph convolutional module is a minimalistic linear state-space model.
We show that this state update may be problematic because it is nonparametric, and depending on the graph spectrum it may explode or vanish.
We propose a novel family of nodal aggregation rules that aggregate node features within a layer in a nonlinear state-space parametric fashion allowing for a better trade-off.
arXiv Detail & Related papers (2020-10-27T19:48:56Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
- Deep Gated Networks: A framework to understand training and generalisation in deep learning [3.6954802719347426]
We make use of deep gated networks (DGNs) as a framework to obtain insights about DNNs with ReLU activation.
Our theory sheds light on two questions, namely why increasing depth up to a point helps training and why increasing depth beyond that point hurts training.
arXiv Detail & Related papers (2020-02-10T18:12:20Z)
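For the "Fixing the NTK" entry above, here is a minimal sketch of the kind of gated ReLU model it refers to, with masks fixed in advance; the two-layer form, the random choice of gate vectors, and all names are assumptions for illustration, not that paper's exact construction. Because the gates depend neither on the targets nor on the trainable weights, the model is linear in the trainable weights, so training it is a linear (kernel) problem in those weights.

```python
import numpy as np

def gated_relu_forward(X, gate_vectors, weights):
    """Sketch of a two-layer gated ReLU model with fixed masks (illustrative only).

    Each unit j has a fixed gate vector g_j: the mask 1[X g_j > 0] is decided
    before training and never depends on the learning targets; only the
    weights w_j are trained, so the output is linear in the trainable weights.
    """
    out = np.zeros(X.shape[0])
    for g, w in zip(gate_vectors, weights):
        mask = (X @ g > 0).astype(X.dtype)  # fixed gating pattern for unit j
        out += mask * (X @ w)               # gated linear response of unit j
    return out

# Hypothetical usage: 32 samples in 8 dimensions, 16 gated units.
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 8))
gate_vectors = [rng.standard_normal(8) for _ in range(16)]
weights = [rng.standard_normal(8) for _ in range(16)]
print(gated_relu_forward(X, gate_vectors, weights).shape)  # (32,)
```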
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.