Disentangling deep neural networks with rectified linear units using
duality
- URL: http://arxiv.org/abs/2110.03403v1
- Date: Wed, 6 Oct 2021 16:51:59 GMT
- Title: Disentangling deep neural networks with rectified linear units using
duality
- Authors: Chandrashekar Lakshminarayanan and Amit Vikram Singh
- Abstract summary: We propose a novel interpretable counterpart of deep neural networks (DNNs) with rectified linear units (ReLUs)
We show that convolution with global pooling and skip connection provide respectively rotational invariance and ensemble structure to the neural path kernel (NPK)
- Score: 4.683806391173103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their success, deep neural networks (DNNs) are still largely considered black boxes. The main issue is that the linear and non-linear operations are entangled in every layer, making it hard to interpret the hidden layer outputs. In this paper, we look at DNNs with rectified linear units (ReLUs) and focus on the gating property (`on/off' states) of the ReLUs. We extend the recently developed dual view, in which the computation is broken down path-wise, to show that learning in the gates is the more crucial part, and that learning the weights given the gates is characterised analytically via the so-called neural path kernel (NPK), which depends on the inputs and the gates. We present novel results showing that convolution with global pooling and skip connections provide, respectively, rotational invariance and an ensemble structure to the NPK. To address the `black box'-ness, we propose a novel interpretable counterpart of DNNs with ReLUs, namely the deep linearly gated network (DLGN): the pre-activations to the gates are generated by a deep linear network, and the gates are then applied as external masks to learn the weights in a different network. The DLGN is not an alternative architecture per se, but a disentanglement and an interpretable re-arrangement of the computations in a DNN with ReLUs. The DLGN disentangles the computations into two `mathematically' interpretable linearities: (i) the `primal' linearity between the input and the pre-activations in the gating network, and (ii) the `dual' linearity in the path space of the weights network, characterised by the NPK. We compare the performance of the DNN, the deep gated network (DGN) and the DLGN on CIFAR-10 and CIFAR-100 to show that the DLGN recovers more than $83.5\%$ of the performance of state-of-the-art DNNs. This brings us to an interesting question: `Is the DLGN a universal spectral approximator?'
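To make the two linearities concrete, here is a minimal NumPy sketch of a DLGN-style forward pass, reconstructed only from the description above; the fully connected layers, absence of biases, layer widths, and scalar readout are illustrative assumptions rather than the authors' exact architecture. In the path view that the abstract builds on (taken from the authors' earlier dual-view work, not restated here), the output can roughly be written as a sum over input-to-output paths, $\hat{y}(x) \approx \sum_{p} x_{\iota(p)}\, A(x,p)\, w(p)$, where $\iota(p)$ is the input node of path $p$, $A(x,p) \in \{0,1\}$ is the product of the gates along the path and $w(p)$ the product of its weights; the NPK is then the Gram matrix of the path features $x_{\iota(p)} A(x,p)$.

```python
import numpy as np

def dlgn_forward(x, gating_weights, value_weights):
    """Minimal sketch of a DLGN forward pass (illustrative, not the authors' exact model).

    `gating_weights` parameterise a deep *linear* network whose pre-activations
    supply the `on/off' gates (the `primal' linearity); `value_weights`
    parameterise a separate weights network in which those gates are applied as
    external multiplicative masks (the network whose learning is characterised
    by the NPK, the `dual' linearity in path space).
    """
    g, v = x, x
    for G, W in zip(gating_weights, value_weights):
        g = G @ g                       # gating network: purely linear, no activation
        gate = (g > 0).astype(x.dtype)  # gates read off from the signs of the pre-activations
        v = gate * (W @ v)              # gates applied as external masks in the weights network
    return v.sum()                      # scalar readout, purely for illustration

# Hypothetical usage: three hidden layers of width 16 on an 8-dimensional input.
rng = np.random.default_rng(0)
widths = [8, 16, 16, 16]
G_list = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(3)]
W_list = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(3)]
print(dlgn_forward(rng.standard_normal(8), G_list, W_list))
```

An ordinary ReLU DNN corresponds to computing the gates from the weights network's own pre-activations (i.e. `gate = (W @ v > 0)`) instead of from a separate network; the DLGN replaces that entangled gating with a purely linear gating path.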
Related papers
- Forward Learning of Graph Neural Networks [17.79590285482424]
Backpropagation (BP) is the de facto standard for training deep neural networks (NNs)
BP imposes several constraints, which are not only biologically implausible, but also limit the scalability, parallelism, and flexibility in learning NNs.
We propose ForwardGNN, which avoids the constraints imposed by BP via an effective layer-wise local forward training.
arXiv Detail & Related papers (2024-03-16T19:40:35Z)
- NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks.
We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT.
We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
- Deep Networks Always Grok and Here is Why [15.327649172531606]
Grokking, or delayed generalization, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near zero training error.
We demonstrate that grokking is actually much more widespread and materializes in a wide range of practical settings.
arXiv Detail & Related papers (2024-02-23T18:59:31Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set (an illustrative sketch of such a fixed-mask gated ReLU model appears after this list).
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Explicitising The Implicit Intrepretability of Deep Neural Networks Via Duality [5.672223170618133]
Recent work by Lakshminarayanan and Singh provided a dual view for fully connected deep neural networks (DNNs) with rectified linear units (ReLU)
arXiv Detail & Related papers (2022-03-01T03:08:21Z)
- Wide and Deep Graph Neural Network with Distributed Online Learning [174.8221510182559]
Graph neural networks (GNNs) are naturally distributed architectures for learning representations from network data.
Online learning can be leveraged to retrain GNNs at testing time to overcome this issue.
This paper develops the Wide and Deep GNN (WD-GNN), a novel architecture that can be updated with distributed online learning mechanisms.
arXiv Detail & Related papers (2021-07-19T23:56:48Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthening continual learning in graph neural networks (GNNs).
At the heart of our approach is a generic module, termed topology-aware weight preserving (TWP).
arXiv Detail & Related papers (2020-12-10T22:30:25Z)
- Nonlinear State-Space Generalizations of Graph Convolutional Neural Networks [172.18295279061607]
Graph convolutional neural networks (GCNNs) learn compositional representations from network data by nesting linear graph convolutions into nonlinearities.
In this work, we approach GCNNs from a state-space perspective revealing that the graph convolutional module is a minimalistic linear state-space model.
We show that this state update may be problematic because it is nonparametric, and depending on the graph spectrum it may explode or vanish.
We propose a novel family of nodal aggregation rules that aggregate node features within a layer in a nonlinear state-space parametric fashion allowing for a better trade-off.
arXiv Detail & Related papers (2020-10-27T19:48:56Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
- Deep Gated Networks: A framework to understand training and generalisation in deep learning [3.6954802719347426]
We make use of deep gated networks (DGNs) as a framework to obtain insights about DNNs with ReLU activation.
Our theory sheds light on two questions, namely why increasing depth up to a point helps training and why increasing depth beyond that point hurts training.
arXiv Detail & Related papers (2020-02-10T18:12:20Z)
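For the "Fixing the NTK" entry above, here is a minimal sketch of the kind of gated ReLU model it refers to, with masks fixed in advance; the two-layer form, the random choice of gate vectors, and all names are assumptions for illustration, not that paper's exact construction. Because the gates depend neither on the targets nor on the trainable weights, the model is linear in the trainable weights, so training it is a linear (kernel) problem in those weights.

```python
import numpy as np

def gated_relu_forward(X, gate_vectors, weights):
    """Sketch of a two-layer gated ReLU model with fixed masks (illustrative only).

    Each unit j has a fixed gate vector g_j: the mask 1[X g_j > 0] is decided
    before training and never depends on the learning targets; only the
    weights w_j are trained, so the output is linear in the trainable weights.
    """
    out = np.zeros(X.shape[0])
    for g, w in zip(gate_vectors, weights):
        mask = (X @ g > 0).astype(X.dtype)  # fixed gating pattern for unit j
        out += mask * (X @ w)               # gated linear response of unit j
    return out

# Hypothetical usage: 32 samples in 8 dimensions, 16 gated units.
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 8))
gate_vectors = [rng.standard_normal(8) for _ in range(16)]
weights = [rng.standard_normal(8) for _ in range(16)]
print(gated_relu_forward(X, gate_vectors, weights).shape)  # (32,)
```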
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.