Neural Path Features and Neural Path Kernel : Understanding the role of
gates in deep learning
- URL: http://arxiv.org/abs/2006.10529v2
- Date: Sat, 12 Jun 2021 17:42:53 GMT
- Title: Neural Path Features and Neural Path Kernel : Understanding the role of
gates in deep learning
- Authors: Chandrashekar Lakshminarayanan and Amit Vikram Singh
- Abstract summary: This paper analytically characterises the role of active sub-networks in deep learning.
We encode the on/off states of the gates for a given input in a novel 'neural path feature' (NPF) and the weights of the DNN in a novel 'neural path value' (NPV).
We show that the output of the network is the inner product of the NPF and the NPV.
- Score: 3.6954802719347426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rectified linear unit (ReLU) activations can also be thought of as 'gates',
which either pass their pre-activation input when they are 'on' (the pre-activation
input is positive) or stop it when they are 'off' (the pre-activation input is
negative). A deep neural network (DNN) with ReLU activations
has many gates, and the on/off status of each gate changes across input
examples as well as network weights. For a given input example, only a subset
of gates are 'active', i.e., on, and the sub-network of weights connected to
these active gates is responsible for producing the output. At randomised
initialisation, the active sub-network corresponding to a given input example
is random. During training, as the weights are learnt, the active sub-networks
are also learnt, and potentially hold very valuable information. In this paper,
we analytically characterise the role of active sub-networks in deep learning.
To this end, we encode the on/off state of the gates of a given input in a
novel 'neural path feature' (NPF), and the weights of the DNN are encoded in a
novel 'neural path value' (NPV). Further, we show that the output of the network is
exactly the inner product of the NPF and the NPV. The main result of the paper shows
that the 'neural path kernel' associated with the NPF is a fundamental quantity
that characterises the information stored in the gates of a DNN. We show via
experiments (on MNIST and CIFAR-10) that, in standard DNNs with ReLU activations,
NPFs are learnt during training and that such learning is key for generalisation.
Furthermore, NPFs and NPVs can be learnt in two separate networks and such
learning also generalises well in experiments.
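The inner-product identity is easy to check numerically. Below is a minimal sketch in plain NumPy, assuming a tiny bias-free fully connected ReLU network (the network, sizes, and variable names are illustrative assumptions, not the paper's code): it enumerates every input-to-output path, builds the NPF from the input value and the gates along each path and the NPV from the weights along each path, checks that the network output equals the inner product of the two, and forms the neural path kernel as the Gram matrix of NPFs.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# A tiny bias-free ReLU network: d inputs -> h hidden -> h hidden -> 1 output.
d, h = 3, 4
W1 = rng.standard_normal((h, d))
W2 = rng.standard_normal((h, h))
W3 = rng.standard_normal((1, h))

def forward_with_gates(x):
    """Usual forward pass; also returns the 0/1 gate pattern of each hidden layer."""
    a1 = W1 @ x
    g1 = (a1 > 0).astype(float)          # gates of layer 1
    a2 = W2 @ (g1 * a1)
    g2 = (a2 > 0).astype(float)          # gates of layer 2
    y = W3 @ (g2 * a2)
    return y.item(), (g1, g2)

def npf_npv(x):
    """Enumerate all input->output paths; return (NPF(x), NPV) indexed by path."""
    _, (g1, g2) = forward_with_gates(x)
    npf, npv = [], []
    for i0, i1, i2 in itertools.product(range(d), range(h), range(h)):
        npv.append(W1[i1, i0] * W2[i2, i1] * W3[0, i2])  # product of weights on the path
        npf.append(x[i0] * g1[i1] * g2[i2])              # input value times gates on the path
    return np.array(npf), np.array(npv)

x = rng.standard_normal(d)
y, _ = forward_with_gates(x)
npf, npv = npf_npv(x)
assert np.isclose(y, npf @ npv)          # output = <NPF, NPV>

# Neural path kernel: Gram matrix of NPFs over a small batch of inputs.
X = rng.standard_normal((5, d))
Phi = np.stack([npf_npv(xi)[0] for xi in X])
NPK = Phi @ Phi.T
print(np.round(NPK, 3))
```

Enumerating individual paths scales exponentially with depth and width, so this brute-force check is only feasible for toy networks; the value of the path decomposition in the paper is analytical rather than computational.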
Related papers
- Network Inversion of Binarised Neural Nets [3.5571131514746837]
Network inversion plays a pivotal role in unraveling the black-box nature of input to output mappings in neural networks.
This paper introduces a novel approach to invert a trained BNN by encoding it into a CNF formula that captures the network's structure.
arXiv Detail & Related papers (2024-02-19T09:39:54Z)
- Properties and Potential Applications of Random Functional-Linked Types of Neural Networks [81.56822938033119]
Random functional-linked neural networks (RFLNNs) offer an alternative way of learning in deep structures.
This paper gives some insights into the properties of RFLNNs from the viewpoint of the frequency domain.
We propose a method to generate a BLS network with better performance, and design an efficient algorithm for solving Poisson's equation.
arXiv Detail & Related papers (2023-04-03T13:25:22Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- How and what to learn: The modes of machine learning [7.085027463060304]
We propose a new approach, namely the weight pathway analysis (WPA), to study the mechanism of multilayer neural networks.
WPA shows that a neural network stores and utilizes information in a "holographic" way, that is, the network encodes all training samples in a coherent structure.
It is found that hidden-layer neurons self-organize into different classes in the later stages of the learning process.
arXiv Detail & Related papers (2022-02-28T14:39:06Z)
- Disentangling deep neural networks with rectified linear units using duality [4.683806391173103]
We propose a novel interpretable counterpart of deep neural networks (DNNs) with rectified linear units (ReLUs).
We show that convolution with global pooling and skip connection provide, respectively, rotational invariance and ensemble structure to the neural path kernel (NPK).
arXiv Detail & Related papers (2021-10-06T16:51:59Z)
- Credit Assignment Through Broadcasting a Global Error Vector [4.683806391173103]
Backpropagation (BP) uses detailed, unit-specific feedback to train deep neural networks (DNNs) with remarkable success.
Here, we explore the extent to which a globally broadcast learning signal, coupled with local weight updates, enables training of DNNs.
arXiv Detail & Related papers (2021-06-08T04:08:46Z)
- Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthening continual learning in graph neural networks (GNNs).
At the heart of our approach is a generic module termed topology-aware weight preserving (TWP).
arXiv Detail & Related papers (2020-12-10T22:30:25Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations [143.3053365553897]
We describe a procedure for removing dependency on a cohort of training data from a trained deep network.
We introduce a new bound on how much information can be extracted per query about the forgotten cohort.
We exploit the connections between the activation and weight dynamics of a DNN inspired by Neural Tangent Kernels to compute the information in the activations.
arXiv Detail & Related papers (2020-03-05T23:17:35Z)
- Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units [68.30422112784355]
We propose a new gating mechanism within general gated recurrent neural networks to handle this issue.
The proposed gates directly short connect the extracted input features to the outputs of vanilla gates.
We verify the proposed gating mechanism on three popular types of gated RNNs including LSTM, GRU and MGU.
arXiv Detail & Related papers (2020-02-26T07:51:38Z)
- Deep Gated Networks: A framework to understand training and generalisation in deep learning [3.6954802719347426]
We make use of deep gated networks (DGNs) as a framework to obtain insights about DNNs with ReLU activation.
Our theory throws light on two questions, namely why increasing depth up to a point helps training and why increasing depth beyond that point hurts training.
arXiv Detail & Related papers (2020-02-10T18:12:20Z)