Dynamic Analysis and an Eigen Initializer for Recurrent Neural Networks
- URL: http://arxiv.org/abs/2307.15679v1
- Date: Fri, 28 Jul 2023 17:14:58 GMT
- Title: Dynamic Analysis and an Eigen Initializer for Recurrent Neural Networks
- Authors: Ran Dou and Jose Principe
- Abstract summary: We study the dynamics of the hidden state in recurrent neural networks.
We propose a new perspective to analyze the hidden state space based on an eigen decomposition of the weight matrix.
We provide an explanation for long-term dependency based on the eigen analysis.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recurrent neural networks, learning long-term dependencies is the main difficulty due to the vanishing and exploding gradient problem. Many researchers have tackled this issue and proposed numerous algorithms. Although these algorithms have achieved great success, understanding how information decays remains an open problem. In this paper, we study the dynamics of the hidden state in recurrent neural networks. We propose a new perspective for analyzing the hidden state space based on an eigen decomposition of the weight matrix. We start the analysis with a linear state-space model and explain the information-preserving role of activation functions. We provide an explanation for long-term dependency based on the eigen analysis. We also point out the different behavior of the eigenvalues in regression tasks and classification tasks. From observations on well-trained recurrent neural networks, we propose a new initialization method for recurrent neural networks that consistently improves performance. It can be applied to vanilla RNNs, LSTMs, and GRUs. We test it on many datasets, such as the Tomita grammars, pixel-by-pixel MNIST, and machine translation (Multi30k). It outperforms the Xavier and Kaiming initializers, as well as other RNN-only initializers such as IRNN and sp-RNN, on several tasks.
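The abstract does not spell out the initializer's construction, but the linear state-space argument it invokes can be made concrete: writing the recurrent weight as W = QΛQ⁻¹, the recursion h_t = W h_{t-1} gives h_t = QΛ^t Q⁻¹ h_0, so modes with |λ_i| well below 1 lose information exponentially fast while modes near the unit circle preserve it. The sketch below is a minimal, hypothetical eigenvalue-controlled initializer in that spirit (NumPy); the function name, spectrum range, and orthogonal-eigenbasis construction are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def eigen_init(hidden_size, eig_low=0.9, eig_high=1.0, seed=0):
    """Hypothetical eigenvalue-controlled recurrent initializer (illustration only).

    Builds W = Q diag(lam) Q^T with a random orthogonal Q and eigenvalues drawn
    from [eig_low, eig_high], so every mode of the linear recursion
    h_t = W h_{t-1} decays slowly. The spectrum range and construction are
    assumptions; the paper's exact initializer is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    # Random orthogonal eigenbasis from the QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((hidden_size, hidden_size)))
    lam = rng.uniform(eig_low, eig_high, size=hidden_size)
    # Symmetric W whose eigenvalues are exactly `lam`.
    return q @ np.diag(lam) @ q.T

if __name__ == "__main__":
    W = eigen_init(64)
    h0 = np.ones(64)
    # In the linear state-space view, information injected at t = 0 survives as
    # W^t h0; its norm is bounded by max|eigenvalue|^t * ||h0||.
    for t in (1, 10, 100):
        print(t, np.linalg.norm(np.linalg.matrix_power(W, t) @ h0))
```

With eigenvalues placed just below 1 the printed norms shrink slowly even after 100 steps, which is the qualitative behavior an initializer aimed at long-term dependency would target; lowering eig_low makes the decay visibly faster.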
Related papers
- RelChaNet: Neural Network Feature Selection using Relative Change Scores [0.0]
We introduce RelChaNet, a novel and lightweight feature selection algorithm that uses neuron pruning and regrowth in the input layer of a dense neural network.
Our approach generally outperforms the current state-of-the-art methods, and in particular improves the average accuracy by 2% on the MNIST dataset.
arXiv Detail & Related papers (2024-10-03T09:56:39Z) - Investigating Sparsity in Recurrent Neural Networks [0.0]
This thesis focuses on investigating the effects of pruning and Sparse Recurrent Neural Networks on the performance of RNNs.
We first describe the pruning of RNNs, its impact on the performance of RNNs, and the number of training epochs required to regain accuracy after the pruning is performed.
Next, we continue with the creation and training of Sparse Recurrent Neural Networks and identify the relation between their performance and the graph properties of their underlying arbitrary structures.
arXiv Detail & Related papers (2024-07-30T07:24:58Z) - NeuroView-RNN: It's About Time [25.668977252138905]
A key interpretability issue with RNNs is that it is not clear how each hidden state per time step contributes to the decision-making process.
We propose NeuroView-RNN as a family of new RNN architectures that explains how all the time steps are used for the decision-making process.
We showcase the benefits of NeuroView-RNN by evaluating on a multitude of diverse time-series datasets.
arXiv Detail & Related papers (2022-02-23T22:29:11Z) - Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z) - Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthening continual learning in graph neural networks (GNNs).
At the heart of our approach is a generic module, termed topology-aware weight preserving (TWP).
arXiv Detail & Related papers (2020-12-10T22:30:25Z) - How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z) - Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey [77.99182201815763]
Deep Neural Networks (DNNs) achieve state-of-the-art results in many different problem settings.
DNNs are often treated as black box systems, which complicates their evaluation and validation.
One promising field, inspired by the success of convolutional neural networks (CNNs) in computer vision tasks, is to incorporate knowledge about symmetric geometrical transformations.
arXiv Detail & Related papers (2020-06-19T14:54:13Z) - Emotion Recognition on large video dataset based on Convolutional Feature Extractor and Recurrent Neural Network [0.2855485723554975]
Our model combines convolutional neural network (CNN) with recurrent neural network (RNN) to predict dimensional emotions on video data.
Experiments are performed on publicly available datasets including the largest modern Aff-Wild2 database.
arXiv Detail & Related papers (2020-06-19T14:54:13Z) - Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.