Least Redundant Gated Recurrent Neural Network
- URL: http://arxiv.org/abs/2105.14092v6
- Date: Mon, 17 Apr 2023 13:29:37 GMT
- Title: Least Redundant Gated Recurrent Neural Network
- Authors: Łukasz Neumann, Łukasz Lepak, Paweł Wawrzyński
- Abstract summary: We introduce a recurrent neural architecture called Deep Memory Update (DMU).
It is based on updating the previous memory state with a deep transformation of the lagged state and the network input.
Its training is stable and fast due to relating its learning rate to the size of the module.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks are important tools for sequential data processing.
However, they are notorious for problems regarding their training. Challenges
include capturing complex relations between consecutive states and stability
and efficiency of training. In this paper, we introduce a recurrent neural
architecture called Deep Memory Update (DMU). It is based on updating the
previous memory state with a deep transformation of the lagged state and the
network input. The architecture is able to learn to transform its internal
state using any nonlinear function. Its training is stable and fast due to
relating its learning rate to the size of the module. Even though DMU is based
on standard components, experimental results presented here confirm that it can
compete with and often outperform state-of-the-art architectures such as Long
Short-Term Memory, Gated Recurrent Units, and Recurrent Highway Networks.
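The abstract describes the core mechanism only at a high level: the new memory state is a gated combination of the previous state and a deep (multi-layer) transformation of that state together with the current input. Below is a minimal PyTorch sketch of that kind of cell, assuming a GRU-like convex-combination gate; module names, the gating form, and all hyperparameters are illustrative rather than the paper's exact equations, and the paper's learning-rate scaling with module size is not shown.

```python
import torch
import torch.nn as nn

class DeepMemoryUpdateCell(nn.Module):
    """Illustrative gated cell: the previous memory state is updated with a
    deep transformation of the lagged state and the current input (a hedged
    sketch of the idea in the abstract, not the authors' exact equations)."""

    def __init__(self, input_size: int, state_size: int, hidden_size: int, depth: int = 2):
        super().__init__()
        layers, in_dim = [], input_size + state_size
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden_size), nn.Tanh()]
            in_dim = hidden_size
        self.deep = nn.Sequential(*layers)                    # deep transform of [input, state]
        self.candidate = nn.Linear(hidden_size, state_size)   # proposed new memory content
        self.gate = nn.Linear(hidden_size, state_size)        # how much of the state to overwrite

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        z = self.deep(torch.cat([x, h_prev], dim=-1))
        h_hat = torch.tanh(self.candidate(z))                 # candidate memory
        u = torch.sigmoid(self.gate(z))                       # update gate
        return u * h_hat + (1.0 - u) * h_prev                 # gated update of the previous state

# Usage: roll the cell over a (time, batch, features) sequence.
cell = DeepMemoryUpdateCell(input_size=8, state_size=16, hidden_size=32)
x_seq = torch.randn(10, 4, 8)
h = torch.zeros(4, 16)
for x_t in x_seq:
    h = cell(x_t, h)
```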
Related papers
- Memory-Efficient Reversible Spiking Neural Networks [8.05761813203348]
Spiking neural networks (SNNs) are potential competitors to artificial neural networks (ANNs).
SNNs require much more memory than ANNs, which impedes the training of deeper SNN models.
We propose the reversible spiking neural network to reduce the memory cost of intermediate activations and membrane potentials during training.
arXiv Detail & Related papers (2023-12-13T06:39:49Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Deep Transformer Q-Networks for Partially Observable Reinforcement Learning [14.126617899983097]
Deep Transformer Q-Networks (DTQN) is a novel architecture utilizing transformers and self-attention to encode an agent's history.
Our experiments demonstrate the transformer can solve partially observable tasks faster and more stably than previous recurrent approaches.
arXiv Detail & Related papers (2022-06-02T15:04:18Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Online Training of Spiking Recurrent Neural Networks with Phase-Change Memory Synapses [1.9809266426888898]
Training spiking recurrent neural networks (RNNs) on dedicated neuromorphic hardware is still an open challenge.
We present a simulation framework of differential-architecture arrays based on an accurate and comprehensive Phase-Change Memory (PCM) device model.
We train a spiking RNN whose weights are emulated in the presented simulation framework, using a recently proposed e-prop learning rule.
arXiv Detail & Related papers (2021-08-04T01:24:17Z)
- Enabling Incremental Training with Forward Pass for Edge Devices [0.0]
We introduce a method using evolutionary strategy (ES) that can partially retrain the network, enabling it to adapt to changes and recover after an error has occurred.
This technique enables training on an inference-only hardware without the need to use backpropagation and with minimal resource overhead.
arXiv Detail & Related papers (2021-03-25T17:43:04Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated in simulations of predicting Boston house prices and of training a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack (a minimal sketch of such a soft stack follows this list).
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
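To make the last item concrete, here is a minimal sketch of a differentiable ("soft") stack of the kind such memory-augmented RNNs use, where push, pop, and no-op are blended by continuous weights so gradients can flow through the memory; the function name and shapes are illustrative and this is a generic formulation, not the specific mechanism of the cited paper.

```python
import torch

def soft_stack_step(stack: torch.Tensor,
                    push_val: torch.Tensor,
                    action: torch.Tensor) -> torch.Tensor:
    """One soft update of a differentiable stack (continuous push/pop/no-op).

    stack:    (depth, dim) current stack contents, index 0 is the top
    push_val: (dim,)       vector the controller wants to push
    action:   (3,)         softmax weights for [push, pop, no-op]
    """
    p_push, p_pop, p_noop = action
    pushed = torch.cat([push_val.unsqueeze(0), stack[:-1]], dim=0)       # shift down, new top
    popped = torch.cat([stack[1:], torch.zeros_like(stack[:1])], dim=0)  # shift up, drop top
    return p_push * pushed + p_pop * popped + p_noop * stack

# Usage: an RNN controller would emit `push_val` and `action` at every time step.
stack = torch.zeros(5, 8)
stack = soft_stack_step(stack, torch.randn(8), torch.softmax(torch.randn(3), dim=0))
```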
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.