Least Redundant Gated Recurrent Neural Network
- URL: http://arxiv.org/abs/2105.14092v6
- Date: Mon, 17 Apr 2023 13:29:37 GMT
- Title: Least Redundant Gated Recurrent Neural Network
- Authors: Łukasz Neumann, Łukasz Lepak, Paweł Wawrzyński
- Abstract summary: We introduce a recurrent neural architecture called Deep Memory Update (DMU).
It is based on updating the previous memory state with a deep transformation of the lagged state and the network input.
Its training is stable and fast due to relating its learning rate to the size of the module.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks are important tools for sequential data processing.
However, they are notorious for problems regarding their training. Challenges
include capturing complex relations between consecutive states and stability
and efficiency of training. In this paper, we introduce a recurrent neural
architecture called Deep Memory Update (DMU). It is based on updating the
previous memory state with a deep transformation of the lagged state and the
network input. The architecture is able to learn to transform its internal
state using any nonlinear function. Its training is stable and fast due to
relating its learning rate to the size of the module. Even though DMU is based
on standard components, experimental results presented here confirm that it can
compete with and often outperform state-of-the-art architectures such as Long
Short-Term Memory, Gated Recurrent Units, and Recurrent Highway Networks.
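The abstract describes the core mechanism only at a high level: the new memory state is a gated combination of the previous state and a deep (multi-layer) transformation of that state together with the current input. Below is a minimal PyTorch sketch of that kind of cell, assuming a GRU-like convex-combination gate; module names, the gating form, and all hyperparameters are illustrative rather than the paper's exact equations, and the paper's learning-rate scaling with module size is not shown.

```python
import torch
import torch.nn as nn

class DeepMemoryUpdateCell(nn.Module):
    """Illustrative gated cell: the previous memory state is updated with a
    deep transformation of the lagged state and the current input (a hedged
    sketch of the idea in the abstract, not the authors' exact equations)."""

    def __init__(self, input_size: int, state_size: int, hidden_size: int, depth: int = 2):
        super().__init__()
        layers, in_dim = [], input_size + state_size
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden_size), nn.Tanh()]
            in_dim = hidden_size
        self.deep = nn.Sequential(*layers)                    # deep transform of [input, state]
        self.candidate = nn.Linear(hidden_size, state_size)   # proposed new memory content
        self.gate = nn.Linear(hidden_size, state_size)        # how much of the state to overwrite

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        z = self.deep(torch.cat([x, h_prev], dim=-1))
        h_hat = torch.tanh(self.candidate(z))                 # candidate memory
        u = torch.sigmoid(self.gate(z))                       # update gate
        return u * h_hat + (1.0 - u) * h_prev                 # gated update of the previous state

# Usage: roll the cell over a (time, batch, features) sequence.
cell = DeepMemoryUpdateCell(input_size=8, state_size=16, hidden_size=32)
x_seq = torch.randn(10, 4, 8)
h = torch.zeros(4, 16)
for x_t in x_seq:
    h = cell(x_t, h)
```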
Related papers
- Memory-Efficient Reversible Spiking Neural Networks [8.05761813203348]
Spiking neural networks (SNNs) are potential competitors to artificial neural networks (ANNs).
SNNs require much more memory than ANNs, which impedes the training of deeper SNN models.
We propose the reversible spiking neural network to reduce the memory cost of intermediate activations and membrane potentials during training.
arXiv Detail & Related papers (2023-12-13T06:39:49Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Deep Transformer Q-Networks for Partially Observable Reinforcement Learning [14.126617899983097]
Deep Transformer Q-Networks (DTQN) is a novel architecture utilizing transformers and self-attention to encode an agent's history.
Our experiments demonstrate the transformer can solve partially observable tasks faster and more stably than previous recurrent approaches.
arXiv Detail & Related papers (2022-06-02T15:04:18Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Online Training of Spiking Recurrent Neural Networks with Phase-Change Memory Synapses [1.9809266426888898]
Training spiking recurrent neural networks (RNNs) on dedicated neuromorphic hardware is still an open challenge.
We present a simulation framework of differential-architecture arrays based on an accurate and comprehensive Phase-Change Memory (PCM) device model.
We train a spiking RNN whose weights are emulated in the presented simulation framework, using a recently proposed e-prop learning rule.
arXiv Detail & Related papers (2021-08-04T01:24:17Z)
- Enabling Incremental Training with Forward Pass for Edge Devices [0.0]
We introduce a method using evolutionary strategy (ES) that can partially retrain the network, enabling it to adapt to changes and recover after an error has occurred.
This technique enables training on an inference-only hardware without the need to use backpropagation and with minimal resource overhead.
arXiv Detail & Related papers (2021-03-25T17:43:04Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated in simulations of predicting Boston house prices and of training a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack (a minimal sketch of such a soft stack follows this list).
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
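To make the last item concrete, here is a minimal sketch of a differentiable ("soft") stack of the kind such memory-augmented RNNs use, where push, pop, and no-op are blended by continuous weights so gradients can flow through the memory; the function name and shapes are illustrative and this is a generic formulation, not the specific mechanism of the cited paper.

```python
import torch

def soft_stack_step(stack: torch.Tensor,
                    push_val: torch.Tensor,
                    action: torch.Tensor) -> torch.Tensor:
    """One soft update of a differentiable stack (continuous push/pop/no-op).

    stack:    (depth, dim) current stack contents, index 0 is the top
    push_val: (dim,)       vector the controller wants to push
    action:   (3,)         softmax weights for [push, pop, no-op]
    """
    p_push, p_pop, p_noop = action
    pushed = torch.cat([push_val.unsqueeze(0), stack[:-1]], dim=0)       # shift down, new top
    popped = torch.cat([stack[1:], torch.zeros_like(stack[:1])], dim=0)  # shift up, drop top
    return p_push * pushed + p_pop * popped + p_noop * stack

# Usage: an RNN controller would emit `push_val` and `action` at every time step.
stack = torch.zeros(5, 8)
stack = soft_stack_step(stack, torch.randn(8), torch.softmax(torch.randn(3), dim=0))
```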
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.