Related papers: Fast Training of Recurrent Neural Networks with Stationary State Feedbacks

Fast Training of Recurrent Neural Networks with Stationary State Feedbacks

URL: http://arxiv.org/abs/2503.23104v1
Date: Sat, 29 Mar 2025 14:45:52 GMT
Title: Fast Training of Recurrent Neural Networks with Stationary State Feedbacks
Authors: Paul Caillon, Erwan Fagnou, Alexandre Allauzen,
Abstract summary: Recurrent neural networks (RNNs) have recently demonstrated strong performance and faster inference than Transformers.<n>We propose a novel method that replaces BPTT with a fixed gradient feedback mechanism.
Score: 48.22082789438538
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recurrent neural networks (RNNs) have recently demonstrated strong performance and faster inference than Transformers at comparable parameter budgets. However, the recursive gradient computation with the backpropagation through time (or BPTT) algorithm remains the major computational bottleneck. In this work, we propose a novel method that replaces BPTT with a fixed gradient feedback mechanism, yielding an efficient approximation of the exact gradient propagation based on the assumption of time stationarity. Our approach leverages state-space model (SSM) principles to define a structured feedback matrix that directly propagates gradients from future time steps. This formulation bypasses the need for recursive gradient backpropagation, significantly reducing training overhead while preserving the network's ability to capture long-term dependencies. The experiments on language modeling benchmarks exhibit competitive perplexity scores, while significantly reducing the training costs. These promising results suggest that designing a feedback method like an SSM can fully exploit the efficiency advantages of RNNs for many practical applications.

Related papers

When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training [58.25341036646294]
We analytically examine why learning recurrent poles does not provide tangible benefits in data and empirically offer real-time learning scenarios.<n>We show that fixed-pole networks achieve superior performance with lower training complexity, making them more suitable for online real-time tasks.
arXiv Detail & Related papers (2026-02-25T00:15:13Z)
ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling [57.91760520589592]
Scaling network depth has been a central driver behind the success of modern foundation models.<n>This paper revisits the default mechanism for deepening neural networks, namely residual connections.<n>We introduce adaptive neural connection reassignment (ANCRe), a principled and lightweight framework that parameterizes and learns residual connectivities from the data.
arXiv Detail & Related papers (2026-02-09T18:54:18Z)
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training [67.45211108321203]
We introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer.<n>We show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs.
arXiv Detail & Related papers (2025-06-05T16:50:23Z)
Accelerated Training through Iterative Gradient Propagation Along the Residual Path [46.577761606415805]
Highway backpropagation is a parallelizable iterative algorithm that approximates backpropagation. It is adaptable to a diverse set of common architectures, ranging from ResNets and Transformers to recurrent neural networks.
arXiv Detail & Related papers (2025-01-28T17:14:42Z)
Advancing Training Efficiency of Deep Spiking Neural Networks through Rate-based Backpropagation [8.683798989767771]
Recent insights have revealed that rate-coding is a primary form of information representation captured by surrogate-gradient-based Backpropagation Through Time (BPTT) in training deep Spiking Neural Networks (SNNs) We propose rate-based backpropagation, a training strategy specifically designed to exploit rate-based representations to reduce the complexity of BPTT. Our method minimizes reliance on detailed temporal derivatives by focusing on averaged dynamics, streamlining the computational graph to reduce memory and computational demands of SNNs training.
arXiv Detail & Related papers (2024-10-15T10:46:03Z)
Efficient Training of Deep Neural Operator Networks via Randomized Sampling [0.0]
Deep operator network (DeepNet) has demonstrated success in the real-time prediction of complex dynamics across various scientific and engineering applications. We introduce a random sampling technique to be adopted the training of DeepONet, aimed at improving generalization ability of the model, while significantly reducing computational time. Our results indicate that incorporating randomization in the trunk network inputs during training enhances the efficiency and robustness of DeepONet, offering a promising avenue for improving the framework's performance in modeling complex physical systems.
arXiv Detail & Related papers (2024-09-20T07:18:31Z)
Gradient-Free Training of Recurrent Neural Networks using Random Perturbations [1.1742364055094265]
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities. Backpropagation through time (BPTT), the prevailing method, extends the backpropagation algorithm by unrolling the RNN over time. BPTT suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information. We present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT.
arXiv Detail & Related papers (2024-05-14T21:15:29Z)
Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training [30.452060061499523]
We introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation. Experiments demonstrate the effectiveness of the approximation technique in neural network training.
arXiv Detail & Related papers (2024-03-18T23:23:50Z)
Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback feedback (LFP) is a novel training principle for neural network-like predictors.<n>LFP decomposes a reward to individual neurons based on their respective contributions.<n>Our method then implements a greedy reinforcing approach helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures. This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead. We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing. We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency. Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models. Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency. We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization [79.42778415729475]
We explore an alternative solution based on explicit memorization using linear autoencoders for sequences. We show how such pretraining can better support solving hard classification tasks with long sequences. We show that the proposed approach achieves a much lower reconstruction error for long sequences and a better gradient propagation during the finetuning phase.
arXiv Detail & Related papers (2020-11-05T14:57:16Z)
Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance. This paper introduces the framework for updating the gradient estimates in deep Q-learning, achieving a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.