Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
- URL: http://arxiv.org/abs/2503.10799v1
- Date: Thu, 13 Mar 2025 18:50:22 GMT
- Title: Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
- Authors: Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, Antonio Orvieto
- Abstract summary: We compute a dense linear RNN as the fixed-point of a parallelizable diagonal linear RNN in a single layer. We achieve state-of-the-art results on the commonly used toy tasks $A_5$, $S_5$, copying, and modular arithmetic.
- Score: 10.851383867834052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as Mamba have become promising alternatives to softmax attention as sequence mixing layers in Transformer architectures. Current models, however, do not exhibit the full state-tracking expressivity of RNNs because they rely on channel-wise (i.e. diagonal) sequence mixing. In this paper, we propose to compute a dense linear RNN as the fixed-point of a parallelizable diagonal linear RNN in a single layer. We explore mechanisms to improve its memory and state-tracking abilities in practice, and achieve state-of-the-art results on the commonly used toy tasks $A_5$, $S_5$, copying, and modular arithmetic. We hope our results will open new avenues to more expressive and efficient sequence mixers.
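The core construction can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract, not the paper's implementation: the dense transition $A$ is split into a diagonal part and an off-diagonal residual $R$, and the dense recurrence $h_t = A h_{t-1} + x_t$ is recovered by repeatedly running a diagonal linear RNN whose input is augmented with the residual term from the previous iterate. All names and dimensions below are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 6

# Dense transition A, split into its diagonal and off-diagonal residual R.
A = 0.3 * rng.standard_normal((d, d))
R = A - np.diag(np.diag(A))
X = rng.standard_normal((T, d))            # inputs (input matrix B = I here)

def diagonal_rnn(lam, U):
    """Sequential reference for a diagonal linear RNN h_t = lam*h_{t-1} + u_t.
    In practice this step is the parallelizable part (e.g. a prefix scan)."""
    h, out = np.zeros(len(lam)), []
    for u in U:
        h = lam * h + u
        out.append(h)
    return np.stack(out)

# Ground truth: the dense recurrence h_t = A h_{t-1} + x_t, run sequentially.
h, dense = np.zeros(d), []
for x in X:
    h = A @ h + x
    dense.append(h)
dense = np.stack(dense)

# Fixed-point iteration: each pass is a diagonal RNN whose input carries the
# off-diagonal coupling R h_{t-1} evaluated at the previous iterate.
H = np.zeros((T, d))
for _ in range(T):                          # exact after T passes
    H_prev = np.vstack([np.zeros(d), H[:-1]])
    H = diagonal_rnn(np.diag(A), X + H_prev @ R.T)

print(np.allclose(H, dense, atol=1e-8))    # → True
```

Because the error operator is strictly causal (iterate $k$ is exact up to timestep $k$), the iteration reaches the dense recurrence in at most $T$ passes; the paper's point is that far fewer iterations suffice in practice.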
Related papers
- HadamRNN: Binary and Sparse Ternary Orthogonal RNNs [6.524758376347808]
Binary and sparse ternary weights in neural networks enable faster computations and lighter representations. Vanilla RNNs, however, are highly sensitive to changes in their recurrent weights, making the binarization and ternarization of these weights inherently challenging. We present a new approach that leverages the properties of Hadamard matrices to parameterize a subset of binary and sparse ternary matrices.
arXiv Detail & Related papers (2025-01-28T09:16:28Z)
- Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
We reduce the time and space complexities from cubic and quadratic in the sequence length, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons [2.9410174624086025]
We present a $\Sigma\Delta$-low-pass RNN (lpRNN) for mapping rate-based RNNs to spiking neural networks (SNNs).
An adaptive spiking neuron model encodes signals using $\Sigma\Delta$-modulation and enables precise mapping.
We demonstrate the implementation of the lpRNN on Intel's neuromorphic research chip Loihi.
arXiv Detail & Related papers (2024-07-18T14:06:07Z)
- Attention as an RNN [66.5420926480473]
We show that attention can be viewed as a special Recurrent Neural Network (RNN) with the ability to compute its many-to-one RNN output efficiently.
We introduce a new efficient method of computing attention's many-to-many RNN output based on the parallel prefix scan algorithm.
We show Aarens achieve comparable performance to Transformers on $38$ datasets spread across four popular sequential problem settings.
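The parallel prefix scan mentioned above is the same primitive that makes diagonal linear recurrences parallelizable. As a minimal illustrative sketch (a single scalar channel, our own names, not the paper's code), the recurrence $h_t = a_t h_{t-1} + b_t$ can be computed for all $t$ with an associative operator and a Hillis-Steele-style scan:

```python
import numpy as np

# Composing h -> al*h + bl, then h -> ar*h + br gives
# h -> (al*ar)*h + (ar*bl + br); this operator is associative.
def combine(al, bl, ar, br):
    return al * ar, ar * bl + br

def scan_recurrence(a, b):
    """All prefixes of h_t = a_t*h_{t-1} + b_t (with h_{-1} = 0) via an
    inclusive scan: O(log T) doubling steps of elementwise work."""
    a, b = a.copy(), b.copy()
    shift = 1
    while shift < len(a):
        a2, b2 = a.copy(), b.copy()
        # Fold element t-shift (earlier) into element t (parallel-for in a real impl).
        a2[shift:], b2[shift:] = combine(a[:-shift], b[:-shift], a[shift:], b[shift:])
        a, b = a2, b2
        shift *= 2
    return b

rng = np.random.default_rng(1)
T = 8
a = rng.uniform(0.5, 0.9, T)   # per-step decay (one diagonal channel)
b = rng.standard_normal(T)     # per-step input contribution

# Sequential reference.
h, ref = 0.0, []
for t in range(T):
    h = a[t] * h + b[t]
    ref.append(h)

print(np.allclose(scan_recurrence(a, b), ref))  # → True
```

The doubling loop runs $\lceil\log_2 T\rceil$ times, and each step is an elementwise operation over the sequence, which is what makes the recurrence parallel on hardware.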
arXiv Detail & Related papers (2024-05-22T19:45:01Z)
- Adaptive-saturated RNN: Remember more with less instability [2.191505742658975]
This work proposes Adaptive-Saturated RNNs (asRNN), a variant that dynamically adjusts its saturation level between the two approaches.
Our experiments show encouraging results of asRNN on challenging sequence learning benchmarks compared to several strong competitors.
arXiv Detail & Related papers (2023-04-24T02:28:03Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
Training SNNs efficiently is challenging, however, due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which achieves high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Recurrent Neural Network from Adder's Perspective: Carry-lookahead RNN [9.20540910698296]
We discuss the similarities between the recurrent neural network (RNN) and the serial adder.
Inspired by the carry-lookahead adder, we introduce a carry-lookahead module into the RNN, which makes it possible for the RNN to run in parallel.
arXiv Detail & Related papers (2021-06-22T12:28:33Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- DiffRNN: Differential Verification of Recurrent Neural Networks [3.4423518864863154]
Recurrent neural networks (RNNs) have become popular in a variety of applications such as image processing, data classification, speech recognition, and as controllers in autonomous systems.
We propose DIFFRNN, the first differential verification method for RNNs to certify the equivalence of two structurally similar neural networks.
We demonstrate the practical efficacy of our technique on a variety of benchmarks and show that DIFFRNN outperforms state-of-the-art verification tools such as POPQORN.
arXiv Detail & Related papers (2020-07-20T14:14:35Z)
- Matrix Smoothing: A Regularization for DNN with Transition Matrix under Noisy Labels [54.585681272543056]
Training deep neural networks (DNNs) in the presence of noisy labels is an important and challenging task.
Recent probabilistic methods directly apply the transition matrix to the DNN but neglect the DNN's susceptibility to overfitting.
We propose a novel method in which a smoothed transition matrix is used for updating the DNN, restricting overfitting.
arXiv Detail & Related papers (2020-03-26T13:49:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.