Revisiting Bi-Linear State Transitions in Recurrent Neural Networks
- URL: http://arxiv.org/abs/2505.21749v1
- Date: Tue, 27 May 2025 20:38:19 GMT
- Title: Revisiting Bi-Linear State Transitions in Recurrent Neural Networks
- Authors: M. Reza Ebrahimi, Roland Memisevic
- Abstract summary: We show that bi-linear state updates constitute a natural inductive bias for representing the evolution of hidden states in state tracking tasks. We also show that bi-linear state updates form a natural hierarchy corresponding to state tracking tasks of increasing complexity, with popular linear recurrent networks such as Mamba residing at the lowest-complexity center of that hierarchy.
- Score: 0.3218642352128729
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The role of hidden units in recurrent neural networks is typically seen as modeling memory, with research focusing on enhancing information retention through gating mechanisms. A less explored perspective views hidden units as active participants in the computation performed by the network, rather than passive memory stores. In this work, we revisit bi-linear operations, which involve multiplicative interactions between hidden units and input embeddings. We demonstrate theoretically and empirically that they constitute a natural inductive bias for representing the evolution of hidden states in state tracking tasks. These are the simplest type of task that require hidden units to actively contribute to the behavior of the network. We also show that bi-linear state updates form a natural hierarchy corresponding to state tracking tasks of increasing complexity, with popular linear recurrent networks such as Mamba residing at the lowest-complexity center of that hierarchy.
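As a rough illustration of the "multiplicative interactions between hidden units and input embeddings" the abstract describes, the sketch below implements a bi-linear state update in NumPy and uses it to track a running sum modulo `M`. It is a minimal sketch under stated assumptions, not the authors' parameterization: the weight tensor `W` is constructed by hand rather than learned, inputs and states are one-hot vectors, there is no bias or nonlinearity, and the names `bilinear_step`, `make_mod_add_tensor`, and `track` are hypothetical.

```python
import numpy as np


def bilinear_step(W, h, x):
    """One bi-linear update: h_new[k] = sum_{i,j} W[k, i, j] * h[i] * x[j]."""
    return np.einsum("kij,i,j->k", W, h, x)


def make_mod_add_tensor(m):
    """Hand-built (not learned) weights that implement addition modulo m.

    For one-hot state s and one-hot input a, the update produces the
    one-hot vector for (s + a) % m.
    """
    W = np.zeros((m, m, m))
    for s in range(m):
        for a in range(m):
            W[(s + a) % m, s, a] = 1.0
    return W


def track(W, inputs, m):
    """Run the bi-linear recurrence over integer inputs, starting in state 0."""
    h = np.eye(m)[0]  # one-hot encoding of the initial state
    for a in inputs:
        h = bilinear_step(W, h, np.eye(m)[a])  # one-hot input embedding
    return int(np.argmax(h))


M = 5
W = make_mod_add_tensor(M)
seq = [3, 4, 1, 2, 4]
final_state = track(W, seq, M)
print(final_state, sum(seq) % M)
```

For a fixed input `x`, contracting `W` with `x` yields a state-transition matrix that depends on the input, so the hidden state is transformed by the input rather than only added to. That is one way to read the abstract's description of hidden units as active participants in the computation.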
Related papers
- Egocentric Visual Navigation through Hippocampal Sequences [0.0]
We show that hippocampal sequences arise from intrinsic recurrent circuitry that propagates activity without readily available input. We implement a minimal sequence generator inspired by neurobiology and pair it with an actor-critic learner for egocentric visual navigation.
arXiv Detail & Related papers (2025-10-11T01:38:23Z) - Concept-Guided Interpretability via Neural Chunking [64.6429903327095]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract recurring chunks on a neural population level. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z) - Allostatic Control of Persistent States in Spiking Neural Networks for perception and computation [79.16635054977068]
We introduce a novel model for updating perceptual beliefs about the environment by extending the concept of Allostasis to the control of internal representations. In this paper, we focus on an application in numerical cognition, where a bump of activity in an attractor network is used as a spatial numerical representation.
arXiv Detail & Related papers (2025-03-20T12:28:08Z) - Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z) - Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations [24.052411316664017]
We introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representation for two different inputs.
For nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to similar representations depending on the activation and network architecture.
This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
arXiv Detail & Related papers (2023-10-07T21:57:23Z) - DISCOVER: Making Vision Networks Interpretable via Competition and Dissection [11.028520416752325]
This work contributes to post-hoc interpretability, and specifically Network Dissection.
Our goal is to present a framework that makes it easier to discover the individual functionality of each neuron in a network trained on a vision task.
arXiv Detail & Related papers (2023-10-05T21:44:18Z) - Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control [63.310780486820796]
We show how a parameterization of recurrent connectivity influences robustness in closed-loop settings.
We find that closed-form continuous-time neural networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts.
arXiv Detail & Related papers (2023-07-18T15:17:00Z) - Approximating nonlinear functions with latent boundaries in low-rank excitatory-inhibitory spiking networks [5.955727366271805]
We put forth a new framework for spike-based excitatory-inhibitory spiking networks.
Our work proposes a new perspective on spiking networks that may serve as a starting point for a mechanistic understanding of biological spike-based computation.
arXiv Detail & Related papers (2023-07-18T15:17:00Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - Measures of Information Reflect Memorization Patterns [53.71420125627608]
We show that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.
Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples.
arXiv Detail & Related papers (2022-06-30T06:03:11Z) - Brain-like combination of feedforward and recurrent network components achieves prototype extraction and robust pattern recognition [0.0]
Associative memory has been a prominent candidate for the computation performed by the massively recurrent neocortical networks.
We combine a recurrent attractor network with a feedforward network that learns distributed representations using an unsupervised Hebbian-Bayesian learning rule.
We demonstrate that the recurrent attractor component implements associative memory when trained on the feedforward-driven internal (hidden) representations.
arXiv Detail & Related papers (2022-06-30T06:03:11Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2020-01-09T23:19:21Z) - Internal representation dynamics and geometry in recurrent neural networks [10.016265742591674]
We show how a vanilla RNN implements a simple classification task by analysing the dynamics of the network.
We find that early internal representations are evocative of the real labels of the data but this information is not directly accessible to the output layer.
arXiv Detail & Related papers (2020-01-09T23:19:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.