Fractional neural attention for efficient multiscale sequence processing
- URL: http://arxiv.org/abs/2511.10208v1
- Date: Fri, 14 Nov 2025 01:39:05 GMT
- Title: Fractional neural attention for efficient multiscale sequence processing
- Authors: Cheng Kevin Qu, Andrew Ly, Pulin Gong
- Abstract summary: We introduce Fractional Neural Attention (FNA), a principled framework for multiscale information processing. FNA models token interactions through Lévy diffusion governed by the fractional Laplacian. FNA achieves competitive text-classification performance even with a single layer and a single head.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention mechanisms underpin the computational power of Transformer models, which have achieved remarkable success across diverse domains. Yet understanding and extending the principles underlying self-attention remains a key challenge for advancing artificial intelligence. Drawing inspiration from the multiscale dynamics of biological attention and from dynamical systems theory, we introduce Fractional Neural Attention (FNA), a principled, neuroscience-inspired framework for multiscale information processing. FNA models token interactions through Lévy diffusion governed by the fractional Laplacian, intrinsically realizing simultaneous short- and long-range dependencies across multiple scales. This mechanism yields greater expressivity and faster information mixing, advancing the foundational capacity of Transformers. Theoretically, we show that FNA's dynamics are governed by the fractional diffusion equation, and that the resulting attention networks exhibit larger spectral gaps and shorter path lengths -- mechanistic signatures of enhanced computational efficiency. Empirically, FNA achieves competitive text-classification performance even with a single layer and a single head; it also improves performance in image processing and neural machine translation. Finally, the diffusion map algorithm from geometric harmonics enables dimensionality reduction of FNA weights while preserving the intrinsic structure of embeddings and hidden states. Together, these results establish FNA as a principled mechanism connecting self-attention, stochastic dynamics, and geometry, providing an interpretable, biologically grounded foundation for powerful, neuroscience-inspired AI.
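The abstract's mechanism, attention weights drawn from a heavy-tailed Lévy kernel and analyzed through diffusion maps, can be illustrated concretely. The sketch below is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the kernel form (1 + ||x_i - x_j||^2)^{-(d + 2*alpha)/2}, the function names, and the choice Q = K are all assumptions, chosen to mimic the tail of a 2*alpha-stable (Lévy) transition density and to keep the attention matrix row-stochastic.

```python
import numpy as np

def fractional_attention(X, V, alpha=0.5, eps=1e-12):
    """Hypothetical single-head 'fractional' attention (a sketch, not the
    paper's exact formulation): the softmax kernel is replaced by a
    heavy-tailed, Levy-like power-law kernel, so a single layer couples
    tokens at both short and long range. Smaller alpha = heavier tails."""
    n, d = X.shape
    # pairwise squared distances between token embeddings (Q = K = X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # power-law kernel ~ (1 + ||x_i - x_j||^2)^{-(d + 2*alpha)/2}
    S = (1.0 + d2) ** (-(d + 2.0 * alpha) / 2.0)
    W = S / (S.sum(axis=1, keepdims=True) + eps)  # row-stochastic Markov matrix
    return W @ V, W

def diffusion_map(W, n_components=2, t=1):
    """Diffusion-map embedding of the attention matrix W. With Q = K the
    kernel is symmetric, so W is similar to a symmetric matrix and its
    spectrum is real; the gap 1 - lambda_2 indicates mixing speed."""
    vals, vecs = np.linalg.eig(W)
    order = np.argsort(-np.abs(vals))             # sort by magnitude
    vals, vecs = vals[order].real, vecs[:, order].real
    # drop the trivial pair (lambda_1 = 1, constant eigenvector)
    return vecs[:, 1:n_components + 1] * (vals[1:n_components + 1] ** t)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))                      # 8 tokens, 16-dim embeddings
out, W = fractional_attention(X, X, alpha=0.5)
emb = diffusion_map(W, n_components=2)
print(out.shape, emb.shape)                       # (8, 16) (8, 2)
```

Row normalization makes W a Markov transition matrix, so the spectral gap 1 - lambda_2 read off in `diffusion_map` is the kind of mixing-speed quantity the abstract cites; heavier tails (smaller alpha) add long-range links to the attention network.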
Related papers
- General Self-Prediction Enhancement for Spiking Neurons [71.01912385372577]
Spiking Neural Networks (SNNs) are highly energy-efficient due to event-driven, sparse computation, but their training is challenged by spike non-differentiability and trade-offs among performance, efficiency, and biological plausibility. We propose a self-prediction enhanced spiking neuron method that generates an internal prediction current from the neuron's input-output history to modulate the membrane potential. This design offers dual advantages: it creates a continuous gradient path that alleviates vanishing gradients and boosts training stability and accuracy, while also aligning with biological principles, resembling distal dendritic modulation and error-driven synaptic plasticity.
arXiv Detail & Related papers (2026-01-29T15:08:48Z)
- Diffusion-Guided Renormalization of Neural Systems via Tensor Networks [0.0]
Far from equilibrium, neural systems self-organize across multiple scales. Exploiting multiscale self-organization in neuroscience and artificial intelligence requires a computational framework. I develop a scalable graph inference algorithm for discovering community structure from subsampled neural activity.
arXiv Detail & Related papers (2025-10-07T18:26:10Z)
- Graph Neural Diffusion via Generalized Opinion Dynamics [8.691309696914882]
We propose GODNF, which unifies multiple opinion dynamics models into a principled, trainable diffusion mechanism. Our framework captures heterogeneous diffusion patterns and temporal dynamics via node-specific behavior modeling and dynamic neighborhood influence. We provide a rigorous theoretical analysis demonstrating GODNF's ability to model diverse convergence configurations.
arXiv Detail & Related papers (2025-08-15T06:36:57Z)
- Dynamical Alignment: A Principle for Adaptive Neural Computation [1.0974389213466795]
We show that a fixed neural structure can operate in fundamentally different computational modes, driven not by its structure but by the temporal dynamics of its input signals. We find that this computational advantage emerges from a timescale alignment between input dynamics and neuronal integration. This principle offers a unified, computable perspective on long-observed dualities in neuroscience, from the stability-plasticity dilemma to segregation-integration dynamics.
arXiv Detail & Related papers (2025-08-13T06:35:57Z)
- Fractional Spike Differential Equations Neural Network with Efficient Adjoint Parameters Training [63.3991315762955]
Spiking Neural Networks (SNNs) draw inspiration from biological neurons to create realistic models for brain-like computation. Most existing SNNs assume a single time constant for neuronal membrane voltage dynamics, modeled by first-order ordinary differential equations (ODEs) with Markovian characteristics. We propose the Fractional SPIKE Differential Equation neural network (fspikeDE), which captures long-term dependencies in membrane voltage and spike trains through fractional-order dynamics (a generic fractional membrane equation is sketched after this list).
arXiv Detail & Related papers (2025-07-22T18:20:56Z)
- Langevin Flows for Modeling Neural Latent Dynamics [81.81271685018284]
We introduce LangevinFlow, a sequential Variational Auto-Encoder where the time evolution of latent variables is governed by the underdamped Langevin equation. Our approach incorporates physical priors -- such as inertia, damping, a learned potential function, and forces -- to represent both autonomous and non-autonomous processes in neural systems. Our method outperforms state-of-the-art baselines on synthetic neural populations generated by a Lorenz attractor.
arXiv Detail & Related papers (2025-07-15T17:57:48Z)
- CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model [52.466542039411515]
EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models. We present CodeBrain, a two-stage EFM designed to fill this gap. In the first stage, we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens. In the second stage, we propose the multi-scale EEGSSM architecture, which combines structured global convolution with sliding-window attention.
arXiv Detail & Related papers (2025-06-10T17:20:39Z)
- Self-orthogonalizing attractor neural networks emerging from the free energy principle [0.0]
We formalize how attractor networks emerge from the free energy principle applied to a universal partitioning of random dynamical systems. Our approach obviates the need for explicitly imposed learning and inference rules. Our findings offer a unifying theory of self-organizing attractor networks, providing novel insights for AI and neuroscience.
arXiv Detail & Related papers (2025-05-28T18:10:03Z)
- Neural Manifolds and Cognitive Consistency: A New Approach to Memory Consolidation in Artificial Systems [0.0]
We introduce a novel mathematical framework that unifies neural population dynamics, hippocampal sharp wave-ripple (SpWR) generation, and cognitive consistency constraints inspired by Heider's theory. Our model leverages low-dimensional manifold representations to capture structured neural drift and incorporates a balance energy function to enforce coherent synaptic interactions. This work paves the way for scalable neuromorphic architectures that bridge neuroscience and artificial intelligence, offering more robust and adaptive learning mechanisms for future intelligent systems.
arXiv Detail & Related papers (2025-02-25T18:28:25Z)
- Spatiotemporal Graph Learning with Direct Volumetric Information Passing and Feature Enhancement [62.91536661584656]
We propose a dual-module framework, the Cell-embedded and Feature-enhanced Graph Neural Network (CeFeGNN), for learning spatiotemporal dynamics. We embed learnable cell attributions into the common node-edge message-passing process, which better captures the spatial dependency of regional features. Experiments on various PDE systems and one real-world dataset demonstrate that CeFeGNN achieves superior performance compared with other baselines.
arXiv Detail & Related papers (2024-09-26T16:22:08Z)
- Transformers from Diffusion: A Unified Framework for Neural Message Passing [79.9193447649011]
Message passing neural networks (MPNNs) have become a de facto class of model solutions. We propose an energy-constrained diffusion model, which integrates the inductive bias of diffusion with layer-wise energy constraints. Building on these insights, we devise a new class of message passing models, dubbed diffusion-based Transformers (DIFFormer), whose global attention layers are derived from the principled energy-constrained diffusion framework.
arXiv Detail & Related papers (2024-09-13T17:54:41Z)
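For context on the fractional-order dynamics mentioned in the fspikeDE entry above, the equation below shows a generic fractional leaky-integrate-and-fire membrane equation written with the Caputo derivative. This is a standard form from the fractional-neuron literature, not necessarily fspikeDE's exact model; tau, V_rest, R, and I(t) are generic symbols.

```latex
% Generic fractional-order LIF membrane equation (Caputo derivative).
% alpha = 1 recovers the usual first-order ODE; alpha < 1 introduces a
% power-law memory kernel, i.e., non-Markovian long-term dependencies.
\[
  \tau \, {}^{C}\!D_t^{\alpha} V(t)
    = -\bigl(V(t) - V_{\mathrm{rest}}\bigr) + R\, I(t),
  \qquad 0 < \alpha \le 1 .
\]
```

Setting alpha = 1 recovers the single-time-constant, Markovian model that the fspikeDE summary contrasts against.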