Related papers: Attention as an Adaptive Filter

URL: http://arxiv.org/abs/2509.04154v3
Date: Tue, 14 Oct 2025 02:25:03 GMT
Title: Attention as an Adaptive Filter
Authors: Peter Racioppo
Abstract summary: We introduce Adaptive Filter Attention (AFA), a novel attention mechanism that incorporates a learnable dynamics model directly into the computation of attention weights. By assuming a continuous-time linear time-invariant system, we can make use of a closed-form solution of the differential Lyapunov equation to efficiently propagate uncertainties through the dynamics from keys to queries. A generalization of attention naturally arises as the maximum likelihood solution for filtering the trajectory of this linear SDE, with attention weights corresponding to robust residual-based reweightings of the propagated query-key precisions.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce Adaptive Filter Attention (AFA), a novel attention mechanism that incorporates a learnable dynamics model directly into the computation of attention weights. Rather than comparing queries and keys directly, we model the input sequence as discrete observations of a linear stochastic differential equation (SDE). By assuming a continuous-time linear time-invariant system with simultaneously-diagonalizable state matrices and noise covariances, we can make use of a closed-form solution of the differential Lyapunov equation to efficiently propagate uncertainties through the dynamics from keys to queries. A generalization of attention naturally arises as the maximum likelihood solution for filtering the trajectory of this linear SDE, with attention weights corresponding to robust residual-based reweightings of the propagated query-key precisions. We further constrain the system dynamics and noise in order to obtain a simplified variant with the same computational and memory complexity as standard attention. In the limit of zero decay and process noise, and using a small-angle approximation, we recover a complex-valued generalization of ordinary dot-product attention with rotary positional encodings.
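
As a rough illustration of the closed-form Lyapunov propagation mentioned in the abstract, the sketch below treats a single real scalar mode with dynamics dx = -decay * x dt + sqrt(q) dw, whose variance obeys dP/dt = -2 * decay * P + q. The function names and parameter values are hypothetical and not taken from the paper, and the paper's full formulation (diagonalized state matrices, measurement noise, precision reweighting) is not reproduced here. The closed form is checked against a plain forward-Euler integration.

```python
import math


def propagate_variance(p0, decay, q, dt):
    """Closed-form solution of dP/dt = -2*decay*P + q over an interval dt.

    Assumes one real scalar mode; all names here are illustrative only.
    """
    if decay == 0.0:
        # Zero-decay limit: variance grows linearly with process noise.
        return p0 + q * dt
    shrink = math.exp(-2.0 * decay * dt)
    return shrink * p0 + q * (1.0 - shrink) / (2.0 * decay)


def euler_variance(p0, decay, q, dt, steps=100_000):
    """Forward-Euler integration of the same ODE, used only as a check."""
    h = dt / steps
    p = p0
    for _ in range(steps):
        p += h * (-2.0 * decay * p + q)
    return p


closed = propagate_variance(1.0, 0.5, 0.2, 2.0)
numeric = euler_variance(1.0, 0.5, 0.2, 2.0)
print(closed, numeric)
```

The two values should agree closely, which is the practical appeal of a closed form: the uncertainty at any time offset can be evaluated directly rather than integrated step by step.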

Related papers

Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning [52.26396748560348]
We provide an overview of high-dimensional dynamical systems driven by random matrices. We focus on applications to simple models of learning and generalization in machine learning theory.
arXiv Detail & Related papers (2026-01-03T00:12:32Z)
Nonparametric learning of stochastic differential equations from sparse and noisy data [2.389598109913754]
We learn the entire drift function directly from data without strong structural assumptions. We develop an Expectation-Maximization (EM) algorithm that employs a novel Sequential Monte Carlo (SMC) method. The resulting EM-SMC-RKHS procedure enables accurate estimation of the drift function of dynamical systems in low-data regimes.
arXiv Detail & Related papers (2025-08-15T17:01:59Z)
Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation [55.88862563823878]
In this work, we present an original algorithm to coarsen an unstructured grid based on the concepts of differentiable physics. We demonstrate the performance of the algorithm on two PDEs: a linear equation that governs slightly compressible fluid flow in porous media, and the wave equation. Our results show that in the considered scenarios, we reduced the number of grid points by up to 10 times while preserving the modeled variable dynamics at the points of interest.
arXiv Detail & Related papers (2025-07-24T11:02:13Z)
Solving nonconvex Hamilton--Jacobi--Isaacs equations with PINN-based policy iteration [1.3654846342364308]
We present a framework that combines classical dynamic programming with physics-informed neural networks (PINNs) to solve nonconvex Hamilton-Jacobi-Isaacs (HJI) equations. Our results suggest that integrating PINNs with policy iteration is a practical and theoretically grounded method for solving high-dimensional, nonconvex HJI equations.
arXiv Detail & Related papers (2025-07-21T10:06:53Z)
On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
arXiv Detail & Related papers (2024-05-18T15:59:41Z)
Weak Collocation Regression for Inferring Stochastic Dynamics with Lévy Noise [8.15076267771005]
We propose a weak form of the Fokker-Planck (FP) equation for extracting dynamics with Lévy noise. Our approach can simultaneously distinguish mixed noise types, even in multi-dimensional problems.
arXiv Detail & Related papers (2024-03-13T06:54:38Z)
Joint State Estimation and Noise Identification Based on Variational Optimization [8.536356569523127]
A novel adaptive Kalman filter method based on conjugate-computation variational inference, referred to as CVIAKF, is proposed. The effectiveness of CVIAKF is validated through synthetic and real-world datasets of maneuvering target tracking.
arXiv Detail & Related papers (2023-12-15T07:47:03Z)
Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z)
System Identification for Continuous-time Linear Dynamical Systems [0.7510165488300368]
Generalizing the learning of latent linear dynamical systems to continuous time may extend the use of the hybrid Kalman filter. We apply the method by learning the parameters of a latent, multivariate Fokker-Planck SDE representing a toggle-switch genetic circuit.
arXiv Detail & Related papers (2023-08-23T05:53:13Z)
Effective Hamiltonian approach to the exact dynamics of open system by complex discretization approximation for environment [0.0]
We propose a generalization of the discretization approximation method to the complex frequency space, based on complex Gauss quadratures. An effective Hamiltonian can be established in this way; it is non-Hermitian and exhibits complex energy modes with negative imaginary parts.
arXiv Detail & Related papers (2023-03-12T05:34:29Z)
Numerical Solution of Stiff Ordinary Differential Equations with Random Projection Neural Networks [0.0]
We propose a numerical scheme based on Random Projection Neural Networks (RPNN) for the solution of Ordinary Differential Equations (ODEs). We show that our proposed scheme yields good numerical approximation accuracy without being affected by the stiffness, thus outperforming in some cases the ode45 and ode15s functions.
arXiv Detail & Related papers (2021-08-03T15:49:17Z)
Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability. We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections. Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem. We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems. We leverage averaging to improve the accuracy, efficiency and robustness of conventional monocular SLAM systems. Our approach can be up to 10x faster, with comparable accuracy, than the state of the art on public benchmarks.
arXiv Detail & Related papers (2020-11-02T18:02:26Z)
Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model. We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
Stochastic Normalizing Flows [52.92110730286403]
We introduce stochastic normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs). Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs. These SDEs can be used for constructing efficient chains to sample from the underlying distribution of a given dataset.
arXiv Detail & Related papers (2020-02-21T20:47:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.