A Unifying Framework for Parallelizing Sequential Models with Linear Dynamical Systems
- URL: http://arxiv.org/abs/2509.21716v1
- Date: Fri, 26 Sep 2025 00:27:02 GMT
- Title: A Unifying Framework for Parallelizing Sequential Models with Linear Dynamical Systems
- Authors: Xavier Gonzalez, E. Kelly Buchanan, Hyun Dong Lee, Jerry Weihong Liu, Ke Alexander Wang, David M. Zoltowski, Christopher Ré, Scott W. Linderman
- Abstract summary: Several approaches have been proposed for evaluating sequential processes in parallel using fixed-point methods. We show that these methods can be understood within a common framework based on linear dynamical systems. This unifying view highlights shared principles behind these techniques and clarifies when particular fixed-point methods are most likely to be effective.
- Score: 41.44667250045256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Harnessing parallelism in seemingly sequential models is a central challenge for modern machine learning. Several approaches have been proposed for evaluating sequential processes in parallel using fixed-point methods, like Newton, Picard, and Jacobi iterations. In this work, we show that these methods can be understood within a common framework based on linear dynamical systems (LDSs), where different iteration schemes arise naturally as approximate linearizations of a nonlinear recursion. This unifying view highlights shared principles behind these techniques and clarifies when particular fixed-point methods are most likely to be effective. By bridging diverse algorithms through the language of LDSs, our framework provides a clearer theoretical foundation for parallelizing sequential models and points toward new opportunities for efficient and scalable computation.
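To make the framework concrete, the following is a minimal sketch (an illustration under toy assumptions, not the authors' code) of the Newton-style scheme on a scalar recursion: linearize the recursion around the current trajectory guess, then evaluate the resulting LDS in parallel with an associative scan.

```python
# Minimal sketch: the nonlinear recursion s_t = f(s_{t-1}, x_t) is solved for
# the whole trajectory at once by repeatedly linearizing it into a linear
# dynamical system (LDS) and evaluating that LDS with an associative scan.
import jax
import jax.numpy as jnp

def f(s, x):
    # Toy scalar recursion standing in for any nonlinear sequential model.
    return jnp.tanh(0.5 * s + x)

def lds_scan(J, b):
    # Evaluate s_t = J_t * s_{t-1} + b_t with s_0 = 0 in parallel: affine maps
    # compose associatively, (J2, b2) o (J1, b1) = (J2*J1, J2*b1 + b2).
    def combine(e1, e2):
        J1, b1 = e1
        J2, b2 = e2
        return J2 * J1, J2 * b1 + b2
    _, s = jax.lax.associative_scan(combine, (J, b))
    return s

def parallel_newton(x, num_iters=20):
    s = jnp.zeros_like(x)                  # initial guess for the trajectory
    df = jax.vmap(jax.grad(f, argnums=0))  # all Jacobians df/ds in parallel
    for _ in range(num_iters):
        s_prev = jnp.concatenate([jnp.zeros(1), s[:-1]])
        J = df(s_prev, x)                  # linearize around current guess
        b = jax.vmap(f)(s_prev, x) - J * s_prev
        s = lds_scan(J, b)                 # solve the linearized LDS in parallel
    return s

x = jnp.linspace(-1.0, 1.0, 8)
s_seq, s = [], 0.0
for t in range(8):                         # sequential reference
    s = f(s, x[t])
    s_seq.append(s)
print(jnp.allclose(parallel_newton(x), jnp.array(s_seq), atol=1e-5))  # True
```

Setting the Jacobians `J` to zero in each sweep reduces the update to re-applying `f` across all time steps at once, i.e., a Picard/Jacobi-style iteration, which illustrates the paper's point that the different fixed-point methods are different approximate linearizations of the same recursion.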
Related papers
- PRISM: Parallel Residual Iterative Sequence Model [52.26239951489612]
We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension. PRISM introduces a solver-inspired inductive bias that captures key structural properties of multi-step refinement in a parallelizable form. We prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-$1$ bottleneck.
arXiv Detail & Related papers (2026-02-11T12:39:41Z)
- ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning. We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE). We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
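A generic illustration of that idea (my construction under toy assumptions, not ODELoRA itself) is gradient flow on the LoRA factors integrated with forward Euler:

```python
# Hedged toy sketch: treat the LoRA factors (A, B) as the state of a
# continuous-time system d(A, B)/dt = -grad L(W0 + B @ A), integrated
# with forward Euler steps.
import jax
import jax.numpy as jnp

d, r = 4, 2
W0 = jax.random.normal(jax.random.PRNGKey(0), (d, d))        # frozen weights
W_target = jax.random.normal(jax.random.PRNGKey(1), (d, d))  # toy objective

def loss(A, B):
    return jnp.sum((W0 + B @ A - W_target) ** 2)

A = jnp.zeros((r, d))                                  # standard LoRA init
B = 0.1 * jax.random.normal(jax.random.PRNGKey(2), (d, r))
dt = 0.01                                              # Euler step size
for _ in range(2000):
    gA, gB = jax.grad(loss, argnums=(0, 1))(A, B)
    A, B = A - dt * gA, B - dt * gB                    # one Euler step
print(loss(A, B))                                      # decreases steadily
```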
arXiv Detail & Related papers (2026-02-07T10:19:36Z) - A supervised discriminant data representation: application to pattern classification [8.941002231783067]
We propose a hybrid linear feature extraction scheme to be used in supervised multi-class classification problems. Inspired by two recent linear discriminant methods, we propose a unifying criterion that is able to retain the advantages of these two powerful methods. The proposed framework is generic in the sense that it allows the combination and tuning of other linear discriminant embedding methods.
arXiv Detail & Related papers (2025-10-24T14:30:57Z) - Design Principles for Sequence Models via Coefficient Dynamics [20.14360019974826]
We develop a unified framework that makes this output operation explicit by casting the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs. This viewpoint, in spirit substantially different from approaches focusing on connecting linear RNNs with linear attention, reveals a common mathematical theme across diverse architectures, thereby identifying tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention.
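In the simplest scalar case this view can be checked numerically (a toy example of mine, not the paper's code):

```python
# For the scalar linear recurrence h_t = a * h_{t-1} + x_t, the coefficient
# attached to each past input is the impulse response of the autonomous
# system c_{k+1} = a * c_k with c_0 = 1, i.e. c_k = a**k.
import jax.numpy as jnp

a, T = 0.9, 6
x = jnp.arange(1.0, T + 1.0)

h = 0.0
for t in range(T):                      # run the recurrence sequentially
    h = a * h + x[t]

c = a ** jnp.arange(T)                  # coefficients c_k = a**k
print(jnp.allclose(h, jnp.dot(c, x[::-1])))  # h_T = sum_k c_k * x_{T-k}, True
```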
arXiv Detail & Related papers (2025-10-10T13:42:31Z) - Bayesian Nonparametric Dynamical Clustering of Time Series [3.8090256115307555]
We present a method that models the evolution of an unbounded number of time series clusters by switching among an unknown number of regimes with linear dynamics. We perform inference by formulating a variational lower bound for off-line and on-line scenarios. We illustrate the versatility and effectiveness of the approach through several case studies of electrocardiogram analysis using publicly available databases.
arXiv Detail & Related papers (2025-10-08T11:52:39Z) - A Parallelizable Approach for Characterizing NE in Zero-Sum Games After a Linear Number of Iterations of Gradient Descent [1.1970409518725493]
We study online optimization methods for zero-sum games, a fundamental problem in adversarial learning in machine learning, economics, and many other domains. We propose a new method based on Hamiltonian dynamics from physics and prove that it can characterize the set of NE in a finite (linear) number of iterations of alternating gradient descent, modulo degeneracy. Unlike standard methods for computing NE, our proposed approach can be parallelized and works with arbitrary learning rates, both firsts in algorithmic game theory.
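A sketch in this spirit (illustrative only; the paper's exact algorithm may differ): on the bilinear game f(x, y) = xy, descending the squared norm of the game gradient, a Hamiltonian-style quantity, converges to the unique NE at the origin, while naive simultaneous gradient play does not.

```python
# Toy comparison on f(x, y) = x * y, whose unique NE is (0, 0): simultaneous
# gradient play spirals outward, while descending H = 0.5 * ||game grad||^2
# converges to the NE.
import jax
import jax.numpy as jnp

def game_grad(z):
    x, y = z
    return jnp.array([y, -x])               # (df/dx, -df/dy) for f = x * y

H = lambda z: 0.5 * jnp.sum(game_grad(z) ** 2)

z_gda = jnp.array([1.0, 1.0])
z_ham = jnp.array([1.0, 1.0])
lr = 0.1
for _ in range(200):
    z_gda = z_gda - lr * game_grad(z_gda)   # simultaneous gradient play
    z_ham = z_ham - lr * jax.grad(H)(z_ham) # Hamiltonian-style descent
print(jnp.linalg.norm(z_gda), jnp.linalg.norm(z_ham))  # grows vs. ~0
```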
arXiv Detail & Related papers (2025-07-15T14:39:40Z) - Efficient identification of linear, parameter-varying, and nonlinear systems with noise models [1.6385815610837167]
We present a general system identification procedure capable of estimating a broad spectrum of state-space dynamical models. We show that for this general class of model structures, the model dynamics can be separated into a deterministic process and a noise part. We parameterize the involved nonlinear functional relations by means of artificial neural networks (ANNs).
arXiv Detail & Related papers (2025-04-16T11:23:30Z) - ParallelFlow: Parallelizing Linear Transformers via Flow Discretization [4.272515397452792]
We present a theoretical framework for analyzing linear attention models through matrix-valued state space models (SSMs). Our approach, Parallel Flows, provides a perspective that systematically decouples temporal dynamics from implementation constraints. As a concrete application, we analyze DeltaNet in a generalized low-rank setting motivated by recent theoretical advances.
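The parallelizable primitive underneath such models can be sketched as a matrix-valued linear recurrence evaluated by associative scan (a generic illustration, not ParallelFlow's actual implementation):

```python
# Matrix-valued SSM S_t = a_t * S_{t-1} + v_t k_t^T, the running state of
# (gated) linear attention, evaluated in parallel with an associative scan
# instead of a sequential loop.
import jax
import jax.numpy as jnp

T, d = 16, 4
k = jax.random.normal(jax.random.PRNGKey(0), (T, d))
v = jax.random.normal(jax.random.PRNGKey(1), (T, d))
a = jnp.full((T,), 0.9)                  # scalar decay gate per step

U = jnp.einsum('ti,tj->tij', v, k)       # per-step rank-1 updates v_t k_t^T

def combine(e1, e2):
    # Compose affine maps S -> a*S + U: later map e2 applied after e1.
    a1, S1 = e1
    a2, S2 = e2
    return a1 * a2, a2[..., None, None] * S1 + S2

decay, S = jax.lax.associative_scan(combine, (a, U))

S_ref = jnp.zeros((d, d))                # sequential reference
for t in range(T):
    S_ref = a[t] * S_ref + U[t]
print(jnp.allclose(S[-1], S_ref, atol=1e-5))  # True
```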
arXiv Detail & Related papers (2025-04-01T07:34:07Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z) - Object Representations as Fixed Points: Training Iterative Refinement
Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
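The generic recipe behind this (a sketch of implicit differentiation through a fixed point, with a toy contraction standing in for the refinement module) looks like:

```python
# Differentiate through a fixed point z* = f(z*, theta) with the implicit
# function theorem: the backward pass solves the IFT relation instead of
# unrolling the forward solver.
import jax
import jax.numpy as jnp

def f(z, theta):
    return jnp.tanh(theta * z + 1.0)           # toy contraction in z

def solve(theta):
    z = 0.0
    for _ in range(100):                       # forward: plain iteration
        z = f(z, theta)
    return z

@jax.custom_vjp
def fixed_point(theta):
    return solve(theta)

def fp_fwd(theta):
    z = solve(theta)
    return z, (z, theta)

def fp_bwd(res, g):
    z, theta = res
    dz = jax.grad(f, argnums=0)(z, theta)      # df/dz at the fixed point
    dth = jax.grad(f, argnums=1)(z, theta)     # df/dtheta at the fixed point
    # Implicit function theorem: dz*/dtheta = (1 - df/dz)^{-1} df/dtheta.
    return (g * dth / (1.0 - dz),)

fixed_point.defvjp(fp_fwd, fp_bwd)
print(jax.grad(fixed_point)(0.5))              # gradient without unrolling
```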
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- Convex Programs and Lyapunov Functions for Reinforcement Learning: A Unified Perspective on the Analysis of Value-Based Methods [3.9391112596932243]
Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). We present a unified control-theoretic framework for analyzing value-based methods such as value computation (VC), value iteration (VI), and temporal difference (TD) learning.
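As a tiny concrete instance of one analyzed method, here is textbook value iteration on a two-state, two-action MDP (my example, not taken from the paper):

```python
# Value iteration: V <- max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V(s') ].
# The Bellman optimality operator is a gamma-contraction, so V converges.
import jax.numpy as jnp

gamma = 0.9
R = jnp.array([[1.0, 0.0],                 # R[s, a]
               [0.0, 2.0]])
P = jnp.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s']
               [[0.5, 0.5], [0.3, 0.7]]])

V = jnp.zeros(2)
for _ in range(200):
    Q = R + gamma * jnp.einsum('sap,p->sa', P, V)
    V = Q.max(axis=1)
print(V)                                   # optimal state values
```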
arXiv Detail & Related papers (2022-02-14T18:32:57Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Randomized Block-Diagonal Preconditioning for Parallel Learning [0.0]
We study preconditioned gradient-based optimization methods where the preconditioning matrix has block-diagonal form.
Our main contribution is to demonstrate that the convergence of these methods can significantly be improved by a randomization technique.
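A hedged sketch of the setup (block-Jacobi preconditioning of a quadratic; the random coordinate permutation here is an illustrative stand-in for the paper's randomization technique, not its exact form):

```python
# Block-diagonal preconditioned gradient descent on f(w) = 0.5 w^T A w - b^T w,
# with coordinates randomly permuted before forming the diagonal blocks.
import jax
import jax.numpy as jnp

d, blk = 8, 4
M = jax.random.normal(jax.random.PRNGKey(0), (d, d))
A = M @ M.T + jnp.eye(d)                       # SPD Hessian of the quadratic
b = jax.random.normal(jax.random.PRNGKey(1), (d,))

perm = jax.random.permutation(jax.random.PRNGKey(2), d)
Ap, bp = A[perm][:, perm], b[perm]             # problem in permuted coordinates

# Invert each diagonal block to form the preconditioner P^{-1}.
P_inv = jnp.zeros((d, d))
for i in range(0, d, blk):
    P_inv = P_inv.at[i:i + blk, i:i + blk].set(
        jnp.linalg.inv(Ap[i:i + blk, i:i + blk]))

step = 1.0 / jnp.linalg.eigvals(P_inv @ Ap).real.max()  # convergent step size
w = jnp.zeros(d)
for _ in range(200):
    w = w - step * P_inv @ (Ap @ w - bp)       # preconditioned gradient step
print(jnp.linalg.norm(Ap @ w - bp))            # residual shrinks toward 0
```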
arXiv Detail & Related papers (2020-06-24T10:12:36Z)
- Reinforcement Learning as Iterative and Amortised Inference [62.997667081978825]
We use the control as inference framework to outline a novel classification scheme based on amortised and iterative inference.
We show that taking this perspective allows us to identify parts of the algorithmic design space which have been relatively unexplored.
arXiv Detail & Related papers (2020-06-13T16:10:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content shown (including all information) and is not responsible for any consequences of its use.