ParallelFlow: Parallelizing Linear Transformers via Flow Discretization
- URL: http://arxiv.org/abs/2504.00492v1
- Date: Tue, 01 Apr 2025 07:34:07 GMT
- Title: ParallelFlow: Parallelizing Linear Transformers via Flow Discretization
- Authors: Nicola Muca Cirone, Cristopher Salvi
- Abstract summary: We present a theoretical framework for analyzing linear attention models through matrix-valued state space models (SSMs). Our approach, Parallel Flows, provides a perspective that systematically decouples temporal dynamics from implementation constraints. As a concrete application, we analyze DeltaNet in a generalized low-rank setting motivated by recent theoretical advances.
- Score: 4.272515397452792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a theoretical framework for analyzing linear attention models through matrix-valued state space models (SSMs). Our approach, Parallel Flows, provides a perspective that systematically decouples temporal dynamics from implementation constraints, enabling independent analysis of critical algorithmic components: chunking, parallelization, and information aggregation. Central to this framework is the reinterpretation of chunking procedures as computations of the flows governing system dynamics. This connection establishes a bridge to mathematical tools from rough path theory, opening the door to new insights into sequence modeling architectures. As a concrete application, we analyze DeltaNet in a generalized low-rank setting motivated by recent theoretical advances. Our methods allow us to design simple, streamlined generalizations of hardware-efficient algorithms present in the literature, and to provide completely different ones, inspired by rough paths techniques, with provably lower complexity. This dual contribution demonstrates how principled theoretical analysis can both explain existing practical methods and inspire fundamentally new computational approaches.
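The chunking-as-flows idea at the core of the abstract admits a compact illustration. Below is a minimal sketch (our own, not the authors' code) of how a matrix-valued linear recurrence S_t = S_{t-1} A_t + B_t decomposes into per-chunk affine flows that can be computed independently (hence in parallel) and then combined by a short sequential scan; all names, sizes, and the choice of transition matrices are illustrative.

```python
# Sketch: chunking a linear recurrence = computing the flow over each chunk.
# Each chunk composes its per-step affine maps S -> S @ A_t + B_t into a single
# pair (P, Q) with S_end = S_start @ P + Q; chunk flows are independent jobs.

import numpy as np

def chunk_flow(As, Bs):
    """Compose the affine maps of one chunk into (P, Q): S -> S @ P + Q."""
    d = As.shape[-1]
    P = np.eye(d)
    Q = np.zeros_like(Bs[0])
    for A, B in zip(As, Bs):
        P = P @ A          # transition part composes multiplicatively
        Q = Q @ A + B      # accumulated input is pushed through later steps
    return P, Q

rng = np.random.default_rng(0)
T, C, d = 8, 4, 3                      # T steps, chunks of length C
As = np.stack([np.eye(d) + 0.01 * rng.standard_normal((d, d)) for _ in range(T)])
Bs = rng.standard_normal((T, d, d))

# 1) per-chunk flows -- each call is independent, hence parallelizable
flows = [chunk_flow(As[i:i + C], Bs[i:i + C]) for i in range(0, T, C)]

# 2) short sequential scan over chunk boundaries (T / C steps instead of T)
S = np.zeros((d, d))
for P, Q in flows:
    S = S @ P + Q

# check against the naive step-by-step recurrence
S_ref = np.zeros((d, d))
for A, B in zip(As, Bs):
    S_ref = S_ref @ A + B
assert np.allclose(S, S_ref)
```

The same decomposition underlies chunked linear-attention kernels: the per-chunk work parallelizes across chunks, while the cross-chunk combination costs only T/C sequential steps.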
Related papers
- Why Flow Matching is Particle Swarm Optimization? [0.0]
This paper preliminarily investigates the duality between flow matching in generative models and particle swarm optimization (PSO) in evolutionary computation. We reveal the intrinsic connections between these two approaches in terms of their mathematical formulations and optimization mechanisms. Although this paper only presents preliminary discussions, the revealed correspondences suggest several promising research directions.
arXiv Detail & Related papers (2025-07-28T13:21:14Z)
- Flow-Through Tensors: A Unified Computational Graph Architecture for Multi-Layer Transportation Network Optimization [20.685856719515026]
Flow-Through Tensors (FTT) is a unified computational graph architecture that connects origin-destination flows, path probabilities, and link travel times as interconnected tensors. Our framework makes three key contributions: first, it establishes a consistent mathematical structure that enables gradient-based optimization across previously separate modeling elements. Second, it supports multidimensional analysis of traffic patterns over time, space, and user groups with precise quantification of system efficiency.
arXiv Detail & Related papers (2025-06-30T06:42:23Z)
- Provable Model-Parallel Distributed Principal Component Analysis with Parallel Deflation [9.613011825024476]
We study a distributed PCA framework where each worker targets a distinct eigenvector and refines its solution by updating from intermediate solutions provided by peers deemed as "superior". Our proposed framework offers comparable performance to EigenGame-$\mu$, the state-of-the-art model-parallel PCA solver. As a toy illustration of the parallel-deflation mechanism (a single-process sketch under our own assumptions, not the paper's implementation), each "worker" below refines its eigenvector by power iteration on the covariance deflated by its peers' current estimates:
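```python
# Sketch of parallel deflation: worker k power-iterates on the covariance
# deflated by the current estimates of workers 1..k-1, so all workers can
# refine simultaneously. Names and hyperparameters are illustrative.

import numpy as np

rng = np.random.default_rng(1)
d, K = 20, 3
X = rng.standard_normal((500, d))
Cov = X.T @ X / len(X)

V = rng.standard_normal((d, K))
V /= np.linalg.norm(V, axis=0)

for _ in range(200):                      # each sweep is one "round"
    V_old = V.copy()
    for k in range(K):                    # in the real setting: one worker each
        C_k = Cov.copy()
        for j in range(k):                # deflate by peers deemed "superior"
            vj = V_old[:, j]
            C_k -= (vj @ Cov @ vj) * np.outer(vj, vj)
        w = C_k @ V[:, k]                 # one power-iteration step
        V[:, k] = w / np.linalg.norm(w)

# compare to a direct eigendecomposition (alignments should be close to 1)
true = np.linalg.eigh(Cov)[1][:, ::-1][:, :K]
print(np.abs(np.sum(V * true, axis=0)))
```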
arXiv Detail & Related papers (2025-02-24T20:02:27Z)
- Dynamical Mean-Field Theory of Self-Attention Neural Networks [0.0]
Transformer-based models have demonstrated exceptional performance across diverse domains.
Little is known about how they operate or what their expected dynamics are.
We use methods developed for the study of asymmetric Hopfield networks in nonequilibrium regimes.
arXiv Detail & Related papers (2024-06-11T13:29:34Z)
- Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
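To see why such a hard antisymmetry constraint guarantees momentum conservation, consider the toy sketch below (ours, not the paper's continuous-convolution architecture): if the pairwise interaction is explicitly antisymmetrized, the summed momentum update cancels exactly.

```python
# Sketch: antisymmetric pairwise interactions conserve total linear momentum,
# because the force j exerts on i is the exact negative of the force i exerts
# on j, so the momentum updates cancel in the sum. Names are illustrative.

import numpy as np

def pairwise_forces(pos, kernel):
    """Antisymmetrize an arbitrary (e.g. learned) pairwise function."""
    n = len(pos)
    F = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i != j:
                f_ij = kernel(pos[i], pos[j])
                f_ji = kernel(pos[j], pos[i])
                F[i] += 0.5 * (f_ij - f_ji)   # hard antisymmetry constraint
    return F

rng = np.random.default_rng(2)
W = rng.standard_normal((2, 2))
kernel = lambda a, b: np.tanh(W @ (a - b))    # stand-in for a learned kernel

pos = rng.standard_normal((5, 2))
vel = rng.standard_normal((5, 2))
vel_new = vel + 0.01 * pairwise_forces(pos, kernel)

# total momentum (unit masses assumed) is unchanged up to float error
assert np.allclose(vel.sum(axis=0), vel_new.sum(axis=0))
```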
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Stochastic normalizing flows as non-equilibrium transformations [62.997667081978825]
We show that normalizing flows provide a route to sample lattice field theories more efficiently than conventional Monte Carlo simulations.
We lay out a strategy to optimize the efficiency of this extended class of generative models and present examples of applications.
arXiv Detail & Related papers (2022-01-21T19:00:18Z)
- Approximation Theory of Convolutional Architectures for Time Series Modelling [15.42770933459534]
We study the approximation properties of convolutional architectures applied to time series modelling.
Recent results reveal an intricate connection between approximation efficiency and memory structures in the data generation process.
arXiv Detail & Related papers (2021-07-20T09:19:26Z)
- Flow-based sampling for multimodal and extended-mode distributions in lattice field theory [3.9492325196180715]
We present a set of training- and architecture-based methods to construct flow models for targets with multiple separated modes. We demonstrate the application of these methods to modeling two-dimensional real and complex scalar field theories.
arXiv Detail & Related papers (2021-07-01T20:22:10Z)
- Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantees.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
- Modern Koopman Theory for Dynamical Systems [2.5889588665122725]
We provide an overview of modern Koopman operator theory, describing recent theoretical and algorithmic developments.
We also discuss key advances and challenges in the rapidly growing field of machine learning.
arXiv Detail & Related papers (2021-02-24T06:18:16Z)
- Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology across regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
- Operator Inference and Physics-Informed Learning of Low-Dimensional Models for Incompressible Flows [5.756349331930218]
We suggest a new approach to learning structured low-order models for incompressible flow from data.
We show that learning dynamics of the velocity and pressure can be decoupled, thus leading to an efficient operator inference approach.
arXiv Detail & Related papers (2020-10-13T21:26:19Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the vanishing/exploding gradients problem.
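A minimal sketch of our reading of this nested-flow mechanism (illustrative, not the paper's exact algorithm): evolving the weight matrix on O(d) via a Cayley map of a skew-symmetric generator keeps it exactly orthogonal, which is what prevents its spectrum, and hence the gradients, from vanishing or exploding.

```python
# Sketch: a weight matrix evolving on the orthogonal group O(d). The Cayley
# transform of a skew-symmetric matrix is orthogonal, so each step keeps
# W on O(d) exactly. The random generator G stands in for the second ODE.

import numpy as np

def cayley_step(W, G, h=0.1):
    """Move W along O(d) by the Cayley transform of a skew generator."""
    d = W.shape[0]
    S = 0.5 * (G - G.T)                   # project generator onto skew matrices
    I = np.eye(d)
    # (I + h/2 S)^{-1} (I - h/2 S) is exactly orthogonal for skew S
    return W @ np.linalg.solve(I + 0.5 * h * S, I - 0.5 * h * S)

rng = np.random.default_rng(3)
d = 4
W = np.linalg.qr(rng.standard_normal((d, d)))[0]   # start on O(d)
x = rng.standard_normal(d)

for t in range(50):                                # main flow, evolving W
    G = rng.standard_normal((d, d))                # stand-in for the second ODE
    W = cayley_step(W, G)
    x = x + 0.1 * np.tanh(W @ x)                   # one Euler step of the main flow

# W stays orthogonal throughout, so its spectrum sits on the unit circle
assert np.allclose(W.T @ W, np.eye(d), atol=1e-8)
```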
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.