PRISM: Parallel Residual Iterative Sequence Model
- URL: http://arxiv.org/abs/2602.10796v2
- Date: Thu, 12 Feb 2026 02:56:17 GMT
- Title: PRISM: Parallel Residual Iterative Sequence Model
- Authors: Jie Jiang, Ke Cheng, Xin Xu, Mengyang Pang, Tianhao Lu, Jiaheng Li, Yue Liu, Yuan Wang, Jun Zhang, Huan Yu, Zhouchen Lin,
- Abstract summary: We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension.<n>PRISM introduces a solver-inspired inductive bias that captures key structural properties of multi-step refinement in a parallelizable form.<n>We prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-$1$ bottleneck.
- Score: 52.26239951489612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative sequence modeling faces a fundamental tension between the expressivity of Transformers and the efficiency of linear sequence models. Existing efficient architectures are theoretically bounded by shallow, single-step linear updates, while powerful iterative methods like Test-Time Training (TTT) break hardware parallelism due to state-dependent gradients. We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension. PRISM introduces a solver-inspired inductive bias that captures key structural properties of multi-step refinement in a parallelizable form. We employ a Write-Forget Decoupling strategy that isolates non-linearity within the injection operator. To bypass the serial dependency of explicit solvers, PRISM utilizes a two-stage proxy architecture: a short-convolution anchors the initial residual using local history energy, while a learned predictor estimates the refinement updates directly from the input. This design distills structural patterns associated with iterative correction into a parallelizable feedforward operator. Theoretically, we prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-$1$ bottleneck. Empirically, it achieves comparable performance to explicit optimization methods while achieving 174x higher throughput.
Related papers
- Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature.<n>Existing solver-based acceleration methods often face significant image quality degradation under a low-dimensional budget.<n>We propose the Ensemble Parallel Direction solver (dubbed as EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple gradient parallel evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z) - Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling [0.0]
Gated Associative Memory (GAM) network is a novel, fully parallel architecture for sequence modeling.<n>We implement GAM from scratch and conduct a rigorous comparative analysis against a standard Transformer model and a modern linear-time baseline.<n>Our experiments demonstrate that GAM is consistently faster, outperforming both baselines on training speed, and achieves a superior or competitive final validation perplexity across all datasets.
arXiv Detail & Related papers (2025-08-30T20:59:46Z) - Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention [0.0]
This paper introduces a novel linear attention architecture-textbfTLinFormer.<n>By reconfiguring neuron connection patterns, TLinFormer achieves strict linear complexity while computing exact attention scores.<n>We show that TLinFormer exhibits overwhelming advantages in key metrics such as textbfinference latency, textbfKV cache efficiency, and textbfmemory footprint.
arXiv Detail & Related papers (2025-08-28T04:10:19Z) - Sequential-Parallel Duality in Prefix Scannable Models [68.39855814099997]
Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba.<n>This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference?
arXiv Detail & Related papers (2025-06-12T17:32:02Z) - Inference Acceleration of Autoregressive Normalizing Flows by Selective Jacobi Decoding [12.338918067455436]
Normalizing flows are promising generative models with advantages such as theoretical rigor, analytical log-likelihood, and end-to-end training.<n>Recent advancements utilize autoregressive modeling, significantly enhancing expressive power and generation quality.<n>We propose a selective Jacobi decoding (SeJD) strategy that accelerates autoregressive inference through parallel iterative optimization.
arXiv Detail & Related papers (2025-05-30T16:53:15Z) - Nesterov Method for Asynchronous Pipeline Parallel Optimization [59.79227116582264]
We introduce a variant of Nesterov Accelerated Gradient (NAG) for asynchronous optimization in Pipeline Parallelism.<n>Specifically, we modify the look-ahead step in NAG to effectively address the staleness in gradients.<n>We theoretically prove that our approach converges at a sublinear rate in the presence of fixed delay in gradients.
arXiv Detail & Related papers (2025-05-02T08:23:29Z) - COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - State Soup: In-Context Skill Learning, Retrieval and Mixing [22.485700977542127]
A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems.
Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter.
Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined.
arXiv Detail & Related papers (2024-06-12T17:06:07Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Self-Reflective Variational Autoencoder [21.054722609128525]
Variational Autoencoder (VAE) is a powerful framework for learning latent variable generative models.
We introduce a solution, which we call self-reflective inference.
We empirically demonstrate the clear advantages of matching the variational posterior to the exact posterior.
arXiv Detail & Related papers (2020-07-10T05:05:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.