SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion
- URL: http://arxiv.org/abs/2602.11698v1
- Date: Thu, 12 Feb 2026 08:23:21 GMT
- Title: SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion
- Authors: Chengting Yu, Xiaobo Shu, Yadao Wang, Yizhen Zhang, Haoyi Wu, You Wu, Rujiao Long, Ziheng Chen, Yuchi Xu, Wenbo Su, Bo Zheng
- Abstract summary: SpiralFormer is a looped Transformer that executes recurrence under a multi-resolution recursion schedule. We show that SpiralFormer achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B.
- Score: 24.26069897783496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recursive (looped) Transformers decouple computational depth from parameter depth by repeatedly applying shared layers, providing an explicit architectural primitive for iterative refinement and latent reasoning. However, early looped Transformers often underperform non-recursive baselines of equal compute. While recent literature has introduced more effective recursion mechanisms to mitigate this gap, existing architectures still operate at a fixed, full-token resolution, neglecting the potential efficiency of computing over compressed latent representations. In this paper, we propose SpiralFormer, a looped Transformer that executes recurrence under a multi-resolution recursion schedule. We provide probing evidence that multi-resolution recursion enables the model to learn hierarchical dependencies by inducing iteration-wise functional specialization across different scales. Empirically, SpiralFormer achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B, establishing sequence resolution as a potential axis for scaling recursive architectures.
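The abstract describes the architecture only at a high level, so the following is a minimal PyTorch sketch of what "recurrence under a multi-resolution recursion schedule" could look like, not the authors' implementation. The module names (SharedBlock, MultiResolutionLoop), the mean-pool/nearest-upsample choices, and the schedule (1, 2, 4, 2, 1) are illustrative assumptions; causal masking and positional encodings are omitted for brevity.

```python
# Hedged sketch: a weight-tied Transformer block looped over a schedule of
# sequence resolutions. All names and hyper-parameters are assumptions for
# illustration, not SpiralFormer's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedBlock(nn.Module):
    """One weight-tied Transformer block reused at every recursion step."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x


class MultiResolutionLoop(nn.Module):
    """Applies the shared block under a schedule of pooling factors.

    A factor of 1 refines the full token sequence; a factor k > 1 mean-pools
    tokens by k before the block and upsamples afterwards, so that iteration
    operates on a compressed latent sequence.
    """

    def __init__(self, d_model: int, schedule=(1, 2, 4, 2, 1)):
        super().__init__()
        self.block = SharedBlock(d_model)
        self.schedule = schedule

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for factor in self.schedule:
            if factor == 1:
                x = self.block(x)
                continue
            # Compress: (B, T, D) -> (B, T // factor, D) by average pooling.
            z = F.avg_pool1d(x.transpose(1, 2), kernel_size=factor).transpose(1, 2)
            z = self.block(z)
            # Expand back to the original length and apply as a residual update.
            up = F.interpolate(z.transpose(1, 2), size=x.size(1), mode="nearest")
            x = x + up.transpose(1, 2)
        return x


if __name__ == "__main__":
    model = MultiResolutionLoop(d_model=256)
    tokens = torch.randn(2, 64, 256)  # (batch, sequence, hidden)
    print(model(tokens).shape)        # torch.Size([2, 64, 256])
```

The only point of the sketch is the control flow: the same weight-tied block is reused at every iteration, while the pooling factor decides whether a given iteration refines the full token sequence or a cheaper compressed one.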
Related papers
- PRISM: Parallel Residual Iterative Sequence Model [52.26239951489612]
We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension. PRISM introduces a solver-inspired inductive bias that captures key structural properties of multi-step refinement in a parallelizable form. We prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-$1$ bottleneck.
arXiv Detail & Related papers (2026-02-11T12:39:41Z) - Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models [63.47909317137073]
Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks, but their vast parameter counts are often underutilized during both training and inference. We propose RecursiveVLM, a recursive Transformer architecture tailored for LMMs.
arXiv Detail & Related papers (2026-02-09T17:58:23Z) - Exploring Depth Generalization in Large Language Models for Solving Recursive Logic Tasks [1.0378456753266476]
We show that transformer architectures struggle with problems involving deeper recursion than encountered during training. This limitation stems from their inability to maintain stack-like behavior. We develop a novel looped locate-and-replace pipeline that decomposes problems into manageable subcomponents.
arXiv Detail & Related papers (2025-12-02T12:04:51Z) - Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis [50.11146543029802]
The paper introduces FractalNet, a fractal-inspired computational architecture for advanced large language model analysis. The new set-up involves a template-driven generator, runner, and evaluation framework that, through systematic permutations of convolutional, normalization, activation, and dropout layers, can create more than 1,200 variants of neural networks. The paper positions fractal design as a feasible and resource-efficient method of automated architecture exploration.
arXiv Detail & Related papers (2025-11-10T17:31:39Z) - MeSH: Memory-as-State-Highways for Recursive Transformers [23.995570647573484]
Recursive models with fewer parameters often lag behind non-recursive counterparts under matched compute. By probing hidden states, we trace this performance gap to two primary bottlenecks. We introduce a Memory-as-State-Highways scheme, which externalizes state management into an explicit memory buffer.
arXiv Detail & Related papers (2025-10-09T03:23:38Z) - Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation [61.67090981767583]
We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking. We also propose a KV sharing variant that reuses KV pairs from the first recursion, specifically designed to further decrease memory footprint. (A minimal sketch of this recursion-with-routing pattern appears after this list.)
arXiv Detail & Related papers (2025-07-14T17:49:00Z) - To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers [32.84174396586435]
Chain-of-Thought (CoT) and Looped Transformers have been empirically shown to improve performance on reasoning tasks. We provide a formal analysis of their respective strengths and limitations.
arXiv Detail & Related papers (2025-05-25T17:49:37Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - Recursive Reinforcement Learning [4.429642479975602]
Recursion is the fundamental paradigm to finitely describe potentially infinite objects.
We develop RL algorithms capable of computing optimal policies in environments described as a collection of Markov decision processes.
arXiv Detail & Related papers (2022-06-23T00:29:42Z)
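Several entries above, Mixture-of-Recursions and MeSH in particular, revolve around the same primitive as SpiralFormer: a weight-tied block applied for several steps, optionally with a router that picks a per-token recursion depth. Below is a minimal sketch of that shared pattern; the module names, the hard top-1 routing rule, and the hyper-parameters are illustrative assumptions rather than any of the papers' released code.

```python
# Hedged sketch: weight-tied recursion with a per-token depth router,
# loosely in the spirit of Mixture-of-Recursions. Not released code.
import torch
import torch.nn as nn


class RecursiveRefiner(nn.Module):
    def __init__(self, d_model: int, max_steps: int = 4):
        super().__init__()
        self.step = nn.Sequential(          # shared update reused at every step
            nn.LayerNorm(d_model),
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )
        self.router = nn.Linear(d_model, max_steps)  # scores one depth per token
        self.max_steps = max_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assign an integer recursion depth to every token (hard argmax here;
        # practical routers are usually trained with softer assignments).
        depth = self.router(x).argmax(dim=-1) + 1           # (B, T), values in [1, max_steps]
        for step in range(1, self.max_steps + 1):
            active = (depth >= step).unsqueeze(-1).float()  # tokens still being refined
            x = x + active * self.step(x)                   # residual, weight-tied update
        return x


if __name__ == "__main__":
    refiner = RecursiveRefiner(d_model=128)
    out = refiner(torch.randn(2, 16, 128))
    print(out.shape)  # torch.Size([2, 16, 128])
```

The hard argmax is only to keep the example short; the design point the papers share is that one parameter-shared block supplies all of the computational depth, with routing deciding how much of it each token receives.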
This list is automatically generated from the titles and abstracts of the papers on this site.