Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models
- URL: http://arxiv.org/abs/2602.09080v1
- Date: Mon, 09 Feb 2026 17:58:23 GMT
- Title: Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models
- Authors: Ruihan Xu, Yuting Gao, Lan Wang, Jianing Li, Weihao Chen, Qingpei Guo, Ming Yang, Shiliang Zhang,
- Abstract summary: Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks. However, their vast parameter counts are often underutilized during both training and inference. We propose RecursiveVLM, a recursive Transformer architecture tailored for LMMs.
- Score: 63.47909317137073
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks, yet their vast parameter counts are often underutilized during both training and inference. In this work, we embrace the idea of looping back to move forward: reusing model parameters through recursive refinement to extract stronger multimodal representations without increasing model size. We propose RecursiveVLM, a recursive Transformer architecture tailored for LMMs. Two key innovations enable effective looping: (i) a Recursive Connector that aligns features across recursion steps by fusing intermediate-layer hidden states and applying modality-specific projections, respecting the distinct statistical structures of vision and language tokens; (ii) a Monotonic Recursion Loss that supervises every step and guarantees performance improves monotonically with recursion depth. This design transforms recursion into an on-demand refinement mechanism: delivering strong results with few loops on resource-constrained devices and progressively improving outputs when more computation resources are available. Experiments show consistent gains of +3% over standard Transformers and +7% over vanilla recursive baselines, demonstrating that strategic looping is a powerful path toward efficient, deployment-adaptive LMMs.
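The looping idea described in the abstract can be illustrated with a minimal sketch: one parameter block is applied repeatedly, a connector fuses each intermediate state with the original input before the next pass, and every step's output is retained so each can be supervised. The single-layer "block", the concatenation-based connector, and all shapes below are illustrative stand-ins, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_block(h, W):
    # stand-in for the reused Transformer block: one nonlinear layer
    return np.tanh(h @ W)

def recursive_forward(x, W, W_conn, steps=3):
    """Apply the same parameters `steps` times; a connector mixes the
    current state with the original input before each pass."""
    h = x
    outputs = []
    for _ in range(steps):
        fused = np.concatenate([h, x], axis=-1) @ W_conn  # toy "connector"
        h = shared_block(fused, W)
        outputs.append(h)  # keep every step so each can be supervised
    return outputs

d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)
W_conn = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)
x = rng.normal(size=(4, d))
y = rng.normal(size=(4, d))

outs = recursive_forward(x, W, W_conn, steps=3)
# supervising every recursion step, in the spirit of a per-step loss
per_step_loss = [float(np.mean((o - y) ** 2)) for o in outs]
print(len(outs), outs[0].shape)  # parameter count is constant across loops
```

The point of the sketch is that running more loops adds compute but no parameters, which is what makes recursion usable as an on-demand refinement knob.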
Related papers
- SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion [24.26069897783496]
SpiralFormer is a looped Transformer that executes recurrence under a multi-resolution recursion schedule. We show that SpiralFormer achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B.
arXiv Detail & Related papers (2026-02-12T08:23:21Z) - ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents [61.51091799997476]
We introduce ReCAP (Recursive Context-Aware Reasoning and Planning), a hierarchical framework with shared context for reasoning and planning in large language models (LLMs). ReCAP combines three key mechanisms: plan-ahead decomposition, structured re-injection of parent plans, and memory-efficient execution. Experiments demonstrate that ReCAP substantially improves subgoal alignment and success rates on various long-horizon reasoning benchmarks.
arXiv Detail & Related papers (2025-10-27T20:03:55Z) - MeSH: Memory-as-State-Highways for Recursive Transformers [23.995570647573484]
Recursive models with fewer parameters often lag behind non-recursive counterparts under matched compute. By probing hidden states, we trace this performance gap to two primary bottlenecks. We introduce a Memory-as-State-Highways scheme, which externalizes state management into an explicit memory buffer.
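A minimal, generic sketch of the memory-buffer idea (the read/write rules and shapes here are invented for illustration, not the paper's actual MeSH design): instead of forcing all carried information through the single hidden state passed between loops, each recursion step reads a summary from an explicit buffer and writes its own state back:

```python
import numpy as np

rng = np.random.default_rng(1)

def block(h, W):
    # stand-in for the shared Transformer block
    return np.tanh(h @ W)

def looped_with_memory(x, W, W_read, W_write, steps=4, slots=2):
    """Each recursion step reads from an explicit memory buffer and
    writes its own state back, rather than squeezing all carry-over
    information through the hidden state alone."""
    d = x.shape[-1]
    memory = np.zeros((slots, d))            # external state buffer
    h = x
    for _ in range(steps):
        read = memory.mean(axis=0)           # toy read: average over slots
        h = block(h + read @ W_read, W)      # inject memory into the state
        memory = np.roll(memory, 1, axis=0)  # shift slots, drop the oldest
        memory[0] = h.mean(axis=0) @ W_write # write a summary of this step
    return h, memory

d = 6
W = rng.normal(size=(d, d)) / np.sqrt(d)
W_read = rng.normal(size=(d, d)) / np.sqrt(d)
W_write = rng.normal(size=(d, d)) / np.sqrt(d)
x = rng.normal(size=(4, d))
h, memory = looped_with_memory(x, W, W_read, W_write)
print(h.shape, memory.shape)
```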
arXiv Detail & Related papers (2025-10-09T03:23:38Z) - Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation [61.67090981767583]
We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking. We also propose a KV sharing variant that reuses KV pairs from the first recursion, specifically designed to further decrease memory footprint.
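The adaptive token-level depth can be sketched generically (the sigmoid router and its 0.5 exit threshold below are toy choices of this sketch, not MoR's trained router): every token reuses one shared block, and a lightweight router decides per token whether it loops again or exits early:

```python
import numpy as np

rng = np.random.default_rng(2)

def block(h, W):
    # the single shared block that every recursion step reuses
    return np.tanh(h @ W)

def mixture_of_recursions(x, W, w_router, max_steps=3):
    """Each token passes through the shared block until a lightweight
    router decides it is done, giving per-token recursion depths."""
    h = x.copy()
    active = np.ones(len(x), dtype=bool)     # tokens still looping
    depths = np.zeros(len(x), dtype=int)
    for _ in range(max_steps):
        if not active.any():
            break
        h[active] = block(h[active], W)      # only active tokens recurse
        depths[active] += 1
        scores = 1.0 / (1.0 + np.exp(-(h @ w_router)))  # router score
        active &= scores > 0.5               # low-score tokens exit early
    return h, depths

d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)
w_router = rng.normal(size=d)
x = rng.normal(size=(10, d))
h, depths = mixture_of_recursions(x, W, w_router)
print(depths)  # per-token depths, each between 1 and max_steps
```

Because inactive tokens skip further block applications, easy tokens cost fewer FLOPs than hard ones, which is the efficiency axis the router adds on top of parameter sharing.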
arXiv Detail & Related papers (2025-07-14T17:49:00Z) - RevMUX: Data Multiplexing with Reversible Adapters for Efficient LLM Batch Inference [48.28847964704554]
Large language models (LLMs) have brought a great breakthrough to the natural language processing (NLP) community.
Data multiplexing addresses the cost of batch inference by merging multiple inputs into a single composite input.
RevMUX is a parameter-efficient data multiplexing framework that incorporates a reversible design in the multiplexer.
arXiv Detail & Related papers (2024-10-06T15:24:55Z) - Continual Referring Expression Comprehension via Dual Modular Memorization [133.46886428655426]
Referring Expression Comprehension (REC) aims to localize the image region of a given object described by a natural-language expression.
Existing REC algorithms make the strong assumption that all training data are available upfront, which limits their practicality in real-world scenarios.
In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC in which a model learns on a stream of incoming tasks.
To continuously improve the model on sequential tasks without forgetting previously learned knowledge and without repeatedly retraining from scratch, we propose an effective baseline method named Dual Modular Memorization.
arXiv Detail & Related papers (2023-11-25T02:58:51Z) - Online Multi-Task Learning with Recursive Least Squares and Recursive Kernel Methods [50.67996219968513]
We introduce two novel approaches for Online Multi-Task Learning (MTL) Regression Problems.
We achieve exact and approximate recursions with quadratic per-instance cost in the dimension of the input space.
We compare our online MTL methods to other contenders in a real-world wind speed forecasting case study.
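For context, the textbook single-task recursive least squares update (a generic sketch, not the paper's multi-task method) shows where a quadratic per-instance cost in the input dimension comes from: each step updates the d×d covariance matrix in place instead of re-solving the full least-squares system:

```python
import numpy as np

def rls_update(w, P, x, y, lam=1.0):
    """One recursive least squares step: O(d^2) work per instance,
    driven entirely by operations on the d x d matrix P."""
    Px = P @ x
    k = Px / (lam + x @ Px)            # gain vector
    w = w + k * (y - x @ w)            # correct by the prediction error
    P = (P - np.outer(k, Px)) / lam    # covariance (inverse-Gram) update
    return w, P

rng = np.random.default_rng(3)
d = 5
w_true = rng.normal(size=d)
w = np.zeros(d)
P = np.eye(d) * 100.0                  # large initial covariance
for _ in range(200):
    x = rng.normal(size=d)
    y = x @ w_true                     # noiseless stream for the demo
    w, P = rls_update(w, P, x, y)
print(np.linalg.norm(w - w_true))      # error shrinks toward zero
```

On this noiseless stream the recursion recovers the generating weights; `lam` is the usual forgetting factor, set to 1.0 here so no past data is discounted.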
arXiv Detail & Related papers (2023-08-03T01:41:34Z) - Sliced Recursive Transformer [23.899076070924153]
Recursive operation on vision transformers can improve parameter utilization without involving additional parameters.
Our model Sliced Recursive Transformer (SReT) is compatible with a broad range of other designs for efficient vision transformers.
arXiv Detail & Related papers (2021-11-09T17:59:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.