Related papers: Recursive Models for Long-Horizon Reasoning

URL: http://arxiv.org/abs/2603.02112v1
Date: Mon, 02 Mar 2026 17:37:10 GMT
Title: Recursive Models for Long-Horizon Reasoning
Authors: Chenxiao Yang, Nathan Srebro, Zhiyuan Li
Abstract summary: We show that a model can invoke itself to solve subtasks in isolated contexts. We generalize our framework to modern agentic systems with arbitrary context processing and control flows.
Score: 28.82044197167549
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern language models reason within bounded context, an inherent constraint that poses a fundamental barrier to long-horizon reasoning. We identify recursion as a core principle for overcoming this barrier, and propose recursive models as a minimal realization, where the model can recursively invoke itself to solve subtasks in isolated contexts. We prove that any computable problem admits a recursive decomposition in which each subtask requires only exponentially smaller active context than standard autoregressive models; this strictly surpasses any context management approach confined to a single sequence, such as summarization. We further generalize our framework to modern agentic systems with arbitrary context processing and control flows, and prove that recursive models can achieve optimal power within this broader class. Experimentally, we train a 3B model to reason recursively and evaluate on Boolean satisfiability, a task requiring long-horizon combinatorial search, where it significantly outperforms frontier LLMs.
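The recursive decomposition described in the abstract can be illustrated with a toy sketch. This is not the paper's model or training method; it is a plain branching SAT search, used here only to show the shape of the idea: each recursive call receives just its own reduced sub-formula (an "isolated context") and hands back only a short result. The names `simplify` and `solve` and the DIMACS-style integer encoding of literals are assumptions made for this illustration.

```python
from typing import Dict, List, Optional

# Hypothetical encoding for this sketch: 3 means x3, -3 means NOT x3.
Clause = List[int]
Formula = List[Clause]


def simplify(formula: Formula, literal: int) -> Optional[Formula]:
    """Assume `literal` is true; return the reduced formula, or None on conflict."""
    reduced: Formula = []
    for clause in formula:
        if literal in clause:
            continue  # clause is already satisfied, drop it
        rest = [lit for lit in clause if lit != -literal]
        if not rest:
            return None  # every literal in the clause is false: contradiction
        reduced.append(rest)
    return reduced


def solve(formula: Formula) -> Optional[Dict[int, bool]]:
    """Return a satisfying partial assignment, or None if unsatisfiable.

    Assumes every clause in the input is non-empty. Each call works only on
    the sub-formula it is given and returns only a compact answer, mirroring
    the idea of solving subtasks in isolated contexts.
    """
    if not formula:
        return {}  # no clauses left, so the formula is satisfied
    var = abs(formula[0][0])  # branch on the first variable encountered
    for literal in (var, -var):
        sub = simplify(formula, literal)
        if sub is None:
            continue
        result = solve(sub)  # subtask: the caller keeps none of its details
        if result is not None:
            result[var] = literal > 0
            return result
    return None
```

Calling `solve([[1, 2], [-1, 3], [-2, -3]])` returns a dictionary mapping variable numbers to truth values; variables absent from the dictionary are unconstrained. The point of the sketch is that the parent call never holds the child's search history, only its final answer.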

Related papers

On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks [56.98385132295952]
We evaluate how well chain-of-thought approaches generalize on a simple planning task. We find that reasoning traces which combine multiple text formats yield the best (and non-trivial) OOD generalization. Purely text-based models consistently outperform those utilizing image-based inputs.
arXiv Detail & Related papers (2026-02-17T09:51:40Z)
Exploring Depth Generalization in Large Language Models for Solving Recursive Logic Tasks [1.0378456753266476]
We show that transformer architectures struggle with problems involving deeper recursion than encountered during training. This limitation stems from their inability to maintain stack-like behavior. We develop a novel looped locate-and-replace pipeline that decomposes problems into manageable subcomponents.
arXiv Detail & Related papers (2025-12-02T12:04:51Z)
Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models [57.42778606399764]
Diffusion language models (dLLMs) offer a promising, non-autoregressive paradigm for text generation. Current reinforcement learning approaches often rely on sparse, outcome-based rewards. We argue that this stems from a fundamental mismatch with the natural structure of reasoning.
arXiv Detail & Related papers (2025-10-02T00:34:15Z)
GrootVL: Tree Topology is All You Need in State Space Model [66.36757400689281]
GrootVL is a versatile multimodal framework that can be applied to both visual and textual tasks. Our method significantly outperforms existing structured state space models on image classification, object detection and segmentation. By fine-tuning large language models, our approach achieves consistent improvements in multiple textual tasks at minor training cost.
arXiv Detail & Related papers (2024-06-04T15:09:29Z)
REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL. We find that REBEL provides a unified approach to language modeling and image generation, with performance stronger than or similar to PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
A Tractable Inference Perspective of Offline RL [36.563229330549284]
A popular paradigm for offline Reinforcement Learning (RL) tasks is to first fit the offline trajectories to a sequence model, and then prompt the model for actions that lead to high expected return. This paper highlights that tractability, the ability to exactly and efficiently answer various probabilistic queries, plays an important role in offline RL. We propose Trifle, which bridges the gap between good sequence models and high expected returns at evaluation time.
arXiv Detail & Related papers (2023-10-31T19:16:07Z)
Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models [58.41943058963672]
We propose a new inference framework called Recursion of Thought (RoT). RoT introduces several special tokens that the models can output to trigger context-related operations. Experiments with multiple architectures including GPT-3 show that RoT dramatically improves LMs' inference capability to solve problems.
arXiv Detail & Related papers (2023-06-12T06:34:16Z)
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations [62.65877150123775]
Causal abstraction is a promising theoretical framework for explainable artificial intelligence. Existing causal abstraction methods require a brute-force search over alignments between the high-level model and the low-level one. We present distributed alignment search (DAS), which overcomes these limitations.
arXiv Detail & Related papers (2023-03-05T00:57:49Z)
Recursive Reinforcement Learning [4.429642479975602]
Recursion is the fundamental paradigm to finitely describe potentially infinite objects. We develop RL algorithms capable of computing optimal policies in environments described as a collection of Markov decision processes.
arXiv Detail & Related papers (2022-06-23T00:29:42Z)
Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models. We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.