Related papers: Exploring Depth Generalization in Large Language Models for Solving Recursive Logic Tasks

Exploring Depth Generalization in Large Language Models for Solving Recursive Logic Tasks

URL: http://arxiv.org/abs/2512.02677v1
Date: Tue, 02 Dec 2025 12:04:51 GMT
Title: Exploring Depth Generalization in Large Language Models for Solving Recursive Logic Tasks
Authors: Zhiyuan He,
Abstract summary: We show that transformer architectures struggle with problems involving deeper recursion than encountered during training.<n>This limitation stems from their inability to maintain stack-like behavior.<n>We develop a novel looped locate-and-replace pipeline that decomposes problems into manageable subcomponents.
Score: 1.0378456753266476
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models have demonstrated remarkable capabilities across many tasks, yet face significant challenges when dealing with recursive reasoning problems, those requiring the resolution of nested hierarchical structures. While prior research has extensively studied length generalization (a model's ability to handle longer sequences than seen during training), we investigate a distinct and underexplored limitation: depth generalization. Here, depth refers to the number of nested levels in a hierarchical problem, such as the layers of parentheses in a mathematical expression or the nesting of logical clauses in a Boolean formula. Our work reveals that standard transformer architectures struggle with problems involving deeper recursion than encountered during training, even when they perform well on longer but non-nested sequences. This limitation stems from their inability to maintain stack-like behavior, the capacity to track and resolve multiple levels of nested dependencies. Through systematic analysis, we demonstrate how this architectural constraint leads to rapid performance decay as the depth of the recursion increases. To address this challenge, we develop a novel looped locate-and-replace pipeline that decomposes recursive problems into manageable subcomponents. The approach employs two specialized models: a locator that identifies solvable subexpressions and a replacer that evaluates these components while preserving the overall structure. We evaluated this method in three carefully designed domains: Boolean algebra, recursive arithmetic, and propositional logic, each with a controllable depth of recursion. We show that our method effectively alleviates the performance decay when tested on out-of-distribution recursion depth.

Related papers

Recursive Models for Long-Horizon Reasoning [28.82044197167549]
We show that a model can invoke itself to solve subtasks in isolated contexts.<n>We generalize our framework to modern agentic systems with arbitrary context processing and control flows.
arXiv Detail & Related papers (2026-03-02T17:37:10Z)
SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion [24.26069897783496]
SpiralFormer is a looped Transformer that executes recurrence under a multi-resolution recursion schedule.<n>We show that SpiralFormer achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B.
arXiv Detail & Related papers (2026-02-12T08:23:21Z)
Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models [63.47909317137073]
Large Multimodal Models (LMMs) have achieved remarkable success in vision-language computation tasks.<n>But their vast parameter counts are often underutilized during both training and inference.<n>We propose RecursiveVLM, a recursive Transformer architecture tailored for LMMs.
arXiv Detail & Related papers (2026-02-09T17:58:23Z)
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer [0.8738725605667471]
Chain-of-thought (CoT) reasoning has enabled transformer-based language models to excel at complex mathematics and multi-step planning.<n>In standard decoder-only architectures, these reasoning steps are externalized in natural language, improving interpretability at the cost of efficiency.<n>We investigate whether such reasoning structures emerge in Huginn-3.5B, a depth-recurrent Transformer that reuses layers at inference time without increasing parameter count.
arXiv Detail & Related papers (2025-07-02T23:35:21Z)
ProtInvTree: Deliberate Protein Inverse Folding with Reward-guided Tree Search [77.55575655986252]
ProtInvTree is a reward-guided tree-search framework for protein inverse folding.<n>It reformulates sequence generation as a deliberate, step-wise decision-making process.<n>It supports flexible test-time scaling by expanding the search depth and breadth without retraining.
arXiv Detail & Related papers (2025-06-01T09:34:20Z)
Recursive Decomposition with Dependencies for Generic Divide-and-Conquer Reasoning [48.829971427616854]
We introduce Recursive Decomposition with Dependencies (RDD), a scalable divide-and-conquer method for solving reasoning problems.<n>RDD can be directly applied to a new problem class even in the absence of any task-specific guidance.<n>We evaluate our approach on two benchmarks with six difficulty levels each and in two in-context settings.
arXiv Detail & Related papers (2025-05-05T11:24:20Z)
Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion [14.739369424331478]
We introduce a general framework that nicely connects the abstract concepts of structural recursion in the programming language domain to sequence modeling problems and learned models' behavior. With our framework as a powerful conceptual tool, we identify different issues under various set-ups.
arXiv Detail & Related papers (2024-01-23T18:07:38Z)
Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models [58.41943058963672]
We propose a new inference framework called Recursion of Thought (RoT) RoT introduces several special tokens that the models can output to trigger context-related operations. Experiments with multiple architectures including GPT-3 show that RoT dramatically improves LMs' inference capability to solve problems.
arXiv Detail & Related papers (2023-06-12T06:34:16Z)
Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
Neural Sculpting: Uncovering hierarchically modular task structure in neural networks through pruning and network analysis [8.080026425139708]
We show that hierarchically modular neural networks offer benefits such as learning efficiency, generalization, multi-task learning, and transfer. We propose an approach based on iterative unit and edge pruning (during training), combined with network analysis for module detection and hierarchy inference.
arXiv Detail & Related papers (2023-05-28T15:12:32Z)
Recognizing and Verifying Mathematical Equations using Multiplicative Differential Neural Units [86.9207811656179]
We show that memory-augmented neural networks (NNs) can achieve higher-order, memory-augmented extrapolation, stable performance, and faster convergence. Our models achieve a 1.53% average improvement over current state-of-the-art methods in equation verification and achieve a 2.22% Top-1 average accuracy and 2.96% Top-5 average accuracy for equation completion.
arXiv Detail & Related papers (2021-04-07T03:50:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.