From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning
- URL: http://arxiv.org/abs/2512.24532v1
- Date: Wed, 31 Dec 2025 00:36:03 GMT
- Title: From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning
- Authors: Amir Tahmasbi, Sadegh Majidi, Kazem Taram, Aniket Bera
- Abstract summary: We propose a two-stage approach that decomposes spatial reasoning into atomic building blocks and their composition. First, we apply supervised fine-tuning on elementary spatial transformations, such as rotation, translation, and scaling, to equip the model with basic spatial physics. We then freeze this physics-aware model and train lightweight LoRA adapters within the GRPO framework to learn policies that compose these building blocks for multi-step planning.
- Score: 10.98910502098502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial reasoning in large language models (LLMs) has gained increasing attention due to applications in navigation and planning. Despite strong general language capabilities, LLMs still struggle with spatial transformations and multi-step planning in structured environments. We propose a two-stage approach that decomposes spatial reasoning into atomic building blocks and their composition. First, we apply supervised fine-tuning on elementary spatial transformations, such as rotation, translation, and scaling, to equip the model with basic spatial physics. We then freeze this physics-aware model and train lightweight LoRA adapters within the GRPO framework to learn policies that compose these building blocks for multi-step planning in puzzle-based environments, in a closed-loop manner. To support this pipeline, we synthesize an ASCII-art dataset and construct a corresponding ASCII-based reinforcement learning environment. Our method consistently outperforms baselines, including the generic backbone, physics-aware model, and end-to-end RL models, under both Dynamic environments with explicit state updates and Static environments where the model must rely on its internal state across steps. In addition, the proposed approach converges faster and exhibits more stable training compared to end-to-end reinforcement learning from scratch. Finally, we analyze attention patterns to assess whether fine-tuning induces meaningful improvements in spatial understanding.
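The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: elementary transformations on an ASCII grid that are composed into a multi-step plan, and the group-relative reward normalization commonly associated with GRPO. All function names, the grid encoding (a list of equal-length strings with `.` as empty space), and the reward values used below are assumptions made for illustration, not the authors' code or dataset format.

```python
from statistics import mean, pstdev

# Hypothetical encoding: a grid is a list of equal-length strings, "." is empty.

def rotate90(grid):
    """Rotate the grid 90 degrees clockwise."""
    width = len(grid[0])
    return ["".join(row[c] for row in reversed(grid)) for c in range(width)]

def translate(grid, dx, dy, fill="."):
    """Shift cells dx columns right and dy rows down; cells leaving the grid are dropped."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = grid[y][x]
    return ["".join(r) for r in out]

def scale(grid, k):
    """Upscale by an integer factor k: every cell becomes a k x k block."""
    return ["".join(ch * k for ch in row) for row in grid for _ in range(k)]

# Registry of building blocks (names are illustrative).
OPS = {"rotate90": rotate90, "translate": translate, "scale": scale}

def apply_plan(grid, plan):
    """Apply a sequence of (op_name, args) steps, feeding each result into the next step."""
    for op, args in plan:
        grid = OPS[op](grid, *args)
    return grid

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

if __name__ == "__main__":
    start = ["#.", ".."]
    plan = [("translate", (1, 0)), ("rotate90", ())]
    print(apply_plan(start, plan))
    # Assumed rewards for four sampled plans (1.0 = reached the target grid).
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

In this sketch a plan that reaches the target grid would receive a higher reward than the other samples in its group and therefore a positive advantage. How the paper actually defines its reward and environment dynamics is not stated in the abstract.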
Related papers
- TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation [70.23578202012048]
Vision-Language Navigation (VLN) presents a unique challenge for Large Vision-Language Models (VLMs) due to their inherent architectural mismatch.
We propose TagaVLM (Topology-Aware Global Action reasoning), an end-to-end framework that explicitly injects topological structures into the VLM backbone.
To enhance topological node information, an Interleaved Navigation Prompt strengthens node-level visual-text alignment.
With the embedded topological graph, the model is capable of global action reasoning, allowing for robust path correction.
arXiv Detail & Related papers (2026-03-03T13:28:07Z)
- Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants [85.33837131101342]
We propose a strategic roadmap organized into four pillars: foundational infrastructure, algorithmic optimization, cognitive reasoning, and unified multimodal intelligence.
We argue that this transition is essential for developing next-generation AI capable of complex structural reasoning, dynamic self-correction, and seamless multimodal integration.
arXiv Detail & Related papers (2026-01-20T14:58:23Z)
- Natural Building Blocks for Structured World Models: Theory, Evidence, and Scaling [42.78591555984395]
We propose a framework that specifies the natural building blocks for structured world models.
We examine Hidden Markov Models (HMMs) and linear switching dynamical systems (sLDS) as natural building blocks for discrete and continuous modeling.
This modular approach supports both passive modeling (generation, forecasting) and active control (planning, decision-making) within the same architecture.
arXiv Detail & Related papers (2025-11-03T22:02:04Z)
- Rethinking the Role of Dynamic Sparse Training for Scalable Deep Reinforcement Learning [58.533203990515034]
Scaling neural networks has driven breakthrough advances in machine learning, yet this paradigm fails in deep reinforcement learning (DRL).
We show that dynamic sparse training strategies provide module-specific benefits that complement the primary scalability foundation established by architectural improvements.
We finally distill these insights into Module-Specific Training (MST), a practical framework that exploits the benefits of architectural improvements and demonstrates substantial scalability gains across diverse RL algorithms without algorithmic modifications.
arXiv Detail & Related papers (2025-10-14T03:03:08Z)
- ST-LINK: Spatially-Aware Large Language Models for Spatio-Temporal Forecasting [7.853736939635847]
We introduce ST-LINK, a novel framework that enhances the capability of Large Language Models to capture sequential-temporal dependencies.
Its key components are Spatially-Enhanced Attention (SE-Attention) and the Memory Retrieval Feed-Forward Network (MRFFN).
arXiv Detail & Related papers (2025-09-17T07:11:45Z)
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [103.32591749156416]
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL).
This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL.
arXiv Detail & Related papers (2025-09-02T17:46:26Z)
- A Fuzzy Logic Prompting Framework for Large Language Models in Adaptive and Uncertain Tasks [2.1756081703276]
We introduce a modular prompting framework that supports safer and more adaptive use of large language models (LLMs) across dynamic, user-centered tasks.
Our method combines a natural language boundary prompt with a control schema encoded with fuzzy scaffolding logic and adaptation rules.
In a simulated intelligent tutoring setting, the framework improves scaffolding quality, adaptivity, and instructional alignment across multiple models, outperforming standard prompting baselines.
arXiv Detail & Related papers (2025-08-08T23:50:48Z)
- Penrose Tiled Low-Rank Compression and Section-Wise Q&A Fine-Tuning: A General Framework for Domain-Specific Large Language Model Adaptation [7.161207910629032]
Large language models (LLMs) hold great promise for specialized scientific domains such as materials science.
We propose a two-stage framework that combines structured model compression with a scientific fine-tuning regimen to address this challenge.
arXiv Detail & Related papers (2025-03-28T01:33:05Z)
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications.
Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.