Related papers: SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

URL: http://arxiv.org/abs/2603.03002v1
Date: Tue, 03 Mar 2026 13:52:40 GMT
Title: SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models
Authors: Peiyao Jiang, Zequn Qin, Xi Li
Abstract summary: Genuine spatial reasoning relies on the capacity to construct and manipulate coherent internal spatial representations. Existing benchmarks fail to isolate this intrinsic spatial cognition from statistical language heuristics. We introduce SpatialText, a theory-driven diagnostic framework.
Score: 12.26174714418171
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Genuine spatial reasoning relies on the capacity to construct and manipulate coherent internal spatial representations, often conceptualized as mental models, rather than merely processing surface linguistic associations. While large language models exhibit advanced capabilities across various domains, existing benchmarks fail to isolate this intrinsic spatial cognition from statistical language heuristics. Furthermore, multimodal evaluations frequently conflate genuine spatial reasoning with visual perception. To systematically investigate whether models construct flexible spatial mental models, we introduce SpatialText, a theory-driven diagnostic framework. Rather than functioning simply as a dataset, SpatialText isolates text-based spatial reasoning through a dual-source methodology. It integrates human-annotated descriptions of real 3D indoor environments, which capture natural ambiguities, perspective shifts, and functional relations, with code-generated, logically precise scenes designed to probe formal spatial deduction and epistemic boundaries. Systematic evaluation across state-of-the-art models reveals fundamental representational limitations. Although models demonstrate proficiency in retrieving explicit spatial facts and operating within global, allocentric coordinate systems, they exhibit critical failures in egocentric perspective transformation and local reference frame reasoning. These systematic errors provide strong evidence that current models rely heavily on linguistic co-occurrence heuristics rather than constructing coherent, verifiable internal spatial representations. SpatialText thus serves as a rigorous instrument for diagnosing the cognitive boundaries of artificial spatial intelligence.

Related papers

Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation [52.605647992080485]
Spatial reasoning advances vision-language models from visual perception toward semantic understanding. We integrate the cognitive concept of an object-centric blueprint into spatial reasoning. Our method consistently outperforms existing vision-language models.
arXiv Detail & Related papers (2026-01-05T10:38:26Z)
SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery [64.67498968405327]
SpatialDreamer is a reinforcement learning framework that enables spatial reasoning through a closed-loop process of active exploration. GeoPO introduces tree-structured sampling and step-level reward estimation with geometric consistency constraints.
arXiv Detail & Related papers (2025-12-08T17:20:50Z)
Imagine in Space: Exploring the Frontier of Spatial Intelligence and Reasoning Efficiency in Vision Language Models [23.12717700882611]
Spatial reasoning is a fundamental component of human cognition. Current large language models (LLMs) and vision language models (VLMs) have demonstrated remarkable reasoning capabilities across logical inference, problem solving, and decision making. We hypothesize that imagination, the internal simulation of spatial states, is the dominant reasoning mechanism within a spatial world model.
arXiv Detail & Related papers (2025-11-16T03:09:55Z)
LTD-Bench: Evaluating Large Language Models by Letting Them Draw [57.237152905238084]
LTD-Bench is a breakthrough benchmark for large language models (LLMs). It transforms LLM evaluation from abstract scores to directly observable visual outputs by requiring models to generate drawings through dot matrices or executable code. LTD-Bench's visual outputs enable powerful diagnostic analysis, offering a potential approach to investigate model similarity.
arXiv Detail & Related papers (2025-11-04T08:11:23Z)
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing [62.447497430479174]
Drawing to reason in space is a novel paradigm that enables LVLMs to reason through elementary drawing operations in the visual space. Our model, named VILASR, consistently outperforms existing methods across diverse spatial reasoning benchmarks.
arXiv Detail & Related papers (2025-06-11T17:41:50Z)
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models [14.442394137843923]
We present a detailed analysis that first delineates the core elements of spatial reasoning. We then assess the performance of these models on both synthetic and real-world images.
arXiv Detail & Related papers (2025-03-25T14:34:06Z)
SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation [7.659514491338669]
Current vision-language models may grasp basic spatial cues but struggle with the multi-dimensional spatial reasoning necessary for human-like understanding and real-world applications. We develop SPHERE, a hierarchical evaluation framework supported by a new human-annotated dataset. Benchmark evaluation of state-of-the-art models reveals significant deficiencies, especially in reasoning about distance and proximity.
arXiv Detail & Related papers (2024-12-17T09:10:55Z)
Neuro-symbolic Training for Reasoning over Spatial Language [17.901249830817882]
Even state-of-the-art language models struggle with spatial reasoning over text. This is attributed to not achieving the right level of abstraction required for generalizability. We propose training language models with neuro-symbolic techniques that exploit the spatial logical rules as constraints.
arXiv Detail & Related papers (2024-06-19T20:47:36Z)
Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning [4.422649561583363]
We present a novel benchmark for assessing spatial reasoning in language models (LMs). It is grounded in realistic 3D simulation data, offering a series of diverse room layouts with various objects and their spatial relationships. A key contribution is our logic-based consistency-checking tool, which enables the assessment of multiple plausible solutions.
arXiv Detail & Related papers (2024-05-23T21:22:00Z)
Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text. We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality. We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z)
From Spatial Relations to Spatial Configurations [64.21025426604274]
Spatial relation language is able to represent a large, comprehensive set of spatial concepts crucial for reasoning. We show how we extend the capabilities of existing spatial representation languages with the fine-grained decomposition of semantics.
arXiv Detail & Related papers (2020-07-19T02:11:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.