NoReGeo: Non-Reasoning Geometry Benchmark
- URL: http://arxiv.org/abs/2601.10254v1
- Date: Thu, 15 Jan 2026 10:22:55 GMT
- Title: NoReGeo: Non-Reasoning Geometry Benchmark
- Authors: Irina Abdullaeva, Anton Vasiliuk, Elizaveta Goncharova, Temurbek Rahmatullaev, Zagorulko Ivan, Maxim Kurkin, Andrey Kuznetsov
- Abstract summary: NoReGeo is a novel benchmark designed to evaluate the intrinsic geometric understanding of large language models (LLMs). Our benchmark comprises 2,500 trivial geometric problems spanning 25 categories, each carefully crafted to be solvable purely through native geometric understanding. We assess a range of state-of-the-art models on NoReGeo, including frontier models like GPT-4, observing that even the most advanced systems achieve an overall maximum of 65% accuracy in binary classification tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present NoReGeo, a novel benchmark designed to evaluate the intrinsic geometric understanding of large language models (LLMs) without relying on reasoning or algebraic computation. Unlike existing benchmarks that primarily assess models' proficiency in reasoning-based geometry-where solutions are derived using algebraic methods-NoReGeo focuses on evaluating whether LLMs can inherently encode spatial relationships and recognize geometric properties directly. Our benchmark comprises 2,500 trivial geometric problems spanning 25 categories, each carefully crafted to be solvable purely through native geometric understanding, assuming known object locations. We assess a range of state-of-the-art models on NoReGeo, including frontier models like GPT-4, observing that even the most advanced systems achieve an overall maximum of 65% accuracy in binary classification tasks. Further, our ablation experiments demonstrate that such geometric understanding does not emerge through fine-tuning alone, indicating that effective training for geometric comprehension requires a specialized approach from the outset. Our findings highlight a significant gap in current LLMs' ability to natively grasp geometric concepts, providing a foundation for future research toward models with true geometric cognition.
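The abstract describes NoReGeo as a set of binary classification problems scored by overall accuracy. A minimal sketch of such an evaluation loop is shown below; the dataset fields, the `GeoProblem` type, and the stand-in predictor are illustrative assumptions, not the official NoReGeo data format or API.

```python
# Hypothetical sketch of scoring a predictor on a NoReGeo-style binary
# classification task. All names here are assumptions for illustration.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GeoProblem:
    category: str   # one of the benchmark's problem categories
    question: str   # e.g. "Is point (0, 0) inside the unit circle ...?"
    answer: bool    # ground-truth yes/no label


def accuracy(problems: List[GeoProblem],
             predict: Callable[[str], bool]) -> float:
    """Fraction of problems the predictor answers correctly."""
    correct = sum(predict(p.question) == p.answer for p in problems)
    return correct / len(problems)


# Toy usage with a trivial string-matching stand-in for a real model.
problems = [
    GeoProblem("point_in_circle",
               "Is (0, 0) inside the unit circle centered at the origin?",
               True),
    GeoProblem("point_in_circle",
               "Is (3, 0) inside the unit circle centered at the origin?",
               False),
]
toy_predictor = lambda q: "(0, 0)" in q  # stand-in, not a real model
print(accuracy(problems, toy_predictor))  # 1.0
```

In a real evaluation, `predict` would wrap an LLM call that maps the question text to a yes/no answer; per-category accuracies can be computed the same way by filtering on `category`.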
Related papers
- GeoRef: Referring Expressions in Geometry via Task Formulation, Synthetic Supervision, and Reinforced MLLM-based Solutions [45.70578816057097]
We introduce the task of Referring Expression Comprehension (REC) for geometric problems. REC evaluates whether models can localize points, shapes, and spatial relations in diagrams in response to textual prompts. We generate a large-scale synthetic training dataset using a structured geometric formal language.
arXiv Detail & Related papers (2025-09-25T12:00:52Z) - GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization [63.107398132743825]
Group Contrastive Policy Optimization (GCPO) is a novel reinforcement learning framework featuring two key innovations. We develop GeometryZero, a family of affordable-size geometric reasoning models that judiciously determine when to employ auxiliary construction.
arXiv Detail & Related papers (2025-06-08T14:18:15Z) - GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs [7.605833826892782]
We present a benchmark of 500 carefully refined problems organized by a tailored three-level taxonomy that considers geometric complexity rather than traditional mathematical reasoning complexity. Our comprehensive evaluation of 17 frontier LLMs reveals consistent and pronounced deficiencies. These results highlight the unique challenges posed by program-driven spatial reasoning and establish GeoGramBench as a valuable resource for advancing research in symbolic-to-spatial geometric reasoning.
arXiv Detail & Related papers (2025-05-23T09:17:07Z) - GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning [20.399408869403437]
Geometry problem-solving (GPS) is a challenging task requiring both visual comprehension and symbolic reasoning. Existing benchmarks fail to jointly assess both dimensions of the human-like geometric reasoning mechanism in large language models. We introduce GeoSense, the first comprehensive bilingual benchmark designed to evaluate the geometric reasoning abilities of MLLMs.
arXiv Detail & Related papers (2025-04-17T02:46:27Z) - Do Large Language Models Truly Understand Geometric Structures? [15.915781154075615]
We introduce the GeomRel dataset to evaluate large language models' understanding of geometric structures. We propose the Geometry Chain-of-Thought (GeoCoT) method, which enhances LLMs' ability to identify geometric relationships.
arXiv Detail & Related papers (2025-01-23T15:52:34Z) - A Survey of Geometric Graph Neural Networks: Data Structures, Models and Applications [71.809127869349]
This paper formalizes the geometric graph as a data structure, on top of which we provide a unified view of existing models from the geometric message-passing perspective. We also summarize the applications as well as the related datasets to facilitate later research on methodology development and experimental evaluation.
arXiv Detail & Related papers (2024-03-01T12:13:04Z) - Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images [56.86175251327466]
We introduce a novel approach to learn geometries such as depth and surface normal from images while incorporating geometric context.
Our approach extracts geometric context that encodes the geometric variations present in the input image and correlates depth estimation with geometric constraints.
Our method unifies depth and surface normal estimations within a cohesive framework, which enables the generation of high-quality 3D geometry from images.
arXiv Detail & Related papers (2024-02-08T17:57:59Z) - G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [121.07873620883322]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities. G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z) - GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning [172.36214872466707]
We focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge.
We propose a Geometric Question Answering dataset GeoQA, containing 5,010 geometric problems with corresponding annotated programs.
arXiv Detail & Related papers (2021-05-30T12:34:17Z)