Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large
Language Models
- URL: http://arxiv.org/abs/2402.03877v2
- Date: Wed, 14 Feb 2024 19:33:19 GMT
- Title: Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large
Language Models
- Authors: Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski
- Abstract summary: Large Language Models (LLMs) demonstrate ever-increasing abilities in mathematical and algorithmic tasks, yet their geometric reasoning skills are underexplored.
We investigate LLMs' abilities in constructive geometric problem-solving, one of the most fundamental steps in the development of human mathematical reasoning.
Our work reveals notable challenges that the state-of-the-art LLMs face in this domain despite many successes in similar areas.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) demonstrate ever-increasing abilities in
mathematical and algorithmic tasks, yet their geometric reasoning skills are
underexplored. We investigate LLMs' abilities in constructive geometric
problem-solving, one of the most fundamental steps in the development of human
mathematical reasoning. Our work reveals notable challenges that the
state-of-the-art LLMs face in this domain despite many successes in similar
areas. LLMs exhibit biases in target variable selection and struggle with 2D
spatial relationships, often misrepresenting and hallucinating objects and
their placements. To address this, we introduce a framework that formulates an
LLM-based multi-agent system that enhances the models' existing reasoning potential
by conducting an internal dialogue. This work underscores LLMs' current
limitations in geometric reasoning and improves geometric reasoning
capabilities through self-correction, collaboration, and diverse role
specializations.
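The multi-agent internal dialogue described in the abstract can be sketched as a simple propose-and-critique loop. The sketch below is purely illustrative: the `solver` and `critic` roles are hypothetical stubs standing in for LLM calls, not the paper's actual agents or prompts.

```python
# Hypothetical sketch of a multi-agent internal-dialogue loop:
# a Solver proposes a construction step, a Critic reviews it, and the loop
# repeats until the Critic accepts or a round limit is reached.

def solver(task, feedback):
    # Stub for an LLM "solver" role: propose a step, refining on feedback.
    step = f"construct {task}"
    if feedback:
        step += f" (revised after: {feedback})"
    return step

def critic(step):
    # Stub for an LLM "critic" role: request one revision, then accept.
    return None if "revised" in step else "justify the placement of each point"

def internal_dialogue(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        proposal = solver(task, feedback)
        feedback = critic(proposal)
        if feedback is None:  # critic accepts the proposal
            return proposal
    return proposal  # best effort after max_rounds

print(internal_dialogue("the perpendicular bisector of AB"))
```

In a real system each stub would be a role-specialized LLM prompt, and the critic's feedback would feed into the solver's next context window; the control flow, however, stays this simple.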
Related papers
- A Call for New Recipes to Enhance Spatial Reasoning in MLLMs [85.67171333213301]
Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks.
Recent studies have exposed critical limitations in their spatial reasoning capabilities.
This deficiency in spatial reasoning significantly constrains MLLMs' ability to interact effectively with the physical world.
arXiv Detail & Related papers (2025-04-21T11:48:39Z)
- MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams [65.02628814094639]
Diagrams serve as a fundamental form of visual language, representing complex concepts and their inter-relationships through structured symbols, shapes, and spatial arrangements.
Current benchmarks conflate perceptual and reasoning tasks, making it difficult to assess whether Multimodal Large Language Models genuinely understand mathematical diagrams beyond superficial pattern recognition.
We introduce MATHGLANCE, a benchmark specifically designed to isolate and evaluate mathematical perception in MLLMs.
We construct GeoPeP, a perception-oriented dataset of 200K structured geometry image-text pairs annotated with geometric primitives and precise spatial relationships.
arXiv Detail & Related papers (2025-03-26T17:30:41Z)
- Do Large Language Models Truly Understand Geometric Structures? [15.915781154075615]
We introduce the GeomRel dataset to evaluate large language models' understanding of geometric structures.
We propose the Geometry Chain-of-Thought (GeoCoT) method, which enhances LLMs' ability to identify geometric relationships.
arXiv Detail & Related papers (2025-01-23T15:52:34Z)
- A Survey on Large Language Models with some Insights on their Capabilities and Limitations [0.3222802562733786]
Large Language Models (LLMs) exhibit remarkable performance across various language-related tasks.
LLMs have demonstrated emergent abilities extending beyond their core functions.
This paper explores the foundational components, scaling mechanisms, and architectural strategies that drive these capabilities.
arXiv Detail & Related papers (2025-01-03T21:04:49Z)
- GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models [34.647839550142834]
We introduce GePBench, a novel benchmark designed to assess the geometric perception abilities of MLLMs.
Our evaluations reveal that current state-of-the-art MLLMs exhibit significant deficiencies in geometric perception tasks.
We show that models trained with GePBench data demonstrate substantial improvements on a wide range of benchmark tasks.
arXiv Detail & Related papers (2024-12-30T16:01:43Z)
- Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning [15.918115880403152]
We design the Thought Space Explorer (TSE) to expand and optimize thought structures to guide large language models (LLMs) to explore their blind spots of thinking.
By generating new reasoning steps and branches based on the original thought structure with various designed strategies, TSE broadens the thought space and alleviates the impact of blind spots for LLM reasoning.
arXiv Detail & Related papers (2024-10-31T17:12:14Z)
- Navigate Complex Physical Worlds via Geometrically Constrained LLM [10.89488333922071]
The study introduces a set of geometric conventions and develops a workflow based on multi-layer graphs and multi-agent system frameworks.
The study employs a genetic algorithm, inspired by large-scale model knowledge, to solve geometric constraint problems.
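A genetic algorithm for a geometric constraint can be illustrated on a toy problem. The sketch below is an assumption-laden example, not that paper's implementation: it evolves a 2D point until it is equidistant from two anchors A and B (i.e., lies on their perpendicular bisector), using truncation selection and Gaussian mutation.

```python
# Illustrative genetic algorithm for one geometric constraint: find P with
# |PA| = |PB|. Fitness is the constraint violation |da - db|, minimized to 0.
import random

A, B = (0.0, 0.0), (4.0, 0.0)

def fitness(p):
    da = ((p[0] - A[0])**2 + (p[1] - A[1])**2) ** 0.5
    db = ((p[0] - B[0])**2 + (p[1] - B[1])**2) ** 0.5
    return abs(da - db)  # zero exactly when the constraint holds

def solve(pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]           # selection: keep the best half
        children = [(x + rng.gauss(0, 0.1),       # mutation: Gaussian jitter
                     y + rng.gauss(0, 0.1)) for x, y in survivors]
        pop = survivors + children                # elitism: survivors carry over
    return min(pop, key=fitness)

best = solve()
print(best, fitness(best))
```

For the anchors above, any feasible point has x = 2, so the population converges toward that vertical line; real geometric-constraint problems would encode many such constraints into one fitness function.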
arXiv Detail & Related papers (2024-10-23T03:14:07Z)
- Reasoning in Large Language Models: A Geometric Perspective [4.2909314120969855]
We explore the reasoning abilities of large language models (LLMs) through their geometrical understanding.
We establish a connection between the expressive power of LLMs and the density of their self-attention graphs.
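One concrete reading of "density of a self-attention graph" is sketched below: treat the attention matrix as a weighted directed graph over tokens, keep edges above a threshold, and compute the fraction of possible edges retained. The thresholding rule here is an illustrative assumption, not that paper's exact definition.

```python
# Hedged illustration: density of a thresholded self-attention graph.
def attention_graph_density(attn, threshold=0.1):
    """Fraction of possible directed edges i -> j (i != j) with weight above threshold."""
    n = len(attn)
    edges = sum(1 for i in range(n) for j in range(n)
                if i != j and attn[i][j] > threshold)
    return edges / (n * (n - 1))

# Uniform attention over 4 tokens: every off-diagonal weight is 0.25,
# so every possible edge survives a 0.1 threshold.
uniform = [[0.25] * 4 for _ in range(4)]
print(attention_graph_density(uniform, threshold=0.1))  # -> 1.0
```

Denser graphs correspond to attention spread across many token pairs; sparse, peaked attention yields a density near zero.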
arXiv Detail & Related papers (2024-07-02T21:39:53Z)
- MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions [58.57255822646756]
This paper introduces MathChat, a benchmark designed to evaluate large language models (LLMs) across a broader spectrum of mathematical tasks.
We evaluate the performance of various SOTA LLMs on the MathChat benchmark, observing that while these models excel in single-turn question answering, they significantly underperform in more complex scenarios.
We develop MathChat sync, a synthetic dialogue-based math dataset for LLM finetuning, focusing on improving models' interaction and instruction-following capabilities in conversations.
arXiv Detail & Related papers (2024-05-29T18:45:55Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be explained to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [124.68242155098189]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities.
G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z)
- MacGyver: Are Large Language Models Creative Problem Solvers? [87.70522322728581]
We explore the creative problem-solving capabilities of modern LLMs in a novel constrained setting.
We create MACGYVER, an automatically generated dataset consisting of over 1,600 real-world problems.
We present our collection to both LLMs and humans to compare and contrast their problem-solving abilities.
arXiv Detail & Related papers (2023-11-16T08:52:27Z)
- Evaluating Spatial Understanding of Large Language Models [26.436450329727645]
Large language models show remarkable capabilities across a variety of tasks.
Recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts.
We design natural-language navigation tasks and evaluate the ability of LLMs to represent and reason about spatial structures.
arXiv Detail & Related papers (2023-10-23T03:44:40Z)
- Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents.
Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes.
We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.