Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
- URL: http://arxiv.org/abs/2512.10534v2
- Date: Fri, 12 Dec 2025 13:43:03 GMT
- Title: Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
- Authors: Haiteng Zhao, Junhao Shen, Yiming Zhang, Songyang Gao, Kuikun Liu, Tianyou Ma, Fan Zheng, Dahua Lin, Wenwei Zhang, Kai Chen,
- Abstract summary: Large language model (LLM) agents exhibit strong mathematical problem-solving abilities.<n>In this work, we make the first attempt to build a medalist-level LLM agent for geometry and present InternGeometry.<n> InternGeometry overcomes the limitations in geometry by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine.<n>Built on InternThinker-32B, InternGeometry solves 44 of 50 IMO geometry problems, exceeding the average gold medalist score (40.9), using only 13K training examples.
- Score: 66.79506488139707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) agents exhibit strong mathematical problem-solving abilities and can even solve International Mathematical Olympiad (IMO) level problems with the assistance of formal proof systems. However, due to weak heuristics for auxiliary constructions, AI for geometry problem solving remains dominated by expert models such as AlphaGeometry 2, which rely heavily on large-scale data synthesis and search for both training and evaluation. In this work, we make the first attempt to build a medalist-level LLM agent for geometry and present InternGeometry. InternGeometry overcomes the heuristic limitations in geometry by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine, and reflecting on the engine's feedback to guide subsequent proposals. A dynamic memory mechanism enables InternGeometry to conduct more than two hundred interactions with the symbolic engine per problem. To further accelerate learning, we introduce Complexity-Boosting Reinforcement Learning (CBRL), which gradually increases the complexity of synthesized problems across training stages. Built on InternThinker-32B, InternGeometry solves 44 of 50 IMO geometry problems (2000-2024), exceeding the average gold medalist score (40.9), using only 13K training examples, just 0.004% of the data used by AlphaGeometry 2, demonstrating the potential of LLM agents on expert-level geometry tasks. InternGeometry can also propose novel auxiliary constructions for IMO problems that do not appear in human solutions. We will release the model, data, and symbolic engine to support future research.
Related papers
- Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions [129.877899436804]
We present a highly efficient method for geometry theorem proving that runs entirely on CPUs without relying on neural network-based inference.<n>Our initial study shows that a simple random strategy for adding auxiliary points can achieve silver-medal level human performance on International Mathematical Olympiad (IMO)<n>We further construct HAGeo-409, a benchmark consisting of 409 geometry problems with human-assessed difficulty levels.
arXiv Detail & Related papers (2025-11-27T01:05:00Z) - Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 [43.92309838336044]
We present AlphaGeometry2, a significantly improved version of AlphaGeometry introduced in Trinh et al. (2024)<n>To achieve this, we first extend the original AlphaGeometry language to tackle harder problems involving movements of objects.<n>This has markedly improved the coverage rate of the AlphaGeometry language on International Math Olympiads (IMO) 2000-2024 geometry problems from 66% to 88%.
arXiv Detail & Related papers (2025-02-05T19:02:03Z) - Proposing and solving olympiad geometry with guided tree search [63.824930029019995]
We introduce TongGeometry, a Euclidean geometry system supporting tree-search-based guided problem proposing and solving.<n>TongGeometry discovers 6.7 billion geometry theorems requiring auxiliary constructions, including 4.1 billion exhibiting geometric symmetry.<n>TongGeometry solved all International Mathematical Olympiad geometry in IMO-AG-30, outperforming gold medalists for the first time.
arXiv Detail & Related papers (2024-12-14T04:20:47Z) - Geo-LLaVA: A Large Multi-Modal Model for Solving Geometry Math Problems with Meta In-Context Learning [4.4615747404424395]
Geometry mathematics problems pose significant challenges for large language models (LLMs)<n>We collect a geometry question-answer dataset by sourcing geometric data from Chinese high school education websites, referred to as GeoMath.<n>We propose a Large Multi-modal Model (LMM) framework named Geo-LLaVA, which incorporates retrieval augmentation with supervised fine-tuning (SFT) in the training stage, called meta-training, and employs in-context learning (ICL) during inference to improve performance.
arXiv Detail & Related papers (2024-12-12T07:34:09Z) - Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring [34.37450586634531]
This paper presents GPSM4K, a comprehensive geometry multimodal dataset tailored to augment the problem-solving capabilities of Large Vision Language Models (LVLMs)<n>GPSM4K encompasses 2157 multimodal question-answer pairs manually extracted from mathematics textbooks spanning grades 7-12.<n>This dataset serves as an excellent benchmark for assessing the geometric reasoning capabilities of LVLMs.
arXiv Detail & Related papers (2024-12-01T15:19:23Z) - Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver [11.69164802295844]
We introduce a new framework that integrates visual features, geometric formal language, and natural language representations.
We propose a novel synthetic data approach and create a large-scale geometric dataset, SynthGeo228K, annotated with both formal and natural language captions.
Our framework improves MLLMs' ability to process geometric diagrams and extends their application to open-ended tasks on the formalgeo7k dataset.
arXiv Detail & Related papers (2024-09-06T12:11:06Z) - G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [121.07873620883322]
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities.<n>G-LLaVA demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
arXiv Detail & Related papers (2023-12-18T17:36:20Z) - FormalGeo: An Extensible Formalized Framework for Olympiad Geometric
Problem Solving [9.73597821684857]
This is the first paper in a series of work we have accomplished over the past three years.
In this paper, we have constructed a consistent formal plane geometry system.
This will serve as a crucial bridge between IMO-level plane geometry challenges and readable AI automated reasoning.
arXiv Detail & Related papers (2023-10-27T09:55:12Z) - UniGeo: Unifying Geometry Logical Reasoning via Reformulating
Mathematical Expression [127.68780714438103]
Two main geometry problems: calculation and proving, are usually treated as two specific tasks.
We construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems.
We also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously.
arXiv Detail & Related papers (2022-12-06T04:37:51Z) - Inter-GPS: Interpretable Geometry Problem Solving with Formal Language
and Symbolic Reasoning [123.06420835072225]
We construct a new large-scale benchmark, Geometry3K, consisting of 3,002 geometry problems with dense annotation in formal language.
We propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem solver (Inter-GPS)
Inter-GPS incorporates theorem knowledge as conditional rules and performs symbolic reasoning step by step.
arXiv Detail & Related papers (2021-05-10T07:46:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.