Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry
- URL: http://arxiv.org/abs/2404.06405v2
- Date: Thu, 11 Apr 2024 14:37:29 GMT
- Title: Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry
- Authors: Shiven Sinha, Ameya Prabhu, Ponnurangam Kumaraguru, Siddharth Bhat, Matthias Bethge
- Abstract summary: We revisit the IMO-AG-30 Challenge introduced with AlphaGeometry and find that Wu's method is surprisingly strong.
Wu's method alone can solve 15 problems, and some of them are not solved by any of the other methods.
We set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist.
- Score: 16.41436428888792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems whereas the reported baseline based on Wu's method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong. Wu's method alone can solve 15 problems, and some of them are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu's method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems using just a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic method solves just 4 fewer problems than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist.
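At its core, Wu's method translates a geometric configuration into polynomial hypotheses, triangularizes them into a characteristic set, and then checks whether the conclusion polynomial pseudo-reduces to zero against that chain. The following is a minimal illustrative sketch on a textbook example (the diagonals of a parallelogram bisect each other), not the paper's implementation; the coordinate setup and variable names are chosen for this illustration, and SymPy's `prem` supplies the pseudo-remainder step.

```python
# Toy illustration of Wu's method: the diagonals of a parallelogram bisect
# each other. Hypotheses and conclusions are encoded as polynomials; the
# conclusion is proved by successive pseudo-division (sympy's prem)
# against a triangularized hypothesis set.
from sympy import symbols, prem, expand

u1, u2, u3, x1, x2 = symbols("u1 u2 u3 x1 x2")

# Parallelogram A=(0,0), B=(u1,0), D=(u2,u3), C=(u1+u2,u3);
# (x1,x2) is the intersection of the diagonals AC and BD.
h1 = (u1 + u2)*x2 - u3*x1          # (x1,x2) lies on line AC
h2 = u3*(x1 - u1) - (u2 - u1)*x2   # (x1,x2) lies on line BD

# Triangularize: eliminate x2 from h2 using h1, leaving a polynomial in x1.
f1 = prem(h2, h1, x2)

def reduces_to_zero(g):
    """Wu's test: g holds generically iff its successive
    pseudo-remainder against the triangular chain is 0."""
    r = prem(g, h1, x2)   # eliminate x2
    r = prem(r, f1, x1)   # eliminate x1
    return expand(r) == 0

# Conclusion: the intersection point is the midpoint of diagonal AC.
print(reduces_to_zero(2*x1 - (u1 + u2)))  # True
print(reduces_to_zero(2*x2 - u3))         # True
```

A real prover additionally tracks the non-degeneracy conditions (leading coefficients such as u1 + u2 must not vanish), which is where much of the engineering effort in a competitive implementation goes.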
Related papers
- Towards Autonomous Mathematics Research [48.29504087871558]
We introduce Aletheia, a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language.
Specifically, Aletheia is powered by an advanced version of Gemini Deep Think for challenging reasoning problems.
We demonstrate Aletheia on tasks ranging from Olympiad problems to PhD-level exercises and, most notably, through several distinct milestones in AI-assisted mathematics research.
arXiv Detail & Related papers (2026-02-10T18:50:15Z) - Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning [66.79506488139707]
Large language model (LLM) agents exhibit strong mathematical problem-solving abilities.
In this work, we make the first attempt to build a medalist-level LLM agent for geometry and present InternGeometry.
InternGeometry overcomes the limitations in geometry by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine.
Built on InternThinker-32B, InternGeometry solves 44 of 50 IMO geometry problems, exceeding the average gold medalist score (40.9), using only 13K training examples.
arXiv Detail & Related papers (2025-12-11T11:05:04Z) - Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions [129.877899436804]
We present a highly efficient method for geometry theorem proving that runs entirely on CPUs without relying on neural network-based inference.
Our initial study shows that a simple random strategy for adding auxiliary points can achieve silver-medal-level human performance on International Mathematical Olympiad (IMO) geometry problems.
We further construct HAGeo-409, a benchmark consisting of 409 geometry problems with human-assessed difficulty levels.
arXiv Detail & Related papers (2025-11-27T01:05:00Z) - OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization [88.76091817642963]
Recent large-scale language models (LLMs) with long chain-of-thought reasoning, such as DeepSeek-R1, have achieved impressive results on Olympiad-level mathematics.
We introduce OMEGA (Out-of-distribution Math Problems Evaluation with 3 Generalization Axes), a benchmark designed to evaluate three axes of out-of-distribution generalization.
arXiv Detail & Related papers (2025-06-23T17:51:40Z) - PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models [59.920971312822736]
We introduce PromptCoT, a novel approach for automatically generating high-quality Olympiad-level math problems.
The proposed method synthesizes complex problems based on mathematical concepts and the rationale behind problem construction.
Our method is evaluated on standard benchmarks including GSM8K, MATH-500, and AIME2024, where it consistently outperforms existing problem generation methods.
arXiv Detail & Related papers (2025-03-04T06:32:30Z) - Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 [43.92309838336044]
We present AlphaGeometry2, a significantly improved version of AlphaGeometry introduced in Trinh et al. (2024)
To achieve this, we first extend the original AlphaGeometry language to tackle harder problems involving movements of objects.
This has markedly improved the coverage rate of the AlphaGeometry language on International Math Olympiads (IMO) 2000-2024 geometry problems from 66% to 88%.
arXiv Detail & Related papers (2025-02-05T19:02:03Z) - Proposing and solving olympiad geometry with guided tree search [63.824930029019995]
We introduce TongGeometry, a Euclidean geometry system supporting tree-search-based guided problem proposing and solving.
TongGeometry discovers 6.7 billion geometry theorems requiring auxiliary constructions, including 4.1 billion exhibiting geometric symmetry.
TongGeometry solved all International Mathematical Olympiad geometry problems in IMO-AG-30, outperforming gold medalists for the first time.
arXiv Detail & Related papers (2024-12-14T04:20:47Z) - Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models [63.31878920079154]
We propose a benchmark specifically designed to assess large language models' mathematical reasoning at the Olympiad level.
Unlike existing Olympiad-related benchmarks, our dataset focuses exclusively on mathematics.
Our experimental results show that even the most advanced models, OpenAI o1-mini and OpenAI o1-preview, struggle with highly challenging Olympiad-level problems.
arXiv Detail & Related papers (2024-10-10T14:39:33Z) - Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram [78.79651421493058]
We propose a neural-symbolic model for plane geometry problem solving (PGPS) with three key steps: modal fusion, reasoning process and knowledge verification.
For reasoning, we design an explicable solution program to describe the geometric reasoning process, and employ a self-limited decoder to generate solution program autoregressively.
We also construct a large-scale geometry problem dataset called PGPS9K, containing fine-grained annotations of textual clauses, solution program and involved knowledge solvers.
arXiv Detail & Related papers (2024-07-10T02:45:22Z) - Proving Olympiad Algebraic Inequalities without Human Demonstrations [3.3466865213133836]
We propose AIPS, an Algebraic Inequality Proving System capable of autonomously generating complex inequality theorems.
On a test set of 20 Olympiad-level inequality problems, AIPS successfully solved 10, outperforming state-of-the-art methods.
One theorem was selected as a competition problem in a major city's 2024 Mathematical Olympiad.
arXiv Detail & Related papers (2024-06-20T11:37:53Z) - Can Language Models Solve Olympiad Programming? [40.54366634332231]
This paper introduces the USACO benchmark with 307 problems from the USA Computing Olympiad.
We construct and test a range of LM inference methods for competitive programming for the first time.
We find GPT-4 only achieves an 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting.
arXiv Detail & Related papers (2024-04-16T23:27:38Z) - FormalGeo: An Extensible Formalized Framework for Olympiad Geometric Problem Solving [9.73597821684857]
This is the first paper in a series of work we have accomplished over the past three years.
In this paper, we have constructed a consistent formal plane geometry system.
This will serve as a crucial bridge between IMO-level plane geometry challenges and readable AI automated reasoning.
arXiv Detail & Related papers (2023-10-27T09:55:12Z) - TheoremQA: A Theorem-driven Question Answering dataset [100.39878559382694]
GPT-4's capabilities to solve these problems are unparalleled, achieving an accuracy of 51% with Program-of-Thoughts Prompting.
TheoremQA is curated by domain experts containing 800 high-quality questions covering 350 theorems.
arXiv Detail & Related papers (2023-05-21T17:51:35Z) - UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression [127.68780714438103]
The two main types of geometry problems, calculation and proving, are usually treated as two separate tasks.
We construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems.
We also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously.
arXiv Detail & Related papers (2022-12-06T04:37:51Z) - GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning [172.36214872466707]
We focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge.
We propose a Geometric Question Answering dataset GeoQA, containing 5,010 geometric problems with corresponding annotated programs.
arXiv Detail & Related papers (2021-05-30T12:34:17Z) - Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning [123.06420835072225]
We construct a new large-scale benchmark, Geometry3K, consisting of 3,002 geometry problems with dense annotation in formal language.
We propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).
Inter-GPS incorporates theorem knowledge as conditional rules and performs symbolic reasoning step by step.
arXiv Detail & Related papers (2021-05-10T07:46:55Z) - Learning Equational Theorem Proving [2.8784416089703955]
We develop Stratified Shortest Solution Imitation Learning (3SIL) to learn equational theorem proving in a deep reinforcement learning setting.
The self-trained models achieve state-of-the-art performance in proving problems generated by one of the top open conjectures in quasigroup theory.
arXiv Detail & Related papers (2021-02-10T16:33:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.