Towards Autonomous Mathematics Research
- URL: http://arxiv.org/abs/2602.10177v2
- Date: Thu, 12 Feb 2026 18:27:29 GMT
- Title: Towards Autonomous Mathematics Research
- Authors: Tony Feng, Trieu H. Trinh, Garrett Bingham, Dawsen Hwang, Yuri Chervonyi, Junehyuk Jung, Joonkyung Lee, Carlo Pagano, Sang-hyun Kim, Federico Pasqualotto, Sergei Gukov, Jonathan N. Lee, Junsu Kim, Kaiying Hou, Golnaz Ghiasi, Yi Tay, YaGuang Li, Chenkai Kuang, Yuan Liu, Hanzhao Lin, Evan Zheran Liu, Nigamaa Nayakanti, Xiaomeng Yang, Heng-Tze Cheng, Demis Hassabis, Koray Kavukcuoglu, Quoc V. Le, Thang Luong
- Abstract summary: We introduce Aletheia, a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language. Aletheia is powered by an advanced version of Gemini Deep Think for challenging reasoning problems. We demonstrate Aletheia on tasks ranging from Olympiad problems to PhD-level exercises and, most notably, through several distinct milestones in AI-assisted mathematics research.
- Score: 48.29504087871558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in foundational models have yielded reasoning systems capable of achieving a gold-medal standard at the International Mathematical Olympiad. The transition from competition-level problem-solving to professional research, however, requires navigating vast literature and constructing long-horizon proofs. In this work, we introduce Aletheia, a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language. Specifically, Aletheia is powered by an advanced version of Gemini Deep Think for challenging reasoning problems, a novel inference-time scaling law that extends beyond Olympiad-level problems, and intensive tool use to navigate the complexities of mathematical research. We demonstrate the capability of Aletheia on tasks ranging from Olympiad problems to PhD-level exercises and, most notably, through several distinct milestones in AI-assisted mathematics research: (a) a research paper (Feng26) generated by AI without any human intervention, calculating certain structure constants in arithmetic geometry called eigenweights; (b) a research paper (LeeSeo26) demonstrating human-AI collaboration in proving bounds on systems of interacting particles called independent sets; and (c) an extensive semi-autonomous evaluation (Feng et al., 2026a) of 700 open problems in Bloom's Erdős Conjectures database, including autonomous solutions to four open questions. To help the public better understand developments pertaining to AI and mathematics, we suggest quantifying standard levels of autonomy and novelty of AI-assisted results, and propose a novel concept of human-AI interaction cards for transparency. We conclude with reflections on human-AI collaboration in mathematics and share all prompts as well as model outputs at https://github.com/google-deepmind/superhuman/tree/main/aletheia.
Related papers
- The AI Research Assistant: Promise, Peril, and a Proof of Concept [0.0]
We provide empirical evidence through a detailed case study. The collaboration revealed both remarkable capabilities and critical limitations. Our experience suggests that, when used with appropriate skepticism and verification protocols, AI tools can meaningfully accelerate mathematical discovery.
arXiv Detail & Related papers (2026-02-26T10:29:05Z) - Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [105.15622072347811]
Large language models (LLMs) have opened new avenues for accelerating scientific research. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models.
arXiv Detail & Related papers (2026-02-03T18:56:17Z) - Vibe Reasoning: Eliciting Frontier AI Mathematical Capabilities -- A Case Study on IMO 2025 Problem 6 [28.84243696489176]
We introduce Vibe Reasoning, a human-AI collaborative paradigm for solving complex mathematical problems. We demonstrate this paradigm through IMO 2025 Problem 6, an optimization problem on which autonomous AI systems publicly reported failures.
arXiv Detail & Related papers (2025-12-22T11:30:19Z) - The Mathematician's Assistant: Integrating AI into Research Practice [0.0]
This paper explores the current landscape of publicly accessible large language models (LLMs) in a mathematical research context. We propose a framework for integrating AI into the research workflow, centered on the principle of the augmented mathematician. We conclude that the primary role of AI is currently augmentation rather than automation.
arXiv Detail & Related papers (2025-08-27T19:33:48Z) - PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models [59.920971312822736]
We introduce PromptCoT, a novel approach for automatically generating high-quality Olympiad-level math problems. The proposed method synthesizes complex problems based on mathematical concepts and the rationale behind problem construction. Our method is evaluated on standard benchmarks including GSM8K, MATH-500, and AIME2024, where it consistently outperforms existing problem generation methods.
arXiv Detail & Related papers (2025-03-04T06:32:30Z) - Mathematics and Machine Creativity: A Survey on Bridging Mathematics with AI [14.825293189738849]
This paper presents a comprehensive overview of the applications of artificial intelligence (AI) in mathematical research. Recent developments in AI, particularly in reinforcement learning (RL) and large language models (LLMs), have demonstrated the potential for AI to contribute back to mathematics. This survey aims to establish a bridge between AI and mathematics, providing insights into the mutual benefits and fostering deeper interdisciplinary understanding.
arXiv Detail & Related papers (2024-12-21T08:58:36Z) - Formal Mathematical Reasoning: A New Frontier in AI [60.26950681543385]
We advocate for formal mathematical reasoning and argue that it is indispensable for advancing AI4Math to the next level. We summarize existing progress, discuss open challenges, and envision critical milestones to measure future success.
arXiv Detail & Related papers (2024-12-20T17:19:24Z) - Proving Olympiad Algebraic Inequalities without Human Demonstrations [3.3466865213133836]
We propose AIPS, an Algebraic Inequality Proving System capable of autonomously generating complex inequality theorems.
On a test set of 20 Olympiad-level inequality problems, AIPS successfully solved 10, outperforming state-of-the-art methods.
One theorem was selected as a competition problem in a major city's 2024 Mathematical Olympiad.
arXiv Detail & Related papers (2024-06-20T11:37:53Z) - OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI [73.75520820608232]
We introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. Our evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration.
arXiv Detail & Related papers (2024-06-18T16:20:53Z) - AI for Mathematics: A Cognitive Science Perspective [86.02346372284292]
Mathematics is one of the most powerful conceptual systems developed and used by the human species.
Rapid progress in AI, particularly propelled by advances in large language models (LLMs), has sparked renewed, widespread interest in building such systems.
arXiv Detail & Related papers (2023-10-19T02:00:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.