Semantic Search over 9 Million Mathematical Theorems
- URL: http://arxiv.org/abs/2602.05216v1
- Date: Thu, 05 Feb 2026 02:16:20 GMT
- Title: Semantic Search over 9 Million Mathematical Theorems
- Authors: Luke Alexander, Eric Leonen, Sophie Szeto, Artemii Remizov, Ignacio Tejeda, Giovanni Inchiostro, Vasily Ilin
- Abstract summary: We introduce and study semantic theorem retrieval at scale over a unified corpus of 9.2 million theorem statements extracted from arXiv and seven other sources. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality.
- Score: 0.31229177276953496
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Searching for mathematical results remains difficult: most existing tools retrieve entire papers, while mathematicians and theorem-proving agents often seek a specific theorem, lemma, or proposition that answers a query. While semantic search has seen rapid progress, its behavior on large, highly technical corpora such as research-level mathematical theorems remains poorly understood. In this work, we introduce and study semantic theorem retrieval at scale over a unified corpus of 9.2 million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality. On a curated evaluation set of theorem-search queries written by professional mathematicians, our approach substantially improves both theorem-level and paper-level retrieval compared to existing baselines, demonstrating that semantic theorem search is feasible and effective at web scale. The theorem search tool is available at https://huggingface.co/spaces/uw-math-ai/theorem-search, and the dataset is available at https://huggingface.co/datasets/uw-math-ai/TheoremSearch.
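A minimal sketch of the pipeline the abstract describes: index each theorem by a short natural-language description, embed the descriptions, and rank by cosine similarity to the embedded query. The embedding model named below is an illustrative stand-in (the paper compares several models and prompting strategies), and the sample descriptions are invented.

```python
# Sketch of description-based theorem retrieval. The model choice is an
# assumption for illustration; the paper evaluates multiple embedding
# models, representation contexts, and prompting strategies.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Theorems are indexed by short natural-language descriptions rather
# than their raw, notation-heavy statements (invented examples).
descriptions = [
    "Every continuous real-valued function on a compact set attains its maximum.",
    "A bounded monotone sequence of real numbers converges.",
    "Every finite integral domain is a field.",
]
corpus = model.encode(descriptions, normalize_embeddings=True)

def search(query: str, k: int = 3) -> list[tuple[float, str]]:
    """Rank indexed descriptions by cosine similarity to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = corpus @ q  # dot product = cosine, since embeddings are unit-norm
    order = np.argsort(-scores)[:k]
    return [(float(scores[i]), descriptions[i]) for i in order]

print(search("extreme value theorem"))
```

At the paper's scale of 9.2 million theorems, the brute-force dot product above would be replaced by an approximate-nearest-neighbor index such as FAISS; the ranking logic is otherwise the same.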
Related papers
- Using Large Language Models to Study Mathematical Practice [0.0]
This paper reports the results of a corpus study facilitated by Google's Gemini 2.5 Pro. Based on a sample of 5000 mathematics papers from arXiv.org, the experiments yielded a dataset of hundreds of useful annotated examples. It also seeks to begin a conversation about these methods as research tools in practice-oriented philosophy.
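The abstract does not specify the prompts or annotation schema, so everything below is a hypothetical illustration of LLM-assisted corpus annotation using the google-genai SDK; the annotation task is invented.

```python
# Hypothetical sketch of LLM-assisted corpus annotation; the prompt and
# the annotation task are invented, not taken from the paper.
from google import genai

client = genai.Client()  # expects GEMINI_API_KEY in the environment

def annotate(passage: str) -> str:
    """Ask the model whether a passage exhibits the practice under study."""
    prompt = (
        "Does the following excerpt from a mathematics paper explicitly "
        "appeal to intuition? Answer yes or no and quote the relevant "
        "sentence.\n\n" + passage
    )
    response = client.models.generate_content(
        model="gemini-2.5-pro", contents=prompt
    )
    return response.text
```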
arXiv Detail & Related papers (2025-06-16T20:22:50Z)
- DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning [67.93945726549289]
DeepTheorem is a comprehensive informal theorem-proving framework exploiting natural language to enhance mathematical reasoning. DeepTheorem includes a large-scale benchmark dataset consisting of 121K high-quality IMO-level informal theorems and proofs. We devise a novel reinforcement learning strategy (RL-Zero) explicitly tailored to informal theorem proving, leveraging the verified theorem variants to incentivize robust mathematical inference.
arXiv Detail & Related papers (2025-05-29T17:59:39Z)
- Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation. For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite it or be applied to it. As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M. A sketch of the mutation loop follows below.
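The mutation loop is conceptually simple; the sketch below uses hypothetical helpers (`find_invocable`, `try_rewrite`) standing in for the real Lean/Mathlib tooling the paper builds on.

```python
# Sketch of Alchemy-style symbolic mutation. `find_invocable` and
# `try_rewrite` are hypothetical stand-ins for the paper's Lean
# tooling, which mutates theorems via `rw`/`apply` tactics.
from typing import Callable, Iterable, Iterator, Optional

def mutate(
    stmt: str,
    library: Iterable[str],
    find_invocable: Callable[[str, Iterable[str]], Iterable[str]],
    try_rewrite: Callable[[str, str], Optional[str]],
) -> Iterator[str]:
    """Yield distinct variants of `stmt` produced by rewriting with lemmas."""
    seen = {stmt}
    for lemma in find_invocable(stmt, library):
        variant = try_rewrite(stmt, lemma)  # e.g. `rw [lemma]` in Lean
        if variant is not None and variant not in seen:
            seen.add(variant)
            yield variant
```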
arXiv Detail & Related papers (2024-10-21T08:04:21Z)
- Proving Theorems Recursively [80.42431358105482]
We propose POETRY, which proves theorems in a level-by-level manner.
Unlike previous step-by-step methods, POETRY searches for a sketch of the proof at each level.
We observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
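Level-by-level search is naturally recursive: prove a sketch whose hard steps are deferred, then treat each deferred step as a fresh goal one level down. In the sketch below, `propose_sketch` and `open_subgoals` are hypothetical stand-ins, not the paper's API.

```python
# Recursive proof search in the spirit of POETRY. `propose_sketch`
# asks a model for a proof outline whose hard steps are deferred
# (e.g. with `sorry`); `open_subgoals` lists those deferred subgoals.
# Both are hypothetical stand-ins.
def prove(goal, propose_sketch, open_subgoals, depth=0, max_depth=8):
    if depth > max_depth:
        return None
    sketch = propose_sketch(goal)  # outline for this level only
    if sketch is None:
        return None
    sub_proofs = []
    for sub in open_subgoals(sketch):  # each deferred step
        sub_proof = prove(sub, propose_sketch, open_subgoals,
                          depth + 1, max_depth)
        if sub_proof is None:
            return None  # this sketch has an unprovable gap
        sub_proofs.append(sub_proof)
    return (sketch, sub_proofs)  # the sketch plus proofs of its gaps
```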
arXiv Detail & Related papers (2024-05-23T10:35:08Z)
- ATG: Benchmarking Automated Theorem Generation for Generative Language Models [83.93978859348313]
Humans can develop new theorems to explore broader and more complex mathematical results.
Current generative language models (LMs) have achieved significant improvement in automatically proving theorems.
This paper proposes an Automated Theorem Generation benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems.
arXiv Detail & Related papers (2024-05-05T02:06:37Z)
- REFACTOR: Learning to Extract Theorems from Proofs [29.44286369265644]
We show that REFACTOR can extract 19.6% of the theorems that humans would use to write the proofs.
With the newly extracted theorems, we show that the existing MetaMath database can be refactored.
We also demonstrate that the prover trained on the refactored dataset proves more test theorems and outperforms state-of-the-art baselines.
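REFACTOR itself is a learned model over proof trees; as a rough intuition for why extraction helps, a fragment that recurs across many proofs is worth naming once and reusing. The frequency count below is only an illustrative proxy, not the paper's method.

```python
from collections import Counter

# Illustrative proxy for theorem extraction: proof fragments that recur
# across many proofs are candidates to extract and name. The real
# REFACTOR model learns this from proof trees rather than counting.
def candidate_fragments(proofs: list[list[str]], min_count: int = 3) -> list[str]:
    """Return fragments appearing in at least `min_count` distinct proofs."""
    counts = Counter(frag for proof in proofs for frag in set(proof))
    return [frag for frag, n in counts.items() if n >= min_count]
```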
arXiv Detail & Related papers (2024-02-26T21:21:30Z)
- Considerations on Approaches and Metrics in Automated Theorem Generation/Finding in Geometry [0.0]
We present and discuss different approaches for the automatic discovery of geometric theorems (and properties).
We will argue that judging whether a theorem prover is able to produce interesting theorems remains a non-deterministic task.
arXiv Detail & Related papers (2024-01-22T12:51:19Z)
- A New Approach Towards Autoformalization [7.275550401145199]
Autoformalization is the task of translating natural language mathematics into a formal language that can be verified by a program.
Research paper mathematics requires large amounts of background and context.
We propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks.
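As a toy instance of the task, here is an informal statement alongside one possible Lean 4 (with Mathlib) formalization; research-level statements are far harder precisely because the required background theory must be formalized first.

```lean
-- Informal: "the sum of two even integers is even."
-- One possible formalization in Lean 4 with Mathlib.
import Mathlib

theorem even_add_even (m n : ℤ) (hm : Even m) (hn : Even n) :
    Even (m + n) :=
  hm.add hn
```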
arXiv Detail & Related papers (2023-10-12T00:50:24Z)
- TheoremQA: A Theorem-driven Question Answering dataset [100.39878559382694]
TheoremQA is curated by domain experts and contains 800 high-quality questions covering 350 theorems.
GPT-4's capabilities to solve these problems are unparalleled, achieving an accuracy of 51% with Program-of-Thoughts prompting.
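Program-of-Thoughts prompting asks the model to emit a short program that computes the answer rather than stating it directly. The question below is invented for illustration, not drawn from TheoremQA.

```python
import math

# What a Program-of-Thoughts answer looks like: the model writes code
# that computes the result. Invented law-of-cosines question, not from
# the TheoremQA dataset.
# Q: a triangle has sides 5 and 7 with a 60-degree included angle;
#    how long is the third side?
a, b = 5.0, 7.0
gamma = math.radians(60)
c = math.sqrt(a**2 + b**2 - 2 * a * b * math.cos(gamma))
print(round(c, 4))  # 6.245
```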
arXiv Detail & Related papers (2023-05-21T17:51:35Z)
- Representer Theorems for Metric and Preference Learning: Geometric Insights and Algorithms [1.370633147306388]
We develop a framework to address a broad class of metric and preference learning problems within a Hilbert space. For the general task of metric learning, our framework leads to a simple and self-contained representer theorem. Lastly, our representer theorem leads to a novel nonlinear algorithm for metric and preference learning.
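For context, this is the classical kernel-methods form that representer theorems generalize; the paper's own theorem, for metric and preference learning, is more general than this standard statement.

```latex
% Classical representer theorem (standard kernel-methods form, shown
% for context; not the paper's exact statement). The regularized risk
% minimizer over an RKHS $\mathcal{H}$ with kernel $K$ lies in the
% span of kernel sections at the training points.
\[
  f^{\star} \;=\; \operatorname*{arg\,min}_{f \in \mathcal{H}}
  \sum_{i=1}^{n} \ell\bigl(f(x_i), y_i\bigr)
  + \lambda \lVert f \rVert_{\mathcal{H}}^{2}
  \quad\Longrightarrow\quad
  f^{\star}(\cdot) \;=\; \sum_{i=1}^{n} \alpha_i \, K(x_i, \cdot),
  \qquad \alpha_i \in \mathbb{R}.
\]
```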
arXiv Detail & Related papers (2023-04-07T16:34:25Z)
- Learning to Prove Theorems by Learning to Generate Theorems [71.46963489866596]
We learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover.
Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover.
arXiv Detail & Related papers (2020-02-17T16:06:02Z)