Related papers: Automated Formalization via Conceptual Retrieval-Augmented LLMs

Automated Formalization via Conceptual Retrieval-Augmented LLMs

URL: http://arxiv.org/abs/2508.06931v1
Date: Sat, 09 Aug 2025 10:54:25 GMT
Title: Automated Formalization via Conceptual Retrieval-Augmented LLMs
Authors: Wangyue Lu, Lun Du, Sirui Li, Ke Weng, Haozhe Sun, Hengyu Liu, Minghe Yu, Tiancheng Zhang, Ge Yu,
Abstract summary: CRAMF is a Concept-driven Retrieval-Augmented Mathematical Formalization framework.<n>We introduce a framework for automatically constructing a concept-definition knowledge base from Mathlib4.<n>Experiments on miniF2F, ProofNet, and our newly proposed AdvancedMath benchmark show that CRAMF can be seamlessly integrated into LLM-based autoformalizers.
Score: 19.328918823576153
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Interactive theorem provers (ITPs) require manual formalization, which is labor-intensive and demands expert knowledge. While automated formalization offers a potential solution, it faces two major challenges: model hallucination (e.g., undefined predicates, symbol misuse, and version incompatibility) and the semantic gap caused by ambiguous or missing premises in natural language descriptions. To address these issues, we propose CRAMF, a Concept-driven Retrieval-Augmented Mathematical Formalization framework. CRAMF enhances LLM-based autoformalization by retrieving formal definitions of core mathematical concepts, providing contextual grounding during code generation. However, applying retrieval-augmented generation (RAG) in this setting is non-trivial due to the lack of structured knowledge bases, the polymorphic nature of mathematical concepts, and the high precision required in formal retrieval. We introduce a framework for automatically constructing a concept-definition knowledge base from Mathlib4, the standard mathematical library for the Lean 4 theorem prover, indexing over 26,000 formal definitions and 1,000+ core mathematical concepts. To address conceptual polymorphism, we propose contextual query augmentation with domain- and application-level signals. In addition, we design a dual-channel hybrid retrieval strategy with reranking to ensure accurate and relevant definition retrieval. Experiments on miniF2F, ProofNet, and our newly proposed AdvancedMath benchmark show that CRAMF can be seamlessly integrated into LLM-based autoformalizers, yielding consistent improvements in translation accuracy, achieving up to 62.1% and an average of 29.9% relative improvement.

Related papers

MMFormalizer: Multimodal Autoformalization in the Wild [79.24853896733154]
MMFormalizer extends autoformalization beyond text by integrating adaptive grounding with entities from real-world mathematical and physical domains.<n>We evaluate MMFormalizer on a new benchmark, PhyX-AF, comprising 115 curated samples from MathVerse, PhyX, Synthetic Geometry, and Analytic Geometry.<n>Results show GPT-5 and Gemini-3-Pro achieve the highest compile and semantic accuracy, with GPT-5 excelling in physical reasoning.
arXiv Detail & Related papers (2026-01-06T13:42:51Z)
BRIDGE: Building Representations In Domain Guided Program Verification [67.36686119518441]
BRIDGE decomposes verification into three interconnected domains: Code, Specifications, and Proofs.<n>We show that this approach substantially improves both accuracy and efficiency beyond standard error feedback methods.
arXiv Detail & Related papers (2025-11-26T06:39:19Z)
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization [73.0780809974414]
We propose a Reflective Autoformalization method that integrates semantic consistency evaluation into the autoformalization process.<n>This enables the model to iteratively generate formal statements, assess its semantic fidelity, and self-correct identified errors.<n>Experiments show that ReForm achieves an average improvement of 22.6 percentage points over the strongest baselines.
arXiv Detail & Related papers (2025-10-28T16:22:54Z)
DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems [14.568293842955065]
DRIFT is a framework that decomposes informal mathematical statements into smaller, more tractable ''sub-components''<n>It retrieves illustrative theorems to help models use premises more effectively in formalization tasks.<n>We evaluate DRIFT across diverse benchmarks (ProofNet, ConNF, and MiniF2F-test) and find that it consistently improves premise retrieval.
arXiv Detail & Related papers (2025-10-12T21:42:04Z)
Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey [92.71325249013535]
Deliberative tree search is a cornerstone of Large Language Model (LLM) research.<n>This paper introduces a unified framework that deconstructs search algorithms into three core components.
arXiv Detail & Related papers (2025-10-11T03:29:18Z)
StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion [30.51794514312176]
We identify two key abilities for effective autoformalization.<n>We introduce ThinkingF, a data synthesis and training pipeline that improves both abilities.<n>The resulting 7B and 32B models exhibit both comprehensive formal knowledge and strong informal-to-formal reasoning.
arXiv Detail & Related papers (2025-08-06T13:28:22Z)
Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z)
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models [17.919212265668783]
We present FormalMATH, a large-scale Lean4 benchmark comprising 5,560 formally verified problems spanning from high-school Olympiad challenges to undergraduate-level theorems.<n>We introduce a novel human-in-the-loop autoformalization pipeline that integrates statement autoformalization, semantic verification, and negation-based disproof filtering strategies.<n>Our evaluation of state-of-the-art LLM-based theorem provers reveals significant limitations.
arXiv Detail & Related papers (2025-05-05T15:37:00Z)
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models [59.920971312822736]
We introduce PromptCoT, a novel approach for automatically generating high-quality Olympiad-level math problems.<n>The proposed method synthesizes complex problems based on mathematical concepts and the rationale behind problem construction.<n>Our method is evaluated on standard benchmarks including GSM8K, MATH-500, and AIME2024, where it consistently outperforms existing problem generation methods.
arXiv Detail & Related papers (2025-03-04T06:32:30Z)
Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions [8.135142928659546]
We introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv)<n>We evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL.<n>Our findings reveal that definitions present a greater challenge compared to existing benchmarks, such as miniF2F.
arXiv Detail & Related papers (2025-02-17T17:34:48Z)
Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency [22.86318578119266]
We introduce a novel framework that scores and selects the best result from k autoformalization candidates based on symbolic equivalence and semantic consistency.<n>Our experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy.
arXiv Detail & Related papers (2024-10-28T11:37:39Z)
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation.<n>For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it.<n>As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
arXiv Detail & Related papers (2024-10-21T08:04:21Z)
Autoformalizing Euclidean Geometry [74.72212706513318]
We introduce a neuro-symbolic framework for autoformalizing Euclidean geometry. One challenge is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize. We provide automatic semantic evaluation for autoformalized theorem statements.
arXiv Detail & Related papers (2024-05-27T14:35:10Z)
Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.