Translating Informal Proofs into Formal Proofs Using a Chain of States
- URL: http://arxiv.org/abs/2512.10317v2
- Date: Fri, 12 Dec 2025 08:11:30 GMT
- Title: Translating Informal Proofs into Formal Proofs Using a Chain of States
- Authors: Ziyu Wang, Bowen Yang, Chenyi Li, Yuan Zhang, Shihao Zhou, Bin Dong, Zaiwen Wen
- Abstract summary: We address the problem of translating informal mathematical proofs expressed in natural language into formal proofs in Lean4 under a constrained computational budget. We first extract a Chain of States (CoS), a sequence of intermediate formal proof states aligned with the logical structure of the informal argument. We then generate tactics to transition between adjacent states in the CoS, thereby constructing the full formal proof.
- Score: 20.0011959667642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the problem of translating informal mathematical proofs expressed in natural language into formal proofs in Lean4 under a constrained computational budget. Our approach is grounded in two key insights. First, informal proofs tend to proceed via a sequence of logical transitions - often implications or equivalences - without explicitly specifying intermediate results or auxiliary lemmas. In contrast, formal systems like Lean require an explicit representation of each proof state and the tactics that connect them. Second, each informal reasoning step can be viewed as an abstract transformation between proof states, but identifying the corresponding formal tactics often requires nontrivial domain knowledge and precise control over the proof context. To bridge this gap, we propose a two-stage framework. Rather than generating formal tactics directly, we first extract a Chain of States (CoS), a sequence of intermediate formal proof states aligned with the logical structure of the informal argument. We then generate tactics to transition between adjacent states in the CoS, thereby constructing the full formal proof. This intermediate representation significantly reduces the complexity of tactic generation and improves alignment with informal reasoning patterns. We build dedicated datasets and benchmarks for training and evaluation, and introduce an interactive framework to support tactic generation from formal states. Empirical results show that our method substantially outperforms existing baselines, achieving higher proof success rates.
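The two-stage idea can be pictured on a toy example. Below is a minimal, hypothetical Lean4 sketch (not taken from the paper): the informal step "0 + n = n, because addition commutes and n + 0 = n" is first mapped to a chain of explicit proof states (shown as goal comments), and tactics are then generated to transition between adjacent states.

```lean
-- Hypothetical illustration of a Chain of States (CoS).
-- Each comment records the formal proof state; each tactic
-- is the transition generated between adjacent states.
example (n : Nat) : 0 + n = n := by
  -- State 1: ⊢ 0 + n = n
  rw [Nat.add_comm]
  -- State 2: ⊢ n + 0 = n
  rw [Nat.add_zero]
  -- State 3: no goals remaining
```

In the paper's framework, the state comments would be extracted first from the informal argument, and only then would a model propose tactics such as `rw [Nat.add_comm]` to realize each transition.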
Related papers
- Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification [49.506412445511934]
Large Language Models (LLMs) show remarkable capabilities, yet their next-token prediction creates logical inconsistencies and reward hacking. We introduce a formal logic verification-guided framework that dynamically interleaves formal symbolic verification with the natural language generation process. We operationalize this framework via a novel two-stage training pipeline that synergizes formal logic verification-guided supervised fine-tuning and policy optimization.
arXiv Detail & Related papers (2026-01-30T07:01:25Z) - BRIDGE: Building Representations In Domain Guided Program Verification [67.36686119518441]
BRIDGE decomposes verification into three interconnected domains: Code, Specifications, and Proofs. We show that this approach substantially improves both accuracy and efficiency beyond standard error feedback methods.
arXiv Detail & Related papers (2025-11-26T06:39:19Z) - Are Language Models Efficient Reasoners? A Perspective from Logic Programming [109.47572890883248]
Modern language models (LMs) exhibit strong deductive reasoning capabilities, yet standard evaluations emphasize correctness while overlooking a key aspect of human-like reasoning: efficiency. We propose a framework for assessing LM reasoning efficiency through the lens of logic programming.
arXiv Detail & Related papers (2025-10-29T15:30:31Z) - ProofFlow: A Dependency Graph Approach to Faithful Proof Autoformalization [10.021930888250546]
Current approaches fail to preserve the semantic meaning and logical structure of the original human-written argument. We introduce ProofFlow, a novel pipeline that treats structural fidelity as a primary objective. We present a new benchmark of 184 undergraduate-level problems, manually annotated with step-by-step solutions. We also introduce ProofScore, a new composite metric to evaluate syntactic correctness, semantic faithfulness, and structural fidelity.
arXiv Detail & Related papers (2025-10-13T10:20:11Z) - Typed Chain-of-Thought: A Curry-Howard Framework for Verifying LLM Reasoning [0.0]
Chain-of-Thought (CoT) prompting enhances the reasoning capabilities of large language models. We propose a novel theoretical lens for this problem grounded in the Curry-Howard correspondence. We operationalise this analogy, presenting methods to extract and map the informal, natural language steps of CoT into a formal, typed proof structure.
arXiv Detail & Related papers (2025-10-01T16:06:40Z) - StepProof: Step-by-step verification of natural language mathematical proofs [16.150265021594088]
We propose StepProof, a novel autoformalization method for granular, step-by-step verification. StepProof breaks down complete proofs into multiple verifiable subproofs, enabling sentence-level verification. We show that StepProof significantly improves proof success rates and efficiency compared to traditional methods.
arXiv Detail & Related papers (2025-06-12T10:31:23Z) - Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification [56.218970738892764]
Chain-of-Thought prompting has become the de facto method to elicit reasoning capabilities from large language models (LLMs). To mitigate hallucinations in CoT that are notoriously difficult to detect, current methods operate as opaque boxes and do not provide checkable evidence for their judgments, possibly limiting their effectiveness. We propose a retrospective, step-aware formal verification framework $Safe$. Rather than assigning arbitrary scores, we strive to articulate mathematical claims in the formal mathematical language Lean 4 at each reasoning step and provide formal proofs to identify hallucinations.
arXiv Detail & Related papers (2025-06-05T03:16:08Z) - Steering LLMs for Formal Theorem Proving [1.083373217040891]
We develop an inference-time intervention that identifies linear directions in residual activations associated with informal reasoning traces. This mechanism also yields interpretable information about how reasoning is internally encoded in the activation space of Large Language Models. Our contributions are twofold: (1) a novel activation-based intervention for guiding proof synthesis in LLMs; and (2) a demonstration that this intervention improves performance under two decoding strategies.
arXiv Detail & Related papers (2025-02-21T15:04:48Z) - Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - Formal Proofs as Structured Explanations: Proposing Several Tasks on Explainable Natural Language Inference [0.16317061277457]
We propose a reasoning framework that can model the reasoning process underlying natural language inferences. The framework is based on the semantic tableau method, a well-studied proof system in formal logic. We show how it can be used to define natural language reasoning tasks with structured explanations.
arXiv Detail & Related papers (2023-11-15T01:24:09Z) - Generating Natural Language Proofs with Verifier-Guided Search [74.9614610172561]
We present NLProofS (Natural Language Proof Search), a novel stepwise method.
NLProofS learns to generate relevant steps conditioning on the hypothesis.
It achieves state-of-the-art performance on EntailmentBank and RuleTaker.
arXiv Detail & Related papers (2022-05-25T02:22:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.