LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs
- URL: http://arxiv.org/abs/2506.08321v1
- Date: Tue, 10 Jun 2025 01:12:36 GMT
- Title: LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs
- Authors: Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade,
- Abstract summary: We present LeanTutor, a Large Language Model (LLM)-based tutoring system for math proofs. LeanTutor interacts with the student in natural language, formally verifies student-written math proofs in Lean, generates correct next steps, and provides appropriate instructional guidance. To evaluate our system, we introduce PeanoBench, a human-written dataset derived from the Natural Numbers Game, consisting of 371 Peano Arithmetic proofs.
- Score: 7.468772576199917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present LeanTutor, a Large Language Model (LLM)-based tutoring system for math proofs. LeanTutor interacts with the student in natural language, formally verifies student-written math proofs in Lean, generates correct next steps, and provides the appropriate instructional guidance. LeanTutor is composed of three modules: (i) an autoformalizer/proof-checker, (ii) a next-step generator, and (iii) a natural language feedback generator. The first module faithfully autoformalizes student proofs into Lean and verifies proof accuracy via successful code compilation. If the proof has an error, the incorrect step is identified. The next-step generator module outputs a valid next Lean tactic for incorrect proofs via LLM-based candidate generation and proof search. The feedback generator module leverages Lean data to produce a pedagogically-motivated natural language hint for the student user. To evaluate our system, we introduce PeanoBench, a human-written dataset derived from the Natural Numbers Game, consisting of 371 Peano Arithmetic proofs, where each natural language proof step is paired with the corresponding logically equivalent tactic in Lean. The Autoformalizer correctly formalizes 57% of tactics in correct proofs and accurately identifies the incorrect step in 30% of incorrect proofs. In generating natural language hints for erroneous proofs, LeanTutor outperforms a simple baseline on accuracy and relevance metrics.
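The abstract describes PeanoBench as pairing each natural-language proof step with a logically equivalent Lean tactic. As a rough illustration of what such a pairing might look like (a hypothetical example, not drawn from the dataset; it assumes plain Lean 4 with the core `Nat.add_succ` lemma rather than the Natural Numbers Game environment), consider a proof that 0 + n = n:

```lean
-- Hypothetical sketch of a PeanoBench-style pairing:
-- each quoted natural-language step maps to one Lean tactic.
example (n : Nat) : 0 + n = n := by
  -- "We proceed by induction on n."
  induction n with
  | zero =>
    -- "Base case: 0 + 0 = 0 holds by the definition of addition."
    rfl
  | succ d hd =>
    -- "Inductive step: rewrite 0 + succ d as succ (0 + d)."
    rw [Nat.add_succ]
    -- "Apply the inductive hypothesis 0 + d = d."
    rw [hd]
```

In the pipeline the abstract describes, the autoformalizer would map each natural-language step to a tactic of this kind and check the result by compiling it in Lean; a compilation failure localizes the first incorrect step, which the next-step generator and feedback generator then use to produce a hint.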
Related papers
- Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification [56.218970738892764]
Chain-of-Thought (CoT) prompting has become the de facto method to elicit reasoning capabilities from large language models (LLMs). Hallucinations in CoT are notoriously difficult to detect, and current verification methods operate as opaque boxes that do not provide checkable evidence for their judgments, possibly limiting their effectiveness. We propose $Safe$, a retrospective, step-aware formal verification framework. Rather than assigning arbitrary scores, we articulate mathematical claims in the formal language Lean 4 at each reasoning step and provide formal proofs to identify hallucinations.
arXiv Detail & Related papers (2025-06-05T03:16:08Z) - Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework [2.9334627971166336]
This paper investigates the logical reasoning capabilities of large language models (LLMs). A trained LLM receives as input a set of assumptions and a goal, and produces as output a proof that formally derives the goal from the assumptions. A critical obstacle for training is the scarcity of real-world proofs.
arXiv Detail & Related papers (2025-04-28T19:25:29Z) - A Lean Dataset for International Math Olympiad: Small Steps towards Writing Math Proofs for Hard Problems [0.0]
We write complete, original formal proofs in Lean for the remaining IMO problems. This effort expands the availability of Lean proofs in the public domain by creating 5,880 lines of Lean proof.
arXiv Detail & Related papers (2024-11-28T02:50:42Z) - ImProver: Agent-Based Automated Proof Optimization [18.315243539816464]
We present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean.
We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems.
arXiv Detail & Related papers (2024-10-07T05:14:18Z) - Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z) - LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier.
Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z) - LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z) - ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics [7.607254619341369]
We introduce ProofNet, a benchmark for autoformalization and formal proving of undergraduate-level mathematics.
The ProofNet benchmark consists of 371 examples, each consisting of a formal theorem statement in Lean 3.
We report baseline results on statement autoformalization via in-context learning.
arXiv Detail & Related papers (2023-02-24T03:28:46Z) - NaturalProver: Grounded Mathematical Proof Generation with Language Models [84.2064569475095]
Theorem proving in natural mathematical language plays a central role in mathematical advances and education.
We develop NaturalProver, a language model that generates proofs by conditioning on background references.
NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
arXiv Detail & Related papers (2022-05-25T17:01:18Z) - Generating Natural Language Proofs with Verifier-Guided Search [74.9614610172561]
We present NLProofS (Natural Language Proof Search), a novel stepwise method.
NLProofS learns to generate relevant proof steps conditioned on the hypothesis.
It achieves state-of-the-art performance on EntailmentBank and RuleTaker.
arXiv Detail & Related papers (2022-05-25T02:22:30Z)