Related papers: Towards a Mathematics Formalisation Assistant using Large Language Models

URL: http://arxiv.org/abs/2211.07524v1
Date: Mon, 14 Nov 2022 16:52:32 GMT
Title: Towards a Mathematics Formalisation Assistant using Large Language Models
Authors: Ayush Agrawal, Siddhartha Gadgil, Navin Goyal, Ashvni Narayanan, Anand Tadipatri
Abstract summary: We explore the abilities of a large language model (Codex) to help with formalisation in the Lean theorem prover. Codex is able to formalise short mathematical statements at undergraduate level with nearly 75% accuracy for $120$ theorem statements. We show that with a new prompting strategy Codex can formalise natural-language proofs, with at least one out of twelve Codex completions being easy to repair into a complete proof.
Score: 5.485439959027125
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Mathematics formalisation is the task of translating mathematics written in natural language (i.e., definitions, theorem statements, proofs), as found in books and papers, into a formal language that can then be checked for correctness by a program. It is a thriving activity today; however, formalisation remains cumbersome. In this paper, we explore the abilities of a large language model (Codex) to help with formalisation in the Lean theorem prover. We find that with careful input-dependent prompt selection and postprocessing, Codex is able to formalise short mathematical statements at undergraduate level with nearly 75% accuracy for $120$ theorem statements. For proofs, quantitative analysis is infeasible, so we undertake a detailed case study. We choose a diverse set of $13$ theorems at undergraduate level with proofs that fit in two to three paragraphs. We show that with a new prompting strategy Codex can formalise these natural-language proofs, with at least one out of twelve Codex completions being easy to repair into a complete proof. This is surprising, as essentially no aligned data exists for formalised mathematics, particularly for proofs. These results suggest that large language models are a promising avenue towards fully or partially automating formalisation.
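
As a rough illustration of what formalising a short statement involves, the sketch below renders the sentence "for all natural numbers a and b, a + b = b + a" as a theorem the prover can check. It is not taken from the paper: the theorem name `sum_comm_example` is hypothetical, and Lean 4 syntax using only the core library is assumed here.

```lean
-- Informal statement: "For all natural numbers a and b, a + b = b + a."
-- Hypothetical example, not from the paper; assumes Lean 4 core only.
theorem sum_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b  -- proof term: the core library's commutativity lemma
```

The first line of the theorem corresponds to statement formalisation and the proof term to proof formalisation, the two tasks the abstract evaluates separately.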

Related papers

Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification [56.218970738892764]
Chain-of-Thought prompting has become the de facto method to elicit reasoning capabilities from large language models (LLMs). To mitigate hallucinations in CoT that are notoriously difficult to detect, current methods operate as opaque boxes and do not provide checkable evidence for their judgments, possibly limiting their effectiveness. We propose a retrospective, step-aware formal verification framework, Safe. Rather than assigning arbitrary scores, we strive to articulate mathematical claims in the formal mathematical language Lean 4 at each reasoning step and provide formal proofs to identify hallucinations.
arXiv Detail & Related papers (2025-06-05T03:16:08Z)
LeanAgent: Lifelong Learning for Formal Theorem Proving [85.39415834798385]
We present LeanAgent, a novel lifelong learning framework for formal theorem proving. LeanAgent continuously generalizes to and improves on ever-expanding mathematical knowledge. It generates formal proofs for 155 theorems across 23 diverse Lean repositories.
arXiv Detail & Related papers (2024-10-08T17:11:24Z)
Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
Autoformalizing Euclidean Geometry [74.72212706513318]
We introduce a neuro-symbolic framework for autoformalizing Euclidean geometry. One challenge is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize. We provide automatic semantic evaluation for autoformalized theorem statements.
arXiv Detail & Related papers (2024-05-27T14:35:10Z)
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity. We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points. We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z)
A New Approach Towards Autoformalization [7.275550401145199]
Autoformalization is the task of translating natural language mathematics into a formal language that can be verified by a program. Research paper mathematics requires large amounts of background and context. We propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks.
arXiv Detail & Related papers (2023-10-12T00:50:24Z)
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics [7.607254619341369]
We introduce ProofNet, a benchmark for autoformalization and formal proving of undergraduate-level mathematics. The ProofNet benchmark consists of 371 examples, each consisting of a formal theorem statement in Lean 3. We report baseline results on statement autoformalization via in-context learning.
arXiv Detail & Related papers (2023-02-24T03:28:46Z)
Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs [5.045988012508899]
Autoformalization seeks to address this by translating proofs written in natural language into a formal representation that is computer-verifiable via interactive theorem provers. We introduce a semantic parsing approach, based on the Universal Transformer architecture, that translates elementary mathematical proofs into an equivalent formalization in the language of the Coq interactive theorem prover.
arXiv Detail & Related papers (2023-01-05T17:56:00Z)
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs [30.57062828812679]
We introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches. We show that large language models are able to produce well-structured formal sketches that follow the same reasoning steps as the informal proofs.
arXiv Detail & Related papers (2022-10-21T22:37:22Z)
NaturalProver: Grounded Mathematical Proof Generation with Language Models [84.2064569475095]
Theorem proving in natural mathematical language plays a central role in mathematical advances and education. We develop NaturalProver, a language model that generates proofs by conditioning on background references. NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time.
arXiv Detail & Related papers (2022-05-25T17:01:18Z)
Autoformalization with Large Language Models [22.86710743804944]
A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. We show large language models provide new prospects towards this goal. Our methodology results in a new state-of-the-art result on the MiniF2F theorem proving benchmark, improving the proof rate from 29.6% to 35.2%.
arXiv Detail & Related papers (2022-05-25T09:53:30Z)
NaturalProofs: Mathematical Theorem Proving in Natural Language [132.99913141409968]
We develop NaturalProofs, a multi-domain corpus of mathematical statements and their proofs. NaturalProofs unifies broad coverage, deep coverage, and low-resource mathematical sources. We benchmark strong neural methods on mathematical reference retrieval and generation tasks.
arXiv Detail & Related papers (2021-03-24T03:14:48Z)
Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models. We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
