Improving Autoformalization using Type Checking
- URL: http://arxiv.org/abs/2406.07222v2
- Date: Tue, 11 Feb 2025 11:02:10 GMT
- Title: Improving Autoformalization using Type Checking
- Authors: Auguste Poiroux, Gail Weiss, Viktor Kunčak, Antoine Bosselut
- Abstract summary: We analyze both current autoformalization methods and the processes used to evaluate them, focusing specifically on the Lean 4 theorem proving language.
We demonstrate that scaling type-check filtering with self-consistency techniques on top of existing methods significantly improves performance, achieving absolute accuracy gains of up to +18.4% on ProofNet.
We also release new benchmarks: a new research-level mathematics dataset RLM25, a corrected ProofNet, and ProofNetVerif with labeled correct and incorrect autoformalization pairs for evaluating metrics.
- Score: 15.58948808529849
- Abstract: Autoformalization, the automatic translation of unconstrained natural language into formal languages, has garnered significant attention due to its potential applications in theorem proving, formal verification, and LLM output checking. In this work, we analyze both current autoformalization methods and the processes used to evaluate them, focusing specifically on the Lean 4 theorem proving language. We demonstrate that scaling type-check filtering with self-consistency techniques on top of existing methods significantly improves performance, achieving absolute accuracy gains of up to +18.4% on ProofNet. To support reproducibility and further research, we release our code, including a new symbolic equivalence check for Lean formulas. We also release new benchmarks: a new research-level mathematics dataset RLM25, a corrected ProofNet, and ProofNetVerif with labeled correct and incorrect autoformalization pairs for evaluating metrics.
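A minimal sketch of the selection strategy the abstract describes, assuming hypothetical helpers `sample_formalizations` (k LLM samples) and `lean_type_checks` (a Lean 4 elaboration check); this illustrates type-check filtering with self-consistency and is not the released code's API:

```python
from collections import Counter

def autoformalize(nl_statement: str, k: int,
                  sample_formalizations, lean_type_checks):
    """Sample k candidate formalizations, keep those that type-check, pick the most frequent."""
    candidates = sample_formalizations(nl_statement, k)           # k LLM samples (hypothetical helper)
    well_typed = [c for c in candidates if lean_type_checks(c)]   # discard ill-typed output (hypothetical helper)
    if not well_typed:
        return None
    # Self-consistency: the candidate generated most often among the survivors is selected.
    return Counter(well_typed).most_common(1)[0][0]
```

In practice, the self-consistency vote can also be taken over equivalence classes of candidates (e.g., grouping formalizations that are symbolically equivalent) rather than exact string matches.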
Related papers
- Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency [22.86318578119266]
We introduce a novel framework that scores and selects the best result from k autoformalization candidates based on symbolic equivalence and semantic consistency.
Our experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy.
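A rough sketch of how such a selection step might look, with hypothetical helpers `symbolically_equivalent` and `informal_similarity` standing in for the paper's actual symbolic-equivalence and semantic-consistency checks:

```python
def select_best(nl_statement: str, candidates: list[str],
                symbolically_equivalent, informal_similarity) -> str:
    """Score each of the k candidates and return the highest-scoring one."""
    def score(cand: str) -> float:
        # Symbolic equivalence: candidates that many other samples agree with score higher.
        votes = sum(symbolically_equivalent(cand, other) for other in candidates)
        # Semantic consistency: similarity between the source statement and an
        # informal reading of the candidate (both helpers are hypothetical).
        return votes + informal_similarity(nl_statement, cand)
    return max(candidates, key=score)
```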
arXiv Detail & Related papers (2024-10-28T11:37:39Z)
- Automated Proof Generation for Rust Code via Self-Evolution [69.25795662658356]
We introduce SAFE, a novel framework that overcomes the lack of human-written proofs to enable automated proof generation for Rust code.
We demonstrate superior efficiency and precision compared to GPT-4o.
This advancement leads to a significant improvement in performance, achieving a 70.50% accuracy rate in a benchmark crafted by human experts.
arXiv Detail & Related papers (2024-10-21T08:15:45Z)
- Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation.
For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite it or be applied to it.
As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
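A toy Lean 4 example of what a symbolic mutation can look like (standalone Lean core, no Mathlib; not Alchemy's actual procedure): a new theorem is derived from an existing one by altering its statement with another invocable lemma, here `Nat.add_zero`:

```lean
-- Toy illustration only; not Alchemy's actual mutation pipeline.
theorem add_comm_orig (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Mutated variant: the statement is changed using `Nat.add_zero`, and the proof
-- rewrites the change away before invoking the original theorem.
theorem add_comm_mutated (a b : Nat) : a + b + 0 = b + a := by
  rw [Nat.add_zero]
  exact add_comm_orig a b
```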
arXiv Detail & Related papers (2024-10-21T08:04:21Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
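A minimal sketch of the cross-checking idea, assuming hypothetical LLM wrappers `generate_cot` (returns the final answer extracted from natural-language reasoning) and `generate_pot` (returns executable Python source that assigns its result to `answer`); this illustrates the general verification signal, not the paper's exact method:

```python
def collaborative_verify(question: str, generate_cot, generate_pot) -> bool:
    """Accept a solution only if the CoT answer and the PoT answer agree."""
    cot_answer = generate_cot(question)      # answer from a reasoning chain (hypothetical helper)
    program = generate_pot(question)         # program text produced by the model (hypothetical helper)
    scope: dict = {}
    exec(program, scope)                     # run the program; sandboxing omitted in this sketch
    pot_answer = scope.get("answer")
    return str(cot_answer).strip() == str(pot_answer).strip()
```

Agreement between the two solution styles can then be used to rank or filter candidate solutions.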
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
- LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier.
Our experiments reveal that a small amount of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z)
- Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages [6.0608817611709735]
We propose a metric for evaluating the quality of specifications for verification-aware languages.
We show that our metric agrees closely with human judgments on a human-labeled dataset of Dafny specifications for the popular MBPP code-generation benchmark.
We also outline formal verification challenges that need to be addressed to apply the technique more widely.
arXiv Detail & Related papers (2024-06-14T06:52:08Z)
- Process-Driven Autoformalization in Lean 4 [30.056591518828554]
We develop a benchmark to evaluate the autoformalization capabilities of large language models.
We also introduce a model that leverages the precise feedback from Lean 4 compilers to enhance autoformalization.
Our experiments demonstrate that the process-supervised verifier (PSV) method improves autoformalization, enabling higher accuracy with less filtered training data.
arXiv Detail & Related papers (2024-06-04T03:48:08Z)
- Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space.
Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
- Unsupervised Pretraining for Fact Verification by Language Model Distillation [4.504050940874427]
We propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework.
It distils self-supervised features into high-quality claim-fact alignments without the need for annotations.
This is enabled by a novel contrastive loss function that encourages claim and evidence features to align.
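As a rough illustration only, an InfoNCE-style contrastive objective of the kind this entry describes (not SFAVEL's actual loss function):

```python
import torch
import torch.nn.functional as F

def claim_fact_contrastive_loss(claim_emb: torch.Tensor,
                                fact_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """claim_emb, fact_emb: (batch, dim); row i of each side forms a matching pair."""
    claims = F.normalize(claim_emb, dim=-1)
    facts = F.normalize(fact_emb, dim=-1)
    logits = claims @ facts.t() / temperature                 # pairwise cosine similarities
    targets = torch.arange(claims.size(0), device=claims.device)
    # Pull matching claim-fact pairs together, push mismatched pairs apart.
    return F.cross_entropy(logits, targets)
```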
arXiv Detail & Related papers (2023-09-28T15:53:44Z)
- Autoformalization with Large Language Models [22.86710743804944]
A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence.
We show large language models provide new prospects towards this goal.
Our methodology yields a new state-of-the-art result on the MiniF2F theorem proving benchmark, improving the proof rate from 29.6% to 35.2%.
arXiv Detail & Related papers (2022-05-25T09:53:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.