Related papers: Process-Driven Autoformalization in Lean 4

Process-Driven Autoformalization in Lean 4

URL: http://arxiv.org/abs/2406.01940v1
Date: Tue, 4 Jun 2024 03:48:08 GMT
Title: Process-Driven Autoformalization in Lean 4
Authors: Jianqiao Lu, Zhengying Liu, Yingjia Wan, Yinya Huang, Haiming Wang, Zhicheng Yang, Jing Tang, Zhijiang Guo,
Abstract summary: We develop a benchmark to evaluate the autoformalization capabilities of large language models. We also introduce a model that leverages the precise feedback from Lean 4 compilers to enhance autoformalization. Our experiments demonstrate that the PSV method improves autoformalization, enabling higher accuracy using less filtered training data.
Score: 17.19921264819679
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoformalization, the conversion of natural language mathematics into formal languages, offers significant potential for advancing mathematical reasoning. However, existing efforts are limited to formal languages with substantial online corpora and struggle to keep pace with rapidly evolving languages like Lean 4. To bridge this gap, we propose a new benchmark \textbf{Form}alization for \textbf{L}ean~\textbf{4} (\textbf{\name}) designed to evaluate the autoformalization capabilities of large language models (LLMs). This benchmark encompasses a comprehensive assessment of questions, answers, formal statements, and proofs. Additionally, we introduce a \textbf{P}rocess-\textbf{S}upervised \textbf{V}erifier (\textbf{PSV}) model that leverages the precise feedback from Lean 4 compilers to enhance autoformalization. Our experiments demonstrate that the PSV method improves autoformalization, enabling higher accuracy using less filtered training data. Furthermore, when fine-tuned with data containing detailed process information, PSV can leverage the data more effectively, leading to more significant improvements in autoformalization for Lean 4. Our dataset and code are available at \url{https://github.com/rookie-joe/PDA}.

Related papers

StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion [30.51794514312176]
We identify two key abilities for effective autoformalization.<n>We introduce ThinkingF, a data synthesis and training pipeline that improves both abilities.<n>The resulting 7B and 32B models exhibit both comprehensive formal knowledge and strong informal-to-formal reasoning.
arXiv Detail & Related papers (2025-08-06T13:28:22Z)
Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny [68.00108157244952]
Large Language Models (LLMs) trained with Reinforcement Learning (RL) face a significant challenge: their verification processes are neither reliable nor scalable.<n>A promising yet largely uncharted alternative is formal language-based reasoning.<n>Grounding LLMs in rigorous formal systems where generative models operate in formal language spaces (e.g., Dafny) enables the automatic and mathematically provable verification of their reasoning processes and outcomes.
arXiv Detail & Related papers (2025-07-22T08:13:01Z)
FMC: Formalization of Natural Language Mathematical Competition Problems [12.86616278136374]
We propose an autoformalization pipeline based on large language models with error feedback.<n>We curate an Olympiad-level dataset aligning natural language problems with Lean formalizations.<n>We show that few-shot learning, error feedback, and increasing sampling numbers enhance the autoformalization process.
arXiv Detail & Related papers (2025-07-15T12:52:47Z)
Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization [1.204553980682492]
We introduce a novel methodology that leverages back-translation with hand-curated prompts to enhance the mathematical capabilities of language models. We show that our approaches surpass the performance of fine-tuning with extensive multilingual datasets. Taken together, our methods show a promising new approach to significantly reduce the resources required to formalize, thereby accelerating AI for math.
arXiv Detail & Related papers (2025-02-18T19:16:54Z)
FormalAlign: Automated Alignment Evaluation for Autoformalization [12.674957231226871]
textscFormalAlign trains on both the autoformalization sequence generation task and the representational alignment between input and output. evaluated across four benchmarks, textscFormalAlign demonstrates superior performance. This effective alignment evaluation significantly reduces the need for manual verification.
arXiv Detail & Related papers (2024-10-14T03:58:35Z)
Herald: A Natural Language Annotated Lean 4 Dataset [15.42247133378869]
This paper introduces a novel framework for translating the Mathlib4 corpus (a unified library of mathematics in formal language Lean 4) into natural language. We present the results of this pipeline on Mathlib4 as Herald (Hierarchy and Retrieval-based Translated Lean) We also propose the Herald Translator, which is fine-tuned on Herald.
arXiv Detail & Related papers (2024-10-09T10:11:24Z)
FactAlign: Long-form Factuality Alignment of Large Language Models [35.067998820937284]
Large language models have demonstrated significant potential as the next-generation information access engines. We propose FactAlign, a novel alignment framework designed to enhance the factuality of long-form responses. Our experiments on open-domain prompts and information-seeking questions demonstrate that FactAlign significantly improves the factual accuracy of LLM responses.
arXiv Detail & Related papers (2024-10-02T16:03:13Z)
Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing. We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM. We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
Improving Autoformalization using Type Checking [15.58948808529849]
We analyze both current autoformalization methods and the processes used to evaluate them, focusing specifically on the Lean 4 theorem proving language. We demonstrate that scaling type-check filtering with self-consistency techniques on top of existing methods significantly improves performance, achieving absolute accuracy gains of up to +18.4% on ProofNet. We also release new benchmarks: a new research-level mathematics dataset RLM25, a corrected ProofNet, and ProofNetVerif with labeled correct and incorrect autoformalization pairs for evaluating metrics.
arXiv Detail & Related papers (2024-06-11T13:01:50Z)
Lean Workbook: A large-scale Lean problem set formalized from natural language math problems [50.22847430754973]
Large language models are not good at math theorem proving using formal languages like Lean. A significant challenge in this area is the scarcity of training data available in these formal languages. We propose a novel pipeline that iteratively generates and filters synthetic data to translate natural language mathematical problems into Lean 4 statements.
arXiv Detail & Related papers (2024-06-06T08:25:43Z)
Calibrating the Confidence of Large Language Models by Eliciting Fidelity [52.47397325111864]
Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. Post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not accurately calibrate with their correctness rate. We propose a plug-and-play method to estimate the confidence of language models.
arXiv Detail & Related papers (2024-04-03T11:36:12Z)
RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE) It encodes the text corpus into a latent space, capturing current and future information from both source and target text. Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
Large Language Models Meet Knowledge Graphs to Answer Factoid Questions [57.47634017738877]
We propose a method for exploring pre-trained Text-to-Text Language Models enriched with additional information from Knowledge Graphs. We procure easily interpreted information with Transformer-based models through the linearization of the extracted subgraphs. Final re-ranking of the answer candidates with the extracted information boosts Hits@1 scores of the pre-trained text-to-text language models by 4-6%.
arXiv Detail & Related papers (2023-10-03T15:57:00Z)
Offline RL for Natural Language Generation with Implicit Language Q Learning [87.76695816348027]
Large language models can be inconsistent when it comes to completing user specified tasks. We propose a novel RL method, that combines both the flexible utility framework of RL with the ability of supervised learning. In addition to empirically validating ILQL, we present a detailed empirical analysis situations where offline RL can be useful in natural language generation settings.
arXiv Detail & Related papers (2022-06-05T18:38:42Z)
Autoformalization with Large Language Models [22.86710743804944]
A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. We show large language models provide new prospects towards this goal. Our methodology results in a new state-of-the-art result on the MiniF2F theorem proving benchmark, improving the proof rate from $29.6%$ to $35.2%$.
arXiv Detail & Related papers (2022-05-25T09:53:30Z)
Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore. We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
Data Annealing for Informal Language Understanding Tasks [66.2988222278475]
We propose a data annealing transfer learning procedure to bridge the performance gap on informal language tasks. It successfully utilizes a pre-trained model such as BERT in informal language.
arXiv Detail & Related papers (2020-04-24T09:27:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.