Related papers: FormaRL: Enhancing Autoformalization with no Labeled Data

FormaRL: Enhancing Autoformalization with no Labeled Data

URL: http://arxiv.org/abs/2508.18914v1
Date: Tue, 26 Aug 2025 10:38:18 GMT
Title: FormaRL: Enhancing Autoformalization with no Labeled Data
Authors: Yanxing Huang, Xinling Jin, Sijie Liang, Peng Li, Yang Liu,
Abstract summary: We propose textbfFormaRL, a simple yet efficient reinforcement learning framework for autoformalization.<n>FormaRL integrates syntax check from Lean compiler and consistency check from large language model to calculate the reward.<n>Experiments show that FormaRL can increase the pass@1 autoformalization accuracy of Qwen2.5-Coder-7B-Instruct by 4 $sim$ 6x.
Score: 5.607770363395876
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoformalization is one of the central tasks in formal verification, while its advancement remains hindered due to the data scarcity and the absence efficient methods. In this work we propose \textbf{FormaRL}, a simple yet efficient reinforcement learning framework for autoformalization which only requires a small amount of unlabeled data. FormaRL integrates syntax check from Lean compiler and consistency check from large language model to calculate the reward, and adopts GRPO algorithm to update the formalizer. We also curated a proof problem dataset from undergraduate-level math materials, named \textbf{uproof}, in the hope to facilitate the exploration of autoformalization and theorem proving in advanced math. Experiments show that FormaRL can increase the pass@1 autoformalization accuracy of Qwen2.5-Coder-7B-Instruct by 4 $\sim$ 6x (4.04\% $\to$ 26.15\% on ProofNet and 2.4\% $\to$ 9.6\% on uproof) with merely 859 unlabeled data. And on uproof our method also achieved a strong improvement in out-of-distribution performance compared to existing open-source state-of-the-art autoformalizers on both pass@1 accuracy (6.2\% $\to$ 9.6\%) and pass@16 accuracy (24.4\% $\to$ 33.6\%). Training code of FormaRL is open-sourced at https://github.com/THUNLP-MT/FormaRL.

Related papers

TheoremForge: Scaling up Formal Data Synthesis with Low-Budget Agentic Workflow [32.80602352741069]
We introduce textbfTheoremForge, a cost-effective formal data synthesis pipeline.<n>Our strategy increases data yield by textbf1.6$times$ for proof generation compared to standard filtering.<n>Results establish TheoremForge as a scalable framework for constructing a data flywheel to train future expert models.
arXiv Detail & Related papers (2026-01-24T06:28:11Z)
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization [73.0780809974414]
We propose a Reflective Autoformalization method that integrates semantic consistency evaluation into the autoformalization process.<n>This enables the model to iteratively generate formal statements, assess its semantic fidelity, and self-correct identified errors.<n>Experiments show that ReForm achieves an average improvement of 22.6 percentage points over the strongest baselines.
arXiv Detail & Related papers (2025-10-28T16:22:54Z)
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving [72.8626512877667]
We introduce Goedel-Prover, an open-source language model that achieves state-of-the-art (as of April 5 2025) performance in automated formal proof generation for mathematical problems.<n>First, we train LLMs to convert natural language math problems from the Numina dataset to equivalent formal statements in Lean 4.<n>Next, we develop a large dataset of formal proofs by training a series of provers.<n>Finally, we obtain the dataset Goedel-Pset-v1-solved, which contains proofs for over 800K statements from Goedel-Pset-v1
arXiv Detail & Related papers (2025-02-11T15:27:35Z)
ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis [50.020850767257095]
We propose ProofAug, a procedure that equips LLMs with automation methods at various granularities.<n>Our method is validated on the miniF2F benchmark using the open-source deep-math-7b-base model and the Isabelle proof assistant.<n>We also implement a Lean 4 version of ProofAug that can improve the pass@1 performance of Kimina-Prover-seek-Distill-1.5B from 44.3% to 50.4%.
arXiv Detail & Related papers (2025-01-30T12:37:06Z)
FormalAlign: Automated Alignment Evaluation for Autoformalization [12.674957231226871]
textscFormalAlign trains on both the autoformalization sequence generation task and the representational alignment between input and output. evaluated across four benchmarks, textscFormalAlign demonstrates superior performance. This effective alignment evaluation significantly reduces the need for manual verification.
arXiv Detail & Related papers (2024-10-14T03:58:35Z)
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier. Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z)
Improving Autoformalization using Type Checking [15.58948808529849]
We analyze both current autoformalization methods and the processes used to evaluate them, focusing specifically on the Lean 4 theorem proving language.<n>We demonstrate that scaling type-check filtering with self-consistency techniques on top of existing methods significantly improves performance, achieving absolute accuracy gains of up to +18.4% on ProofNet.<n>We also release new benchmarks: a new research-level mathematics dataset RLM25, a corrected ProofNet, and ProofNetVerif with labeled correct and incorrect autoformalization pairs for evaluating metrics.
arXiv Detail & Related papers (2024-06-11T13:01:50Z)
Process-Driven Autoformalization in Lean 4 [30.056591518828554]
We develop a benchmark to evaluate the autoformalization capabilities of large language models. We also introduce a model that leverages the precise feedback from Lean 4 compilers to enhance autoformalization. Our experiments demonstrate that the PSV method improves autoformalization, enabling higher accuracy using less filtered training data.
arXiv Detail & Related papers (2024-06-04T03:48:08Z)
Autoformalizing Euclidean Geometry [74.72212706513318]
We introduce a neuro-symbolic framework for autoformalizing Euclidean geometry. One challenge is that informal proofs rely on diagrams, leaving gaps in texts that are hard to formalize. We provide automatic semantic evaluation for autoformalized theorem statements.
arXiv Detail & Related papers (2024-05-27T14:35:10Z)
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity. We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points. We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z)
Multilingual Mathematical Autoformalization [14.433478397963123]
Autoformalization is the task of translating natural language materials into machine-verifiable formalisations. Existing methods tend to circumvent this challenge by manually curating small corpora. In this work, we create $textttMMA$, a large, flexible, multilingual, and multi-domain dataset of informal-formal pairs.
arXiv Detail & Related papers (2023-11-07T06:42:15Z)
Autoformalization with Large Language Models [22.86710743804944]
A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. We show large language models provide new prospects towards this goal. Our methodology results in a new state-of-the-art result on the MiniF2F theorem proving benchmark, improving the proof rate from $29.6%$ to $35.2%$.
arXiv Detail & Related papers (2022-05-25T09:53:30Z)
PRover: Proof Generation for Interpretable Reasoning over Rules [81.40404921232192]
We propose a transformer-based model that answers binary questions over rule-bases and generates the corresponding proofs. Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm. We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation.
arXiv Detail & Related papers (2020-10-06T15:47:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.