Monotonic Reference-Free Refinement for Autoformalization
- URL: http://arxiv.org/abs/2601.23166v1
- Date: Fri, 30 Jan 2026 16:48:33 GMT
- Title: Monotonic Reference-Free Refinement for Autoformalization
- Authors: Lan Zhang, Marco Valentino, André Freitas
- Abstract summary: We introduce a reference-free iterative monotonic process for full-theorem autoformalization. We optimize a masked composite objective over Formal Validity, Logical Preservation, Mathematical Consistency, and Formal Quality. Empirical experiments demonstrate that the proposed process enables simultaneous improvement across multiple dimensions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While statement autoformalization has advanced rapidly, full-theorem autoformalization remains largely unexplored. Existing iterative refinement methods in statement autoformalization typically improve isolated aspects of formalization, such as syntactic correctness, but struggle to jointly optimize multiple quality dimensions, which is critical for full-theorem autoformalization. We introduce a reference-free iterative monotonic process for full-theorem autoformalization that leverages complementary feedback from theorem provers and LLM-based judges, without access to ground-truth proofs or existing formalizations at inference time. Our approach optimizes a masked composite objective over Formal Validity, Logical Preservation, Mathematical Consistency, and Formal Quality, guided by a responsiveness map that indicates how different LLMs acting in different roles preferentially improve each dimension. We further propose an acceptance policy that guarantees certified monotonic improvement, and provide conditions ensuring convergence and termination. Empirical experiments demonstrate that the proposed process enables simultaneous improvement across multiple dimensions, achieving 93.44% formal validity and a 78.22% overall score on miniF2F, and 44.09% formal validity and a 29.79% overall score on ProofNet.
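The acceptance policy described in the abstract can be pictured as a simple gate: a candidate refinement is kept only if no masked quality dimension regresses and the composite score strictly improves, which makes the composite score monotone over iterations. The following is a hypothetical sketch; the dimension names, weights, and function signatures are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a certified-monotonic acceptance policy over a
# masked composite objective. Dimension names mirror the abstract; all
# function signatures are illustrative assumptions.

DIMENSIONS = ("formal_validity", "logical_preservation",
              "mathematical_consistency", "formal_quality")

def composite(scores, mask, weights):
    """Masked weighted sum over the four quality dimensions."""
    return sum(weights[d] * scores[d] for d in DIMENSIONS if mask[d])

def accept(current, candidate, mask, weights, eps=1e-9):
    """Accept iff no masked dimension regresses and the composite strictly improves."""
    no_regression = all(candidate[d] >= current[d] - eps
                        for d in DIMENSIONS if mask[d])
    improves = (composite(candidate, mask, weights)
                > composite(current, mask, weights) + eps)
    return no_regression and improves

def refine(state, scores, propose, mask, weights, max_iters=10):
    """Iterate proposals; the accept gate makes the composite score monotone."""
    for _ in range(max_iters):
        cand_state, cand_scores = propose(state)
        if accept(scores, cand_scores, mask, weights):
            state, scores = cand_state, cand_scores
    return state, scores
```

Because rejected candidates leave the state untouched, the composite score never decreases, which is the "certified monotonic improvement" property the abstract claims.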
Related papers
- Distill and Align Decomposition for Enhanced Claim Verification [51.93960785128124]
Complex claim verification requires decomposing sentences into verifiable subclaims. We propose a reinforcement learning approach that optimises decomposition quality and verifier alignment. Our framework enables smaller language models to achieve state-of-the-art claim verification.
arXiv Detail & Related papers (2026-02-25T12:32:04Z) - Regularized Gradient Temporal-Difference Learning [6.622208195193136]
Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. We propose a regularized optimization objective by reformulating the mean-square projected Bellman error (MSPBE) minimization. This formulation naturally yields a regularized GTD algorithm, referred to as R-GTD, which guarantees convergence to a unique solution even when the FIM is singular.
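Based on the summary, the regularised objective plausibly takes a form like the following. This is a sketch from the standard MSPBE definition, not the paper's exact formulation; the regularisation coefficient η and its placement are assumptions.

```latex
\mathrm{MSPBE}(\theta)
  = \mathbb{E}[\delta \phi]^{\top}\,
    \mathbb{E}[\phi \phi^{\top}]^{-1}\,
    \mathbb{E}[\delta \phi],
\qquad
J_{\eta}(\theta)
  = \mathbb{E}[\delta \phi]^{\top}
    \bigl(\mathbb{E}[\phi \phi^{\top}] + \eta I\bigr)^{-1}
    \mathbb{E}[\delta \phi],
```

where δ is the TD error, φ the feature vector, and η > 0 a regularisation coefficient that keeps the inner inverse well-defined even when the feature covariance matrix is singular.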
arXiv Detail & Related papers (2026-01-28T13:37:42Z) - Decompose-and-Formalise: Recursively Verifiable Natural Language Inference [34.505373718498014]
Combining large language models (LLMs) with theorem provers (TPs) in neuro-symbolic pipelines helps with entailment verification and proof-guided refinement of explanations for natural language inference (NLI). We propose a decompose-and-formalise framework that decomposes premise-hypothesis pairs into an entailment tree of atomic steps. We also introduce $$-substitution in an event-based logical form to enforce consistent argument-role bindings.
arXiv Detail & Related papers (2026-01-27T13:43:30Z) - Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking [64.97768177044355]
Large Language Models (LLMs) are increasingly deployed in real-world fact-checking systems. We present FactArena, a fully automated arena-style evaluation framework. Our analyses reveal significant discrepancies between static claim-verification accuracy and end-to-end fact-checking competence.
arXiv Detail & Related papers (2026-01-06T02:51:56Z) - ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization [73.0780809974414]
We propose a Reflective Autoformalization method that integrates semantic consistency evaluation into the autoformalization process. This enables the model to iteratively generate formal statements, assess their semantic fidelity, and self-correct identified errors. Experiments show that ReForm achieves an average improvement of 22.6 percentage points over the strongest baselines.
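The generate-assess-self-correct cycle this summary describes can be sketched as a short loop. All names and signatures below are illustrative stand-ins, not ReForm's actual interface.

```python
# Minimal sketch of a reflective autoformalization loop: generate a formal
# statement, assess its semantic fidelity, and revise until the assessment
# passes or a round budget is exhausted. All callables are hypothetical.

def reflective_autoformalize(problem, generate, assess, revise,
                             threshold=0.9, max_rounds=5):
    """Return a formal statement refined via assess/revise feedback."""
    statement = generate(problem)
    for _ in range(max_rounds):
        fidelity, feedback = assess(problem, statement)
        if fidelity >= threshold:
            break  # the judge deems the statement semantically faithful
        statement = revise(problem, statement, feedback)
    return statement
```

The round budget bounds cost when the assessor never reaches the threshold, mirroring the iterative self-correction the summary attributes to the method.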
arXiv Detail & Related papers (2025-10-28T16:22:54Z) - Conjecturing: An Overlooked Step in Formal Mathematical Reasoning [10.398167568549924]
Many mathematical problems cannot be formalised directly without first conjecturing a conclusion. We develop ConjectureBench to measure the conjecturing capabilities of Large Language Models. We show that our method achieves the first successful end-to-end autoformalisation of 13 PutnamBench problems with GPT-4.1 and 7 with DeepSeek-V3.1.
arXiv Detail & Related papers (2025-10-13T22:29:49Z) - Autoformalizer with Tool Feedback [52.334957386319864]
Autoformalization addresses the scarcity of data for Automated Theorem Proving (ATP) by translating mathematical problems from natural language into formal statements. Existing formalizers still struggle to consistently generate valid statements that meet syntactic validity and semantic consistency. We propose the Autoformalizer with Tool Feedback (ATF), a novel approach that incorporates syntactic and consistency information as tools into the formalization process.
arXiv Detail & Related papers (2025-10-08T10:25:12Z) - Latent Veracity Inference for Identifying Errors in Stepwise Reasoning [78.29317733206643]
We introduce Veracity Search (VS), a discrete search algorithm over veracity assignments. It performs otherwise intractable inference in the posterior distribution over latent veracity values. It generalizes VS, enabling accurate zero-shot veracity inference in novel contexts.
arXiv Detail & Related papers (2025-05-17T04:16:36Z) - Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency [22.86318578119266]
We introduce a novel framework that scores and selects the best result from k autoformalization candidates based on symbolic equivalence and semantic consistency. Our experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy.
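Best-of-k selection of this kind can be sketched as a combination of a pairwise symbolic-equivalence vote (candidates that agree with many others score higher) and a per-candidate semantic-consistency score. The scoring functions and the mixing weight below are illustrative assumptions, not the paper's actual components.

```python
# Illustrative sketch: rank k autoformalization candidates by combining
# a symbolic-equivalence self-consistency vote with a semantic-consistency
# score. equiv and consistency are hypothetical stand-ins.

from collections import defaultdict

def select_best(candidates, equiv, consistency, alpha=0.5):
    """Return the index of the highest-scoring candidate.

    equiv(a, b) -> bool: symbolic equivalence between two candidates.
    consistency(c) -> float in [0, 1]: semantic agreement with the source.
    """
    k = len(candidates)
    votes = defaultdict(int)
    for i in range(k):
        for j in range(k):
            if i != j and equiv(candidates[i], candidates[j]):
                votes[i] += 1

    def score(i):
        vote_frac = votes[i] / (k - 1) if k > 1 else 1.0
        return alpha * vote_frac + (1 - alpha) * consistency(candidates[i])

    return max(range(k), key=score)
```

The vote term rewards candidates that many of the k samples agree with (a form of self-consistency), while the second term ties the choice back to the natural-language source.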
arXiv Detail & Related papers (2024-10-28T11:37:39Z) - Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation [71.32761934724867]
This work proposes Alchemy, a framework for data synthesis that constructs formal theorems through symbolic mutation. For each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite it or be applied to it. As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M.
arXiv Detail & Related papers (2024-10-21T08:04:21Z) - Reliable Evaluation and Benchmarks for Statement Autoformalization [18.218951526592914]
We present a comprehensive approach combining improved metrics, robust benchmarks, and systematic evaluation. First, we introduce BEq+, an automated metric that correlates strongly with human judgment, along with ProofNetVerif, a new dataset for assessing the quality of evaluation metrics. We develop two new autoformalization benchmarks: ProofNet#, a corrected version of ProofNet, and RLM25, with 619 new pairs of research-level mathematics from six formalization projects.
arXiv Detail & Related papers (2024-06-11T13:01:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.