Learning to Repair Lean Proofs from Compiler Feedback
- URL: http://arxiv.org/abs/2602.02990v1
- Date: Tue, 03 Feb 2026 01:53:56 GMT
- Title: Learning to Repair Lean Proofs from Compiler Feedback
- Authors: Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Vasily Ilin,
- Abstract summary: We study Lean proof repair as a supervised learning problem. We introduce APRIL (Automated Proof Repair in Lean), a dataset of 260,000 supervised tuples. We view diagnostic-conditioned supervision as a complementary training signal for feedback-using provers.
- Score: 4.55626337217127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correct proofs, offering little supervision for understanding and repairing failures. We study Lean proof repair as a supervised learning problem: given an erroneous proof and compiler feedback, predict both a corrected proof and a natural-language diagnosis grounded in the same feedback. We introduce APRIL (Automated Proof Repair in Lean), a dataset of 260,000 supervised tuples pairing systematically generated proof failures with compiler diagnostics and aligned repair and explanation targets. Training language models on APRIL substantially improves repair accuracy and feedback-conditioned reasoning; in our single-shot repair evaluation setting, a finetuned 4B-parameter model outperforms the strongest open-source baseline. We view diagnostic-conditioned supervision as a complementary training signal for feedback-using provers. Our dataset is available at \href{https://huggingface.co/datasets/uw-math-ai/APRIL}{this link}.
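To make the task concrete, the sketch below shows the shape of such a repair tuple: a broken proof, the compiler feedback it triggers, a corrected proof, and a feedback-grounded diagnosis. It is a hand-written illustration, not an example drawn from APRIL; the theorem, the paraphrased error message, and the choice of `Nat.add_comm` are our own.

```lean
/-
Broken proof (deliberately fails to elaborate):

  theorem add_comm' (a b : Nat) : a + b = b + a := by
    rfl

Compiler feedback (paraphrased): the `rfl` tactic cannot close the goal,
since `a + b` and `b + a` are not definitionally equal when `a` and `b`
are variables.
-/

-- Repair target: cite the library lemma instead of relying on `rfl`.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Diagnosis target (natural language, grounded in the same feedback):
-- `rfl` only closes goals that hold by definitional unfolding; commutativity
-- of addition on `Nat` is proved by induction, so the repair appeals to
-- `Nat.add_comm`.
```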
Related papers
- Proof-RM: A Scalable and Generalizable Reward Model for Math Proof [67.53066972145183]
Large Language Models (LLMs) have demonstrated strong math reasoning abilities through Reinforcement Learning with *Verifiable Rewards* (RLVR). Many advanced mathematical problems are proof-based, with no guaranteed way to determine the authenticity of a proof by simple answer matching. To enable automatic verification, a Reward Model (RM) capable of reliably evaluating full proof processes is required.
arXiv Detail & Related papers (2026-02-02T17:42:53Z) - Refinement Provenance Inference: Detecting LLM-Refined Training Prompts from Model Behavior [58.751981587234916]
This paper formalizes the audit task of Refinement Provenance Inference (RPI). We propose RePro, a logit-based framework that fuses teacher-forced likelihood features with logit-ranking signals. During training, RePro learns a transferable representation via shadow fine-tuning, and uses a lightweight linear head to infer provenance on unseen victims without training-data access.
arXiv Detail & Related papers (2026-01-05T10:16:41Z) - Rethinking the Capability of Fine-Tuned Language Models for Automated Vulnerability Repair [5.847724760751716]
Learning-based automated vulnerability repair (AVR) techniques that utilize fine-tuned language models have shown promise in generating vulnerability patches. Our empirical study reveals that state-of-the-art models often overfit to the training set and are evaluated using training, validation, and test sets that are not mutually exclusive. We introduce L-AVRBench, a test-based benchmark tailored for learning-based AVR, to overcome the limitations of match-based metrics and examine the models' true repair capabilities.
arXiv Detail & Related papers (2025-12-27T16:12:43Z) - CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal [84.71254539482369]
Group-relative reinforcement learning with verifiable rewards (RLVR) often wastes the most informative data it already has: the failures. We present CARE, a failure-centric post-training framework for multimodal reasoning that turns errors into supervision. CARE improves accuracy and training smoothness while explicitly increasing the share of learning signal that comes from failures.
arXiv Detail & Related papers (2025-12-22T16:34:21Z) - Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions [49.55618517046225]
Language models trained on web-scale corpora risk memorizing and exposing sensitive information. We propose Corrective Unlearning with Retrieved Exclusions (CURE), a novel unlearning framework. CURE verifies model outputs for leakage and revises them into safe responses.
arXiv Detail & Related papers (2025-09-30T09:07:45Z) - Repairing vulnerabilities without invisible hands. A differentiated replication study on LLMs [5.10123605644148]
Automated Vulnerability Repair (AVR) is a fast-growing branch of program repair. Recent studies show that large language models (LLMs) outperform traditional techniques.
arXiv Detail & Related papers (2025-07-28T16:39:16Z) - STRIVE: Structured Reasoning for Self-Improvement in Claim Verification [30.15803409441136]
We propose STRIVE: Structured Reasoning for Self-Improved Verification. Our method introduces a structured reasoning design with Claim Decomposition, Entity Analysis, and Evidence Grounding Verification. It is then applied to generate reasoning chains for all training examples, selecting only those that are correct and structurally sound for subsequent self-improvement training.
arXiv Detail & Related papers (2025-02-17T16:07:07Z) - Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.94251699982388]
Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs).
This work explores whether small (≤ 13B) language models (LMs) have the ability of self-correction on reasoning tasks with minimal inputs from stronger LMs.
arXiv Detail & Related papers (2024-04-26T03:41:28Z) - Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models [5.463333911506443]
We aim to enhance the self-checking capabilities of large language models (LLMs) by constructing training data for checking tasks.
We propose a specialized checking format called "Step CoT Check".
Experiments demonstrate that fine-tuning with the "Step CoT Check" format significantly improves the self-checking and self-correction abilities of LLMs.
arXiv Detail & Related papers (2024-02-20T14:23:23Z) - Unsupervised Pretraining for Fact Verification by Language Model Distillation [4.504050940874427]
We propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework.
It distils self-supervised features into high-quality claim-fact alignments without the need for annotations.
This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments.
arXiv Detail & Related papers (2023-09-28T15:53:44Z) - Lyra: Orchestrating Dual Correction in Automated Theorem Proving [63.115422781158934]
Lyra is a new framework that employs two distinct correction mechanisms: Tool Correction and Conjecture Correction.
Tool Correction contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof.
Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts.
arXiv Detail & Related papers (2023-09-27T17:29:41Z) - Factual Error Correction of Claims [18.52583883901634]
This paper introduces the task of factual error correction.
It provides a mechanism to correct written texts that contain misinformation.
It acts as an inherent explanation for claims already partially supported by evidence.
arXiv Detail & Related papers (2020-12-31T18:11:26Z)