ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction
- URL: http://arxiv.org/abs/2505.24230v1
- Date: Fri, 30 May 2025 05:44:34 GMT
- Title: ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction
- Authors: Murari Ambati
- Abstract summary: We propose ProofNet++, a neuro-symbolic framework that enhances automated theorem proving. We show that ProofNet++ significantly improves proof accuracy, correctness, and formal verifiability over prior models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose ProofNet++, a neuro-symbolic framework that enhances automated theorem proving by combining large language models (LLMs) with formal proof verification and self-correction mechanisms. Current LLM-based systems suffer from hallucinated logical steps and unverifiable reasoning. ProofNet++ mitigates these limitations by integrating symbolic proof tree supervision, a reinforcement learning loop using verifiers as reward functions, and an iterative self-correction module. Our experiments on miniF2F, Lean's mathlib, and HOL Light show that ProofNet++ significantly improves proof accuracy, correctness, and formal verifiability over prior models. We provide theoretical analysis of the convergence and stability of the verifier-guided RL framework and release our datasets and codebase for future research.
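As a concrete illustration of the verifier-as-reward loop described in the abstract, here is a toy REINFORCE sketch; the tactic vocabulary, stub verifier, and update rule are all assumptions for illustration, not the paper's released code.

```python
import math
import random

# Toy tactic vocabulary; a real system samples whole proof scripts from an LLM.
TACTICS = ["intro h", "apply h", "exact h", "simp"]

# Policy: a single softmax over tactics per step (stand-in for the LLM policy).
logits = {t: 0.0 for t in TACTICS}

def tactic_probs():
    weights = {t: math.exp(logits[t]) for t in TACTICS}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}

def sample_tactic(probs):
    r, acc = random.random(), 0.0
    for t, p in probs.items():
        acc += p
        if r <= acc:
            return t
    return TACTICS[-1]

def verifier_accepts(proof):
    # Stub for a formal verifier (Lean / HOL Light): reward is 1 only if the
    # whole proof checks, mirroring verifier-as-reward-function training.
    return proof == ["intro h", "exact h"]

def reinforce_episode(lr=0.5):
    probs = tactic_probs()
    proof = [sample_tactic(probs) for _ in range(2)]
    reward = 1.0 if verifier_accepts(proof) else 0.0
    # REINFORCE update: push up log-probability of tactics in verified proofs.
    for chosen in proof:
        for t in TACTICS:
            grad = (1.0 if t == chosen else 0.0) - probs[t]
            logits[t] += lr * reward * grad
    return reward

if __name__ == "__main__":
    verified = sum(reinforce_episode() for _ in range(2000))
    print(f"verified proofs during training: {verified:.0f}")
```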
Related papers
- VerIF: Verification Engineering for Reinforcement Learning in Instruction Following [55.60192044049083]
Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing large language models (LLMs). We propose VerIF, a verification method that combines rule-based code verification with LLM-based verification from a large reasoning model. We apply RL training with VerIF to two models, achieving significant improvements across several representative instruction-following benchmarks.
arXiv Detail & Related papers (2025-06-11T17:10:36Z)
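VerIF's combination of hard rule checks and a soft LLM judge could look roughly like the sketch below; the interface is a guess for illustration, not VerIF's actual code.

```python
import re
from typing import Callable

def rule_check(response: str, patterns: list[str]) -> bool:
    # Deterministic, rule-based verification: every regex constraint must match.
    return all(re.search(p, response) for p in patterns)

def verif_reward(response: str, patterns: list[str],
                 llm_judge: Callable[[str], float]) -> float:
    # Hard constraints gate the reward; the LLM judge scores the remainder.
    if not rule_check(response, patterns):
        return 0.0
    return llm_judge(response)  # soft score in [0, 1] from a reasoning model

if __name__ == "__main__":
    # Stub judge; a real setup would query a large reasoning model.
    judge = lambda r: 1.0 if r.endswith(".") else 0.5
    print(verif_reward("Answer: 42.", [r"^Answer:"], judge))
```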
- Abstraction-Based Proof Production in Formal Verification of Neural Networks [0.7048747239308888]
We introduce a novel framework for proof-producing abstraction-based verification. This work aims to enable scalable and trustworthy verification by supporting common abstraction techniques.
arXiv Detail & Related papers (2025-06-11T07:00:09Z)
- Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach [14.213719696233934]
Large language models (LLMs) struggle with formal domains that require rigorous logical deduction and symbolic reasoning. We propose a neuro-symbolic approach that combines LLMs' generative strengths with structured components to overcome this challenge.
arXiv Detail & Related papers (2025-05-20T15:13:32Z)
- Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework [2.9334627971166336]
This paper investigates the logical reasoning capabilities of large language models (LLMs). A trained LLM receives as input a set of assumptions and a goal, and produces as output a proof that formally derives the goal from the assumptions. A critical obstacle for training is the scarcity of real-world proofs.
arXiv Detail & Related papers (2025-04-28T19:25:29Z)
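The described setup maps a set of assumptions plus a goal to a formal derivation; a hypothetical training-example layout (field names and proof syntax are ours, not the paper's) might be:

```python
# One hypothetical training example for the assumptions-and-goal setup
# described above (field names and proof syntax are illustrative).
example = {
    "assumptions": ["p -> q", "p"],
    "goal": "q",
    "proof": [
        "1. p -> q   (assumption)",
        "2. p        (assumption)",
        "3. q        (modus ponens on 1, 2)",
    ],
}

# The paper notes real-world proofs are scarce, so examples like this
# would typically be synthesized rather than mined.
```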
- LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction [74.79306773878955]
We introduce LeanProgress, a method that predicts proof progress to guide search. Our experiments show that LeanProgress achieves an overall prediction accuracy of 75.1%.
arXiv Detail & Related papers (2025-02-25T07:46:36Z)
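A progress predictor of this kind plugs naturally into best-first proof search; the sketch below uses a stub scorer and stub tactic expansion (both assumptions, not LeanProgress's code).

```python
import heapq

def progress_score(goal: str) -> float:
    # Stub for a learned progress predictor: here, shorter goals are
    # treated as closer to completion. Purely illustrative.
    return 1.0 - min(len(goal), 100) / 100.0

def expand(goal: str) -> list[str]:
    # Stub successor function standing in for applying candidate tactics.
    return [goal[:-1]] if goal else []

def best_first_search(root: str, budget: int = 100) -> bool:
    frontier = [(-progress_score(root), root)]
    while frontier and budget > 0:
        budget -= 1
        _, goal = heapq.heappop(frontier)
        if not goal:  # empty goal = proof complete (toy convention)
            return True
        for nxt in expand(goal):
            heapq.heappush(frontier, (-progress_score(nxt), nxt))
    return False

print(best_first_search("n + 0 = n"))
```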
- Automated Proof Generation for Rust Code via Self-Evolution [69.25795662658356]
We introduce SAFE, a framework that overcomes the lack of human-written snippets to enable automated proof generation of Rust code. SAFE re-purposes the large number of synthesized incorrect proofs to train the self-debugging capability of the fine-tuned models. We achieve a 52.52% accuracy rate on a benchmark crafted by human experts, a significant leap over GPT-4o's performance of 14.39%.
arXiv Detail & Related papers (2024-10-21T08:15:45Z)
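One round of the self-evolution loop might be organized as below; `model` and `verify` are hypothetical stand-ins for the fine-tuned LLM and the Rust proof verifier, not SAFE's actual API.

```python
def self_evolution_round(model, tasks, verify):
    """Hypothetical sketch of one SAFE-style round: verified proofs become
    proof-generation training data, while failed attempts are re-purposed
    (with their verifier errors) as self-debugging training data."""
    proof_data, debug_data = [], []
    for task in tasks:
        attempt = model.generate(task)
        ok, error = verify(task, attempt)
        if ok:
            proof_data.append((task, attempt))
        else:
            # Pair the broken proof and its error with a later verified fix
            # to teach the model to repair its own output.
            debug_data.append((task, attempt, error))
    return proof_data, debug_data

if __name__ == "__main__":
    class StubModel:
        def generate(self, task):
            return f"proof of {task}"
    stub_verify = lambda task, proof: (len(task) % 2 == 0, "type error")
    print(self_evolution_round(StubModel(), ["t1", "ab"], stub_verify))
```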
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
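The interleaved thought-then-tactic format can be pictured as below; the field names are illustrative, not taken from the Lean-STaR release.

```python
from dataclasses import dataclass

@dataclass
class ThoughtStep:
    state: str    # pretty-printed goal before the step
    thought: str  # informal rationale, generated before the tactic
    tactic: str   # formal tactic actually sent to Lean

# A trace interleaves an informal thought before every formal step.
trace = [
    ThoughtStep(
        state="⊢ n + 0 = n",
        thought="Adding zero should reduce by the simp lemma add_zero.",
        tactic="simp",
    ),
]
```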
- Improving Autoformalization using Type Checking [15.58948808529849]
We analyze both current autoformalization methods and the processes used to evaluate them, focusing specifically on the Lean 4 theorem proving language. We demonstrate that scaling type-check filtering with self-consistency techniques on top of existing methods significantly improves performance, achieving absolute accuracy gains of up to +18.4% on ProofNet. We also release new benchmarks: a new research-level mathematics dataset RLM25, a corrected ProofNet, and ProofNetVerif with labeled correct and incorrect autoformalization pairs for evaluating metrics.
arXiv Detail & Related papers (2024-06-11T13:01:50Z)
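Type-check filtering combined with self-consistency reduces to a few lines; in the sketch below, `sample` and `type_checks` are hypothetical hooks into an LLM and the Lean 4 elaborator.

```python
from collections import Counter
from typing import Callable, Optional

def autoformalize(statement: str,
                  sample: Callable[[str], str],
                  type_checks: Callable[[str], bool],
                  n: int = 16) -> Optional[str]:
    # 1) Sample n candidate formalizations from the model.
    candidates = [sample(statement) for _ in range(n)]
    # 2) Type-check filtering: discard candidates Lean cannot elaborate.
    well_typed = [c for c in candidates if type_checks(c)]
    if not well_typed:
        return None
    # 3) Self-consistency: return the most frequent surviving candidate.
    return Counter(well_typed).most_common(1)[0][0]
```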
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs [95.07757789781213]
Two lines of approaches are adopted for complex reasoning with LLMs. One line of work prompts LLMs with various reasoning structures, and the structured outputs can be naturally regarded as intermediate reasoning steps. The other line of work adopts LLM-free declarative solvers to do the reasoning task, rendering higher reasoning accuracy but lacking interpretability due to the black-box nature of the solvers. We present a simple extension to the latter line of work. Specifically, we showcase that the intermediate search logs generated by Prolog interpreters can be accessed and interpreted into human-readable reasoning.
arXiv Detail & Related papers (2023-11-16T11:26:21Z)
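Turning interpreter search logs into readable reasoning can be as simple as templating over trace ports; the SWI-Prolog-like trace format below is an assumption, not the paper's actual pipeline.

```python
import re

# Assumed SWI-Prolog-style trace lines, e.g. "Call: (8) ancestor(tom, X)".
TRACE_LINE = re.compile(r"(Call|Exit|Fail):\s*(?:\(\d+\)\s*)?(\w+)\((.*?)\)")

def explain_trace(trace: str) -> list[str]:
    templates = {
        "Call": "Trying to establish {}({}).",
        "Exit": "Established {}({}).",
        "Fail": "Could not establish {}({}); backtracking.",
    }
    return [templates[port].format(pred, args)
            for port, pred, args in TRACE_LINE.findall(trace)]

print("\n".join(explain_trace(
    "Call: (8) parent(tom, X)\nExit: (8) parent(tom, bob)")))
```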
- Distributional Inclusion Hypothesis and Quantifications: Probing for Hypernymy in Functional Distributional Semantics [50.363809539842386]
Functional Distributional Semantics (FDS) models the meaning of words by truth-conditional functions.
We show that FDS models learn hypernymy on a restricted class of corpora that strictly follow the Distributional Inclusion Hypothesis (DIH).
arXiv Detail & Related papers (2023-09-15T11:28:52Z)
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z)
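Premise retrieval in the ReProver style can be sketched with any embedding plus cosine ranking; the hash-bucket embedding below is a toy stand-in for the trained neural retriever.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; ReProver trains a neural retriever.
    v = [0.0] * 16
    for ch in text:
        v[ord(ch) % 16] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def retrieve_premises(goal: str, library: list[str], k: int = 3) -> list[str]:
    g = embed(goal)
    return sorted(library,
                  key=lambda p: -sum(a * b for a, b in zip(g, embed(p))))[:k]

# The prover then generates tactics conditioned on the goal plus the
# retrieved premises.
print(retrieve_premises("n + 0 = n", ["Nat.add_zero", "Nat.zero_add",
                                      "List.length_append", "Nat.succ_le"]))
```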