What's in a Proof? Analyzing Expert Proof-Writing Processes in F* and Verus
- URL: http://arxiv.org/abs/2508.02733v1
- Date: Fri, 01 Aug 2025 22:16:30 GMT
- Title: What's in a Proof? Analyzing Expert Proof-Writing Processes in F* and Verus
- Authors: Rijul Jain, Shraddha Barke, Gabriel Ebner, Md Rakib Hossain Misu, Shan Lu, Sarah Fakhoury,
- Abstract summary: We conduct a user study involving the collection and analysis of fine-grained source code telemetry from eight experts working with two languages. Results reveal interesting trends and patterns about how experts reason about proofs and key challenges encountered during the proof development process. We translate these findings into concrete design guidance for AI proof assistants.
- Score: 2.8003002159083237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Proof-oriented programming languages (POPLs) empower developers to write code alongside formal correctness proofs, providing formal guarantees that the code adheres to specified requirements. Despite their powerful capabilities, POPLs present a steep learning curve and have not yet been adopted by the broader software community. The lack of understanding about the proof-development process and how expert proof developers interact with POPLs has hindered the advancement of effective proof engineering and the development of proof-synthesis models/tools. In this work, we conduct a user study involving the collection and analysis of fine-grained source code telemetry from eight experts working with two languages, F* and Verus. Results reveal interesting trends and patterns about how experts reason about proofs and key challenges encountered during the proof development process. We identify three distinct strategies and multiple informal practices that are not captured in final code snapshots, yet are predictive of task outcomes. We translate these findings into concrete design guidance for AI proof assistants: bias toward early specification drafting, explicit sub-goal decomposition, bounded active errors, and disciplined verifier interaction. We also present a case study of an F* proof agent grounded in these recommendations and demonstrate improved performance over baseline LLMs.
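To make the setting concrete, below is a minimal Verus sketch of proof-oriented programming, written for this summary rather than drawn from the paper's study materials; the function names are our own illustrative choices. The specification is drafted first and the verifier checks the executable code against it, mirroring the "early specification drafting" and "sub-goal decomposition" guidance above.

```rust
// Minimal Verus sketch (illustrative only; names are hypothetical).
use vstd::prelude::*;

verus! {

// Specification-level definition, usable inside proofs and `ensures`.
spec fn spec_max(a: int, b: int) -> int {
    if a >= b { a } else { b }
}

// Executable code is written against the specification: the verifier
// rejects the function unless the body provably meets the `ensures`.
fn max(a: u64, b: u64) -> (r: u64)
    ensures
        r == spec_max(a as int, b as int),
{
    if a >= b { a } else { b }
}

// A sub-goal stated as its own small lemma rather than inlined into a
// larger proof; the SMT backend discharges it by unfolding `spec_max`.
proof fn lemma_max_commutes(a: int, b: int)
    ensures
        spec_max(a, b) == spec_max(b, a),
{
}

fn main() {}

} // verus!
```

In this workflow, a rejected `ensures` clause is a verifier error the developer iterates against, which is exactly the kind of fine-grained interaction the study's telemetry captures.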
Related papers
- Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach [14.213719696233934]
Large language models (LLMs) struggle with formal domains that require rigorous logical deduction and symbolic reasoning. We propose a neuro-symbolic approach that combines LLMs' generative strengths with structured components to overcome this challenge.
arXiv Detail & Related papers (2025-05-20T15:13:32Z)
- LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction [74.79306773878955]
We introduce LeanProgress, a method that predicts the progress in the proof. Our experiments show that LeanProgress achieves an overall prediction accuracy of 75.1%.
arXiv Detail & Related papers (2025-02-25T07:46:36Z)
- VEL: A Formally Verified Reasoner for OWL2 EL Profile [0.0]
VEL is a formally verified EL++ reasoner equipped with machine-checkable correctness proofs. Our work demonstrates the necessity of mechanizing reasoning algorithms to ensure their correctness at both the theoretical and implementation levels.
arXiv Detail & Related papers (2024-12-11T19:17:28Z)
- Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation [27.60611509339328]
We argue that the optimal order for one training data sample occurs when the relevant intermediate supervision for a particular proof step is always positioned to the left of that proof step. We demonstrate that training is most effective when the proof is in the intuitively sequential order.
arXiv Detail & Related papers (2024-10-30T18:00:04Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
- AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation [6.3585378855805725]
We present a novel benchmark to evaluate Large-Language Models' effectiveness for assertion generation. AssertionBench contains 100 curated Verilog hardware designs from OpenCores and formally verified assertions for each design generated from GoldMine and HARM.
arXiv Detail & Related papers (2024-06-26T14:47:28Z)
- MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z)
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
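As a rough illustration of the retrieval idea (a generic sketch under our own assumptions, not ReProver's actual code), premise selection can be framed as ranking a premise library by embedding similarity to the current proof state; `embed` below is a hypothetical stand-in for a learned encoder.

```rust
// Generic sketch of dense premise retrieval: rank library premises by
// cosine similarity to the proof-state embedding and keep the top k.
// `embed` is a hypothetical stand-in for a trained text encoder.
fn embed(text: &str) -> Vec<f32> {
    // Toy placeholder: a real system would call a learned encoder here.
    let mut v = vec![0.0f32; 8];
    for (i, b) in text.bytes().enumerate() {
        v[i % 8] += b as f32;
    }
    v
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn top_k_premises<'a>(state: &str, library: &'a [&'a str], k: usize) -> Vec<&'a str> {
    let q = embed(state);
    let mut scored: Vec<(f32, &'a str)> = library
        .iter()
        .map(|p| (cosine(&q, &embed(p)), *p))
        .collect();
    // Sort by descending similarity; ties broken arbitrarily.
    scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    scored.into_iter().take(k).map(|(_, p)| p).collect()
}

fn main() {
    let library = ["add_comm", "mul_comm", "add_assoc"];
    let picks = top_k_premises("a + b = b + a", &library, 2);
    println!("{:?}", picks); // retrieved premises fed to the prover
}
```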
arXiv Detail & Related papers (2023-06-27T17:05:32Z)
- Generating Natural Language Proofs with Verifier-Guided Search [74.9614610172561]
We present a novel stepwise method, NLProofS (Natural Language Proof Search).
NLProofS learns to generate relevant steps conditioning on the hypothesis.
It achieves state-of-the-art performance on EntailmentBank and RuleTaker.
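A generic reconstruction of what verifier-guided stepwise search can look like (our own assumptions, not the NLProofS implementation): a priority queue repeatedly expands the partial proof with the best verifier score, where `generate_steps` and `verify_step` are hypothetical stand-ins for the learned generator and verifier.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// A partial proof ordered by its aggregated verifier score.
#[derive(Clone, PartialEq)]
struct PartialProof {
    steps: Vec<String>,
    score: f64,
}

impl Eq for PartialProof {}
impl Ord for PartialProof {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score.partial_cmp(&other.score).unwrap_or(Ordering::Equal)
    }
}
impl PartialOrd for PartialProof {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Hypothetical stand-in for the learned step generator.
fn generate_steps(proof: &PartialProof) -> Vec<String> {
    if proof.steps.len() >= 2 {
        vec![]
    } else {
        let d = proof.steps.len();
        vec![format!("step-{}a", d), format!("step-{}b", d)]
    }
}

// Hypothetical stand-in for the learned step verifier (0.0..=1.0).
fn verify_step(_proof: &PartialProof, step: &str) -> f64 {
    if step.ends_with('a') { 0.9 } else { 0.4 }
}

// Hypothetical goal check: here, any two-step proof counts as done.
fn proves_hypothesis(proof: &PartialProof) -> bool {
    proof.steps.len() == 2
}

fn search() -> Option<PartialProof> {
    let mut frontier = BinaryHeap::new();
    frontier.push(PartialProof { steps: vec![], score: 1.0 });
    // Best-first: always expand the highest-scoring partial proof.
    while let Some(best) = frontier.pop() {
        if proves_hypothesis(&best) {
            return Some(best);
        }
        for step in generate_steps(&best) {
            let s = verify_step(&best, &step);
            let mut next = best.clone();
            next.steps.push(step);
            // Aggregate conservatively: a proof is only as good as its
            // weakest verified step.
            next.score = next.score.min(s);
            frontier.push(next);
        }
    }
    None
}

fn main() {
    if let Some(p) = search() {
        println!("proof: {:?} (score {:.2})", p.steps, p.score);
    }
}
```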
arXiv Detail & Related papers (2022-05-25T02:22:30Z)