Related papers: Inferring multiple helper Dafny assertions with LLMs

Inferring multiple helper Dafny assertions with LLMs

URL: http://arxiv.org/abs/2511.00125v1
Date: Fri, 31 Oct 2025 09:45:39 GMT
Title: Inferring multiple helper Dafny assertions with LLMs
Authors: Álvaro Silva, Alexandra Mendes, Ruben Martins,
Abstract summary: We investigate the use of Large Language Models to automatically infer missing helper assertions in Dafny programs.<n>We introduce a taxonomy of assertion types to analyze inference difficulty.<n>Results show that automated assertion inference can substantially reduce proof engineering effort.
Score: 47.33158055894705
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Dafny verifier provides strong correctness guarantees but often requires numerous manual helper assertions, creating a significant barrier to adoption. We investigate the use of Large Language Models (LLMs) to automatically infer missing helper assertions in Dafny programs, with a primary focus on cases involving multiple missing assertions. To support this study, we extend the DafnyBench benchmark with curated datasets where one, two, or all assertions are removed, and we introduce a taxonomy of assertion types to analyze inference difficulty. Our approach refines fault localization through a hybrid method that combines LLM predictions with error-message heuristics. We implement this approach in a new tool called DAISY (Dafny Assertion Inference SYstem). While our focus is on multiple missing assertions, we also evaluate DAISY on single-assertion cases. DAISY verifies 63.4% of programs with one missing assertion and 31.7% with multiple missing assertions. Notably, many programs can be verified with fewer assertions than originally present, highlighting that proofs often admit multiple valid repair strategies and that recovering every original assertion is unnecessary. These results demonstrate that automated assertion inference can substantially reduce proof engineering effort and represent a step toward more scalable and accessible formal verification.

Related papers

ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization [73.0780809974414]
We propose a Reflective Autoformalization method that integrates semantic consistency evaluation into the autoformalization process.<n>This enables the model to iteratively generate formal statements, assess its semantic fidelity, and self-correct identified errors.<n>Experiments show that ReForm achieves an average improvement of 22.6 percentage points over the strongest baselines.
arXiv Detail & Related papers (2025-10-28T16:22:54Z)
Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [79.74676890436174]
We present an APR tool for Dafny that uses formal specifications as oracles for fault localization and repair.<n>We localize faults through a series of steps, which include using Hoare logic to determine the state of each statement within the program.<n>Our tool achieves 89.6% fault localization coverage and GPT-4o mini yields the highest repair success rate of 74.18%.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning [78.29317733206643]
We introduce Veracity Search (VS), a discrete search algorithm over veracity assignments.<n>It performs otherwise intractable inference in the posterior distribution over latent veracity values.<n>It generalizes VS, enabling accurate zero-shot veracity inference in novel contexts.
arXiv Detail & Related papers (2025-05-17T04:16:36Z)
Agentic Verification for Ambiguous Query Disambiguation [42.238086712267396]
We tackle the challenge of disambiguating queries in retrieval-augmented generation (RAG) to diverse yet answerable interpretations.<n>We propose a joint approach to unify diversification with verification by incorporating feedback from retriever and generator early on.<n>We validate the efficiency and effectiveness of our method on the widely adopted ASQA benchmark to achieve diverse yet verifiable interpretations.
arXiv Detail & Related papers (2025-02-14T18:31:39Z)
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning. LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors. We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation [6.3585378855805725]
We present a novel benchmark to evaluate Large-Language Models' effectiveness for assertion generation.<n>AssertioBench contains 100 curated Verilog hardware designs from OpenCores and formally verified assertions for each design generated from GoldMine and HARM.
arXiv Detail & Related papers (2024-06-26T14:47:28Z)
Laurel: Unblocking Automated Verification with Large Language Models [2.6942525604796366]
We propose Laurel, a tool that automatically generates assertions using large language models.<n>We evaluate our techniques on our new benchmark DafnyGym, a dataset of complex lemmas we extracted from three real-world Dafnys.
arXiv Detail & Related papers (2024-05-27T03:26:01Z)
Knowledge-Augmented Language Model Verification [68.6099592486075]
Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. We propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier. Our results show that the proposed verifier effectively identifies retrieval and generation errors, allowing LMs to provide more factually correct outputs.
arXiv Detail & Related papers (2023-10-19T15:40:00Z)
SAGA: Summarization-Guided Assert Statement Generation [34.51502565985728]
This paper presents a novel summarization-guided approach for automatically generating assert statements. We leverage a pre-trained language model as the reference architecture and fine-tune it on the task of assert statement generation.
arXiv Detail & Related papers (2023-05-24T07:03:21Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
ReAssert: Deep Learning for Assert Generation [3.8174671362014956]
We present RE-ASSERT, an approach for the automated generation of JUnit test asserts. This is achieved by targeting projects individually, using precise code-to-test traceability for learning. We also utilise Reformer, a state-of-the-art deep learning model, along with two models from previous work to evaluate ReAssert and an existing approach, known as ATLAS.
arXiv Detail & Related papers (2020-11-19T11:55:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.