Leveraging Large Language Models to Boost Dafny's Developers Productivity
- URL: http://arxiv.org/abs/2401.00963v1
- Date: Mon, 1 Jan 2024 21:58:13 GMT
- Title: Leveraging Large Language Models to Boost Dafny's Developers Productivity
- Authors: Álvaro Silva, Alexandra Mendes, João F. Ferreira
- Abstract summary: This paper proposes leveraging Large Language Models (LLMs) to enhance the productivity of Dafny developers.
A new Dafny plugin generates suggestions for relevant lemmas that Dafny is unable to discover and use.
For the lemmas that cannot be proved automatically, the plugin also attempts to provide accompanying calculational proofs.
- Score: 49.64902130083662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research idea paper proposes leveraging Large Language Models (LLMs) to
enhance the productivity of Dafny developers. Although the use of
verification-aware languages, such as Dafny, has increased considerably in the
last decade, these are still not widely adopted. Often the cost of using such
languages is too high, due to the level of expertise required from the
developers and challenges that they often face when trying to prove a program
correct. Even though Dafny automates a lot of the verification process,
sometimes there are steps that are too complex for Dafny to perform on its own.
One such case is that of missing lemmas, i.e. Dafny is unable to prove a result
without being given further help in the form of a theorem that can assist it in
the proof of the step.
In this paper, we describe preliminary work on a new Dafny plugin that
leverages LLMs to assist developers by generating suggestions for relevant
lemmas that Dafny is unable to discover and use. Moreover, for the lemmas that
cannot be proved automatically, the plugin also attempts to provide
accompanying calculational proofs. We also discuss ideas for future work by
describing a research agenda on using LLMs to increase the adoption of
verification-aware languages in general, by increasing developers' productivity
and by reducing the level of expertise required for crafting formal
specifications and proving program properties.
Related papers
- Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning [79.48839334040197]
Instruction Fine-tuning (IFT) can enhance the helpfulness of Large Language Models (LLMs).
IFT steers LLMs to generate responses with long-tail knowledge that is not well covered during pre-training, leading to more informative but less truthful answers when generalizing to unseen tasks.
We propose UNIT, a novel IFT paradigm to address this trade-off.
arXiv Detail & Related papers (2025-02-17T16:10:30Z)
- Dafny as Verification-Aware Intermediate Language for Code Generation [0.0]
Large language models (LLMs) generate source code from natural language prompts.
One limitation of this approach is that the generated code can be faulty at times, despite being presented to the user as correct.
We propose that the user guides the LLM to first generate an opaque intermediate representation, in the verification-aware language Dafny.
The correct Dafny program is then compiled to the target language and returned to the user.
arXiv Detail & Related papers (2025-01-10T17:23:14Z)
- dafny-annotator: AI-Assisted Verification of Dafny Programs [4.651620941143133]
We explore using a combination of Large Language Models and search to build dafny-annotator.
On a test set from the DafnyBench collection of programs, greedy search guided by LLaMa 3.1 8B successfully annotates only 15.7% of the methods.
Our results suggest a path towards capable AI assistants for languages that don't yet have large-scale human-generated examples.
arXiv Detail & Related papers (2024-11-05T19:27:56Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Laurel: Generating Dafny Assertions Using Large Language Models [2.6942525604796366]
We propose Laurel, a tool that uses large language models (LLMs) to automatically generate helper assertions for Dafny programs.
Laurel is able to generate over 50% of the required helper assertions given only a few attempts.
arXiv Detail & Related papers (2024-05-27T03:26:01Z)
- Prompting-based Synthetic Data Generation for Few-Shot Question Answering [23.97949073816028]
We show that using large language models can improve Question Answering performance on various datasets in the few-shot setting.
We suggest that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme.
arXiv Detail & Related papers (2024-05-15T13:36:43Z)
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
- MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z)
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z)