Leveraging Large Language Models to Boost Dafny's Developers
Productivity
- URL: http://arxiv.org/abs/2401.00963v1
- Date: Mon, 1 Jan 2024 21:58:13 GMT
- Title: Leveraging Large Language Models to Boost Dafny's Developers
Productivity
- Authors: Álvaro Silva, Alexandra Mendes, João F. Ferreira
- Abstract summary: This paper proposes leveraging Large Language Models (LLMs) to enhance the productivity of Dafny developers.
A new Dafny plugin generates suggestions for relevant lemmas that Dafny is unable to discover and use.
For the lemmas that cannot be proved automatically, the plugin also attempts to provide accompanying calculational proofs.
- Score: 49.64902130083662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research idea paper proposes leveraging Large Language Models (LLMs) to
enhance the productivity of Dafny developers. Although the use of
verification-aware languages, such as Dafny, has increased considerably in the
last decade, these are still not widely adopted. Often the cost of using such
languages is too high, due to the level of expertise required from developers
and the challenges they often face when trying to prove a program correct.
Even though Dafny automates a lot of the verification process,
sometimes there are steps that are too complex for Dafny to perform on its own.
One such case is that of missing lemmas, i.e., Dafny is unable to prove a result
unless it is given further help in the form of an auxiliary theorem (a lemma)
that assists it in that step of the proof.
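To make the missing-lemma case concrete, below is a minimal Dafny sketch (illustrative only; `Pow`, `PowAdd`, and `Client` are hypothetical names, not taken from the paper). The assertion in `Client` does not verify on its own, but goes through once the auxiliary lemma is stated and invoked:

```dafny
// Exponentiation over the naturals. Dafny only unfolds this
// definition a bounded number of times, so general facts about
// it must be supplied as lemmas.
function Pow(b: nat, n: nat): nat {
  if n == 0 then 1 else b * Pow(b, n - 1)
}

// The "missing lemma": Dafny cannot discover this property on its
// own, but proves it by induction once stated (the recursive call
// plays the role of the induction hypothesis).
lemma PowAdd(b: nat, m: nat, n: nat)
  ensures Pow(b, m + n) == Pow(b, m) * Pow(b, n)
{
  if m > 0 {
    PowAdd(b, m - 1, n);
  }
}

method Client(b: nat, m: nat, n: nat) {
  PowAdd(b, m, n);  // without this call, the assertion below fails to verify
  assert Pow(b, m + n) == Pow(b, m) * Pow(b, n);
}
```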
In this paper, we describe preliminary work on a new Dafny plugin that
leverages LLMs to assist developers by generating suggestions for relevant
lemmas that Dafny is unable to discover and use. Moreover, for the lemmas that
cannot be proved automatically, the plugin also attempts to provide
accompanying calculational proofs. We also discuss ideas for future work by
describing a research agenda on using LLMs to increase the adoption of
verification-aware languages in general, by increasing developers' productivity
and by reducing the level of expertise required for crafting formal
specifications and proving program properties.
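The calculational proofs mentioned above correspond to Dafny's `calc` statement, which chains a sequence of expressions related step by step, each step justified by definition unfolding or by a hint in braces. As a hedged sketch of what such a generated proof might look like for the hypothetical `PowAdd` lemma from the earlier snippet (not an actual output of the plugin, and assuming the `Pow` definition above is in scope):

```dafny
lemma PowAddCalc(b: nat, m: nat, n: nat)
  ensures Pow(b, m + n) == Pow(b, m) * Pow(b, n)
{
  if m == 0 {
    // Pow(b, 0 + n) == Pow(b, n) == 1 * Pow(b, n) == Pow(b, 0) * Pow(b, n)
  } else {
    calc {
      Pow(b, m + n);
    ==  // unfold the definition of Pow, since m + n > 0
      b * Pow(b, (m - 1) + n);
    ==  { PowAddCalc(b, m - 1, n); }  // induction hypothesis as a hint
      b * (Pow(b, m - 1) * Pow(b, n));
    ==  // associativity of multiplication
      (b * Pow(b, m - 1)) * Pow(b, n);
    ==  // fold the definition of Pow back up, since m > 0
      Pow(b, m) * Pow(b, n);
    }
  }
}
```

Each expression in the `calc` block ends with a semicolon; the `==` between consecutive expressions states the step relation, and a brace-enclosed statement such as a lemma call justifies only the step it is attached to.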
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages [6.0608817611709735]
We propose a metric for evaluating the quality of specifications for verification-aware languages.
We show that our metric agrees closely with human judgment on a human-labeled dataset of Dafny specifications for the popular MBPP code-generation benchmark.
We also outline formal verification challenges that need to be addressed to apply the technique more widely.
arXiv Detail & Related papers (2024-06-14T06:52:08Z)
- Laurel: Generating Dafny Assertions Using Large Language Models [2.6942525604796366]
We propose Laurel, a tool that uses large language models (LLMs) to automatically generate helper assertions for Dafny programs.
Laurel is able to generate over 50% of the required helper assertions given only a few attempts.
arXiv Detail & Related papers (2024-05-27T03:26:01Z)
- Prompting-based Synthetic Data Generation for Few-Shot Question Answering [23.97949073816028]
We show that using large language models can improve Question Answering performance on various datasets in the few-shot setting.
We suggest that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme.
arXiv Detail & Related papers (2024-05-15T13:36:43Z)
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
- MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z)
- FactLLaMA: Optimizing Instruction-Following Language Models with External Knowledge for Automated Fact-Checking [10.046323978189847]
We propose combining the power of instruction-following language models with external evidence retrieval to enhance fact-checking performance.
Our approach involves leveraging search engines to retrieve relevant evidence for a given input claim.
Then, we instruct-tune an open-source language model, LLaMA, using this evidence, enabling it to predict the veracity of the input claim more accurately.
arXiv Detail & Related papers (2023-09-01T04:14:39Z)
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.