Leveraging Large Language Models to Boost Dafny's Developers Productivity
- URL: http://arxiv.org/abs/2401.00963v1
- Date: Mon, 1 Jan 2024 21:58:13 GMT
- Title: Leveraging Large Language Models to Boost Dafny's Developers Productivity
- Authors: Álvaro Silva, Alexandra Mendes, João F. Ferreira
- Abstract summary: This paper proposes leveraging Large Language Models (LLMs) to enhance the productivity of Dafny developers.
A new Dafny plugin generates suggestions for relevant lemmas that Dafny is unable to discover and use.
For the lemmas that cannot be proved automatically, the plugin also attempts to provide accompanying calculational proofs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research idea paper proposes leveraging Large Language Models (LLMs) to
enhance the productivity of Dafny developers. Although the use of
verification-aware languages, such as Dafny, has increased considerably in the
last decade, these are still not widely adopted. Often the cost of using such
languages is too high, due to the level of expertise required from the
developers and challenges that they often face when trying to prove a program
correct. Even though Dafny automates a lot of the verification process,
sometimes there are steps that are too complex for Dafny to perform on its own.
One such case is that of missing lemmas, i.e., cases where Dafny is unable to prove a result
unless it is given further help in the form of a theorem that assists it with
that step of the proof.
In this paper, we describe preliminary work on a new Dafny plugin that
leverages LLMs to assist developers by generating suggestions for relevant
lemmas that Dafny is unable to discover and use. Moreover, for the lemmas that
cannot be proved automatically, the plugin also attempts to provide
accompanying calculational proofs. We also discuss ideas for future work by
describing a research agenda on using LLMs to increase the adoption of
verification-aware languages in general, by increasing developers' productivity
and by reducing the level of expertise required for crafting formal
specifications and proving program properties.
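To illustrate the missing-lemma scenario described in the abstract, consider a small hypothetical Dafny example (not taken from the paper). Dafny's automation cannot prove the distributivity of a recursive sum over sequence concatenation on its own; a developer must supply a helper lemma with an inductive proof, which is exactly the kind of artifact the proposed plugin would suggest:

```dafny
// Recursive sum over a sequence of integers.
function Sum(s: seq<int>): int {
  if |s| == 0 then 0 else s[0] + Sum(s[1..])
}

// Dafny cannot discover this property automatically; it must be
// stated and proved as a lemma before callers can rely on it.
lemma SumAppend(a: seq<int>, b: seq<int>)
  ensures Sum(a + b) == Sum(a) + Sum(b)
{
  if |a| == 0 {
    assert a + b == b;  // base case: concatenating an empty prefix
  } else {
    assert (a + b)[1..] == a[1..] + b;  // align the recursive unfolding
    SumAppend(a[1..], b);               // inductive hypothesis
  }
}
```

Without invoking `SumAppend(xs, ys)`, an assertion such as `assert Sum(xs + ys) == Sum(xs) + Sum(ys)` in client code fails to verify; suggesting such a lemma, together with its proof, is the gap the plugin aims to fill.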
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Laurel: Generating Dafny Assertions Using Large Language Models [2.6942525604796366]
We propose Laurel, a tool that uses large language models (LLMs) to automatically generate helper assertions for Dafny programs.
Laurel is able to generate over 50% of the required helper assertions given only a few attempts.
arXiv Detail & Related papers (2024-05-27T03:26:01Z)
- Prompting-based Synthetic Data Generation for Few-Shot Question Answering [23.97949073816028]
We show that using large language models can improve Question Answering performance on various datasets in the few-shot setting.
We suggest that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme.
arXiv Detail & Related papers (2024-05-15T13:36:43Z)
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
- Towards Large Language Models as Copilots for Theorem Proving in Lean [81.94024084598598]
We introduce Lean Copilot, a framework for running LLM inference natively in Lean.
We build tools for suggesting proof steps, completing intermediate proof goals, and selecting relevant premises.
Experimental results demonstrate the effectiveness of our method in assisting humans and in automating the theorem-proving process.
arXiv Detail & Related papers (2024-04-18T22:54:08Z)
- Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation [2.93322471069531]
We conduct an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT.
Our findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation.
arXiv Detail & Related papers (2024-02-18T20:48:09Z)
- MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z)
- Context Matters: Data-Efficient Augmentation of Large Language Models for Scientific Applications [15.893290942177112]
We explore the challenges inherent to Large Language Models (LLMs) like GPT-4.
The capacity of LLMs to present erroneous answers in a coherent and semantically rigorous manner complicates the detection of factual inaccuracies.
Our work aims to enhance the understanding and mitigation of such errors, thereby contributing to the improvement of LLM accuracy and reliability.
arXiv Detail & Related papers (2023-12-12T08:43:20Z)
- FactLLaMA: Optimizing Instruction-Following Language Models with External Knowledge for Automated Fact-Checking [10.046323978189847]
We propose combining the power of instruction-following language models with external evidence retrieval to enhance fact-checking performance.
Our approach involves leveraging search engines to retrieve relevant evidence for a given input claim.
Then, we instruct-tune an open-sourced language model, called LLaMA, using this evidence, enabling it to predict the veracity of the input claim more accurately.
arXiv Detail & Related papers (2023-09-01T04:14:39Z)
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.