Leveraging Large Language Models to Boost Dafny's Developers Productivity
- URL: http://arxiv.org/abs/2401.00963v1
- Date: Mon, 1 Jan 2024 21:58:13 GMT
- Title: Leveraging Large Language Models to Boost Dafny's Developers Productivity
- Authors: Álvaro Silva, Alexandra Mendes, João F. Ferreira
- Abstract summary: This paper proposes leveraging Large Language Models (LLMs) to enhance the productivity of Dafny developers.
A new Dafny plugin generates suggestions for relevant lemmas that Dafny is unable to discover and use.
For the lemmas that cannot be proved automatically, the plugin also attempts to provide accompanying calculational proofs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research idea paper proposes leveraging Large Language Models (LLMs) to
enhance the productivity of Dafny developers. Although the use of
verification-aware languages, such as Dafny, has increased considerably in the
last decade, these are still not widely adopted. Often the cost of using such
languages is too high, due to the level of expertise required from the
developers and challenges that they often face when trying to prove a program
correct. Even though Dafny automates a lot of the verification process,
sometimes there are steps that are too complex for Dafny to perform on its own.
One such case is that of missing lemmas, i.e., cases where Dafny is unable to prove a result
unless it is given further help in the form of a theorem that assists it with
that step of the proof.
In this paper, we describe preliminary work on a new Dafny plugin that
leverages LLMs to assist developers by generating suggestions for relevant
lemmas that Dafny is unable to discover and use. Moreover, for the lemmas that
cannot be proved automatically, the plugin also attempts to provide
accompanying calculational proofs. We also discuss ideas for future work by
describing a research agenda on using LLMs to increase the adoption of
verification-aware languages in general, by increasing developers' productivity
and by reducing the level of expertise required for crafting formal
specifications and proving program properties.
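To illustrate the missing-lemma scenario described in the abstract, consider a small hypothetical Dafny example (not taken from the paper). Dafny's automation cannot prove the distributivity of a recursive sum over sequence concatenation on its own; a developer must supply a helper lemma with an inductive proof, which is exactly the kind of artifact the proposed plugin would suggest:

```dafny
// Recursive sum over a sequence of integers.
function Sum(s: seq<int>): int {
  if |s| == 0 then 0 else s[0] + Sum(s[1..])
}

// Dafny cannot discover this property automatically; it must be
// stated and proved as a lemma before callers can rely on it.
lemma SumAppend(a: seq<int>, b: seq<int>)
  ensures Sum(a + b) == Sum(a) + Sum(b)
{
  if |a| == 0 {
    assert a + b == b;  // base case: concatenating an empty prefix
  } else {
    assert (a + b)[1..] == a[1..] + b;  // align the recursive unfolding
    SumAppend(a[1..], b);               // inductive hypothesis
  }
}
```

Without invoking `SumAppend(xs, ys)`, an assertion such as `assert Sum(xs + ys) == Sum(xs) + Sum(ys)` in client code fails to verify; suggesting such a lemma, together with its proof, is the gap the plugin aims to fill.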
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Laurel: Generating Dafny Assertions Using Large Language Models [2.6942525604796366]
We propose Laurel, a tool that uses large language models (LLMs) to automatically generate helper assertions for Dafny programs.
Laurel is able to generate over 50% of the required helper assertions given only a few attempts.
arXiv Detail & Related papers (2024-05-27T03:26:01Z)
- Prompting-based Synthetic Data Generation for Few-Shot Question Answering [23.97949073816028]
We show that using large language models can improve Question Answering performance on various datasets in the few-shot setting.
We suggest that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme.
arXiv Detail & Related papers (2024-05-15T13:36:43Z)
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
- Towards Large Language Models as Copilots for Theorem Proving in Lean [81.94024084598598]
We introduce Lean Copilot, a framework for running LLM inference natively in Lean.
We build tools for suggesting proof steps, completing intermediate proof goals, and selecting relevant premises.
Experimental results demonstrate the effectiveness of our method in assisting humans and in automating the theorem-proving process.
arXiv Detail & Related papers (2024-04-18T22:54:08Z)
- Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation [2.93322471069531]
We conduct an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT.
Our findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation.
arXiv Detail & Related papers (2024-02-18T20:48:09Z)
- MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data [85.50740598523818]
MUSTARD is a framework that masters uniform synthesis of theorem and proof data of high quality and diversity.
We present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points.
We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data.
arXiv Detail & Related papers (2024-02-14T05:57:58Z)
- Context Matters: Data-Efficient Augmentation of Large Language Models for Scientific Applications [15.893290942177112]
We explore the challenges inherent to Large Language Models (LLMs) like GPT-4.
The capacity of LLMs to present erroneous answers in a coherent and semantically rigorous manner complicates the detection of factual inaccuracies.
Our work aims to enhance the understanding and mitigation of such errors, thereby contributing to the improvement of LLM accuracy and reliability.
arXiv Detail & Related papers (2023-12-12T08:43:20Z)
- FactLLaMA: Optimizing Instruction-Following Language Models with External Knowledge for Automated Fact-Checking [10.046323978189847]
We propose combining the power of instruction-following language models with external evidence retrieval to enhance fact-checking performance.
Our approach involves leveraging search engines to retrieve relevant evidence for a given input claim.
Then, we instruct-tune an open-sourced language model, called LLaMA, using this evidence, enabling it to predict the veracity of the input claim more accurately.
arXiv Detail & Related papers (2023-09-01T04:14:39Z)
- LeanDojo: Theorem Proving with Retrieval-Augmented Language Models [72.54339382005732]
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean.
Existing methods are difficult to reproduce or build on, due to private code, data, and compute requirements.
This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks.
We develop ReProver: an LLM-based prover augmented with retrieval for selecting premises from a vast math library.
arXiv Detail & Related papers (2023-06-27T17:05:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.