Next Steps in LLM-Supported Java Verification
- URL: http://arxiv.org/abs/2502.01573v1
- Date: Mon, 03 Feb 2025 17:55:50 GMT
- Title: Next Steps in LLM-Supported Java Verification
- Authors: Samuel Teuber, Bernhard Beckert,
- Abstract summary: Large Language Models (LLMs) are not only a suitable tool for code generation but also capable of generating annotation-based code specifications.
This paper provides early results on how this rigorous toolset can be used to reliably elicit correct specification annotations from an unreliable LLM.
- Score: 0.8057006406834466
- License:
- Abstract: Recent work has shown that Large Language Models (LLMs) are not only a suitable tool for code generation but also capable of generating annotation-based code specifications. Scaling these methodologies may allow us to deduce provable correctness guarantees for large-scale software systems. In comparison to other LLM tasks, the application field of deductive verification has the notable advantage of providing a rigorous toolset to check LLM-generated solutions. This short paper provides early results on how this rigorous toolset can be used to reliably elicit correct specification annotations from an unreliable LLM oracle.
Related papers
- Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK [0.4143603294943439]
Large language models (LLMs) have demonstrated remarkable code generation capabilities, but the correctness of the generated code cannot be inherently trusted.
This paper explores the feasibility of using formal software verification, specifically the SPARK framework for Ada, to ensure the reliability of LLM-generated code.
We present Marmaragan, a tool that leverages an LLM in order to generate SPARK annotations for existing programs, enabling formal verification of the code.
arXiv Detail & Related papers (2025-02-11T17:42:07Z) - SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs [77.79172008184415]
SpecTool is a new benchmark to identify error patterns in LLM output on tool-use tasks.
We show that even the most prominent LLMs exhibit these error patterns in their outputs.
Researchers can use the analysis and insights from SPECTOOL to guide their error mitigation strategies.
arXiv Detail & Related papers (2024-11-20T18:56:22Z) - Automatic Generation of Benchmarks and Reliable LLM Judgment for Code Tasks [0.8274693573069442]
This work introduces a methodology to generate and evaluate LaaJ implementations, utilizing an automatically generated benchmark.
The benchmark is used both to develop and validate the LaaJs and to validate and test the LLM code related solution using the LaaJs.
Our approach enables the creation of high quality code task solutions.
arXiv Detail & Related papers (2024-10-28T14:34:36Z) - Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLMs tool-use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs)
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - $\forall$uto$\exists$val: Autonomous Assessment of LLMs in Formal Synthesis and Interpretation Tasks [21.12437562185667]
This paper presents a new approach for scaling LLM assessment in translating formal syntax to natural language.
We use context-free grammars (CFGs) to generate out-of-distribution datasets on the fly.
We also conduct an assessment of several SOTA closed and open-source LLMs to showcase the feasibility and scalability of this paradigm.
arXiv Detail & Related papers (2024-03-27T08:08:00Z) - LLatrieval: LLM-Verified Retrieval for Verifiable Generation [67.93134176912477]
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question.
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z) - LLMRec: Benchmarking Large Language Models on Recommendation Task [54.48899723591296]
The application of Large Language Models (LLMs) in the recommendation domain has not been thoroughly investigated.
We benchmark several popular off-the-shelf LLMs on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization.
The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation.
arXiv Detail & Related papers (2023-08-23T16:32:54Z) - The potential of LLMs for coding with low-resource and domain-specific
programming languages [0.0]
This study focuses on the econometric scripting language named hansl of the open-source software gretl.
Our findings suggest that LLMs can be a useful tool for writing, understanding, improving, and documenting gretl code.
arXiv Detail & Related papers (2023-07-24T17:17:13Z) - Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models [75.75038268227554]
Self-Checker is a framework comprising a set of plug-and-play modules that facilitate fact-checking.
This framework provides a fast and efficient way to construct fact-checking systems in low-resource environments.
arXiv Detail & Related papers (2023-05-24T01:46:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.