Enabling Unit Proofing for Software Implementation Verification
- URL: http://arxiv.org/abs/2410.14818v1
- Date: Fri, 18 Oct 2024 18:37:36 GMT
- Title: Enabling Unit Proofing for Software Implementation Verification
- Authors: Paschal C. Amusuo, Parth V. Patil, Owen Cochell, Taylor Le Lievre, James C. Davis
- Abstract summary: We propose a research agenda for a unit proofing framework, both methods and tools.
This will enable engineers to discover code-level defects early.
- Score: 2.7325338323814328
- License:
- Abstract: Formal verification provides mathematical guarantees that software is correct. Design-level verification tools ensure software specifications are correct, but they do not expose defects in actual implementations. For this purpose, engineers use code-level tools. However, such tools struggle to scale to large software. The process of "Unit Proofing" mitigates this by decomposing the software and verifying each unit independently. We examined AWS's use of unit proofing and observed that current approaches are manual and prone to faults that mask severe defects. We propose a research agenda for a unit proofing framework, both methods and tools, to support software engineers in applying unit proofing effectively and efficiently. This will enable engineers to discover code-level defects early.
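To make the unit proofing idea concrete, the following is a minimal sketch of a proof harness in a CBMC-style bounded model checking workflow, the kind of tooling behind the AWS unit proofs the paper examines. The `buf_append` function, its buffer bound, and the asserted properties are hypothetical and only illustrate how a single unit is verified in isolation; they do not reflect the paper's proposed framework.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_CAP 16

/* Hypothetical unit under verification: a fixed-capacity byte buffer. */
struct buf {
    uint8_t data[BUF_CAP];
    size_t len;
};

/* The unit under proof: appends a byte, returns 0 on success, -1 if full. */
int buf_append(struct buf *b, uint8_t byte) {
    if (b->len >= BUF_CAP)
        return -1;
    b->data[b->len++] = byte;
    return 0;
}

/* CBMC treats undefined nondet_* functions as unconstrained inputs. */
size_t nondet_size_t(void);
uint8_t nondet_uint8_t(void);

/* Unit proof harness: models the unit's environment with assumptions,
 * then asserts the properties that must hold for every allowed input. */
void buf_append_harness(void) {
    struct buf b;
    b.len = nondet_size_t();
    __CPROVER_assume(b.len <= BUF_CAP); /* environment assumption on the unit's input state */

    int rc = buf_append(&b, nondet_uint8_t());

    assert(b.len <= BUF_CAP);     /* length stays within capacity */
    assert(rc == 0 || rc == -1);  /* return code is one of the documented values */
}
```

Running a command such as `cbmc harness.c --function buf_append_harness --bounds-check --pointer-check` would check these assertions, plus memory safety, for every input permitted by the assumption, without needing the rest of the program. The paper's concern is that writing such harnesses and choosing the environment assumptions by hand is error-prone and can mask defects.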
Related papers
- Verifying Non-friendly Formal Verification Designs: Can We Start Earlier? [2.1626093085892144]
We propose an automated methodology based on metamodeling techniques which consists of two main steps.
First, an untimed algorithmic description written in C++ is verified in an early stage using generated assertions.
Second, this algorithmic description is verified against its sequential design using HLEC and the respective metamodel parameters.
arXiv Detail & Related papers (2024-10-24T06:09:40Z)
- NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs.
We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z)
- GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence [64.95492752484171]
We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks.
We train models to execute these tasks, and design an interactive interface to present suggested edits and evidence to users.
To ensure that most errors are flagged by the system, we propose a method that can increase the error recall while minimizing impact on precision.
arXiv Detail & Related papers (2024-02-19T21:45:55Z)
- Klever: Verification Framework for Critical Industrial C Programs [0.0]
We present the Klever software verification framework, designed to reduce the effort of applying automatic software verification tools to large and critical industrial C programs.
Existing tools do not provide widely adopted means for environment modeling, specification of requirements, verification of many versions and configurations of target programs, and expert assessment of verification results.
arXiv Detail & Related papers (2023-09-28T13:23:59Z)
- FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios [87.12753459582116]
A wider range of tasks now faces an increasing risk of containing factual errors when handled by generative models.
We propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models.
arXiv Detail & Related papers (2023-07-25T14:20:51Z)
- Applying Machine Learning Analysis for Software Quality Test [0.0]
It is critical to understand what triggers maintenance and whether it can be predicted.
Numerous methods of assessing the complexity of developed programs may produce useful prediction models.
In this paper, machine learning is applied to the available data to calculate cumulative software failure levels.
arXiv Detail & Related papers (2023-05-16T06:10:54Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- Lessons from Formally Verified Deployed Software Systems (Extended version) [65.69802414600832]
This article examines a range of projects, in various application areas, that have produced formally verified systems and deployed them for actual use.
It considers the technologies used, the form of verification applied, the results obtained, and the lessons that the software industry should draw regarding its ability to benefit from formal verification techniques and tools.
arXiv Detail & Related papers (2023-01-05T18:18:46Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- A Formally Certified End-to-End Implementation of Shor's Factorization Algorithm [9.349616752756024]
We present the first formally certified end-to-end implementation of Shor's prime factorization algorithm.
By leveraging our framework, one can significantly reduce the effects of human errors.
arXiv Detail & Related papers (2022-04-14T17:02:34Z)