Selene: Pioneering Automated Proof in Software Verification
- URL: http://arxiv.org/abs/2401.07663v2
- Date: Wed, 5 Jun 2024 08:16:12 GMT
- Title: Selene: Pioneering Automated Proof in Software Verification
- Authors: Lichen Zhang, Shuai Lu, Nan Duan,
- Abstract summary: We introduce Selene, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4.
Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation.
- Score: 62.09555413263788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensuring correctness is a pivotal aspect of software engineering. Among the various strategies available, software verification offers a definitive assurance of correctness. Nevertheless, writing verification proofs is resource-intensive and manpower-consuming, and there is a great need to automate this process. We introduce Selene in this paper, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Selene provides a comprehensive framework for end-to-end proof generation and a lightweight verification environment. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation. Additionally, our further proposed augmentations indicate that the challenges presented by Selene can be mitigated in future research endeavors.
Related papers
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements [1.4433703131122861]
Large language models (LLMs) have shown potential across various domains, including cybersecurity.
There is currently no comprehensive, open, end-to-end automated penetration testing benchmark.
This paper introduces a novel open benchmark for LLM-based automated penetration testing.
arXiv Detail & Related papers (2024-10-22T16:18:41Z) - Automated Proof Generation for Rust Code via Self-Evolution [69.25795662658356]
We introduce SAFE, a novel framework that overcomes the lack of human-written proof to enable automated proof generation of Rust code.
We demonstrate superior efficiency and precision compared to GPT-4o.
This advancement leads to a significant improvement in performance, achieving a 70.50% accuracy rate in a benchmark crafted by human experts.
arXiv Detail & Related papers (2024-10-21T08:15:45Z) - Test Oracle Automation in the era of LLMs [52.69509240442899]
Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling diverse software testing tasks.
This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles.
arXiv Detail & Related papers (2024-05-21T13:19:10Z) - ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation [10.503097140635374]
ChIRAAG, based on OpenAI GPT4, generates System Verilog Assertion (SVA) from natural language specifications of a design.
In experiments, only 27% of LLM-generated raw assertions had errors, which was rectified in few iterations.
Our results show that LLMs can streamline and assist engineers in the assertion generation process, reshaping verification.
arXiv Detail & Related papers (2024-01-31T12:41:27Z) - A Preliminary Study on Using Large Language Models in Software
Pentesting [2.0551676463612636]
Large language models (LLM) are perceived to offer promising potentials for automating security tasks.
We investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code.
arXiv Detail & Related papers (2024-01-30T21:42:59Z) - Towards Generating Executable Metamorphic Relations Using Large Language Models [46.26208489175692]
We propose an approach for automatically deriving executable MRs from requirements using large language models (LLMs)
To assess the feasibility of our approach, we conducted a questionnaire-based survey in collaboration with Siemens Industry Software.
arXiv Detail & Related papers (2024-01-30T13:52:47Z) - Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers [121.53749383203792]
We present a holistic end-to-end solution for annotating the factuality of large language models (LLMs)-generated responses.
We construct an open-domain document-level factuality benchmark in three-level granularity: claim, sentence and document.
Preliminary experiments show that FacTool, FactScore and Perplexity are struggling to identify false claims.
arXiv Detail & Related papers (2023-11-15T14:41:57Z) - Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD.
Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.