E&V: Prompting Large Language Models to Perform Static Analysis by
Pseudo-code Execution and Verification
- URL: http://arxiv.org/abs/2312.08477v1
- Date: Wed, 13 Dec 2023 19:31:00 GMT
- Title: E&V: Prompting Large Language Models to Perform Static Analysis by
Pseudo-code Execution and Verification
- Authors: Yu Hao, Weiteng Chen, Ziqiao Zhou, Weidong Cui
- Abstract summary: Large Language Models (LLMs) offer new capabilities for software engineering tasks.
E&V employs LLMs to simulate the execution of pseudo-code, effectively conducting the static analysis encoded in the pseudo-code with minimal human effort.
E&V includes a verification process for pseudo-code execution without needing an external oracle.
- Score: 7.745665775992235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Static analysis, the process of examining code without executing it, is
crucial for identifying software issues. Yet, static analysis is hampered by
its complexity and the need for customization for different targets.
Traditional static analysis tools require extensive human effort and are often
limited to specific target programs and programming languages. Recent
advancements in Large Language Models (LLMs), such as GPT-4 and Llama, offer
new capabilities for software engineering tasks. However, their application in
static analysis, especially in understanding complex code structures, remains
under-explored. This paper introduces a novel approach named E&V, which
leverages LLMs to perform static analysis. Specifically, E&V employs LLMs to
simulate the execution of pseudo-code, effectively conducting static analysis
encoded in the pseudo-code with minimal human effort, thereby improving the
accuracy of results. E&V includes a verification process for pseudo-code
execution without needing an external oracle. This process allows E&V to
mitigate hallucinations of LLMs and enhance the accuracy of static analysis
results. We have implemented E&V in a prototype tool designed for triaging
crashes through backward taint analysis. This prototype, paired with GPT-4-32k,
has been applied to triage 170 recently fixed Linux kernel bugs across seven
bug categories. Our experiments demonstrate that the prototype correctly
identifies the blamed function in 81.2% of the cases. Additionally, we observe
that our novel verification process significantly improves the accuracy,
increasing it from 28.2% to 81.2%.
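To make the execute-and-verify loop concrete, here is a minimal Python sketch under stated assumptions: the analysis pseudo-code, the prompt wording, the `call_llm` stub, and the retry policy below are all illustrative inventions, not the paper's implementation (the actual prototype pairs this idea with GPT-4-32k and backward taint analysis for crash triage).

```python
# Minimal sketch of an E&V-style loop: ask an LLM to *execute* analysis
# pseudo-code on a crash, then ask it to *verify* its own trace.
# All names, prompts, and the pseudo-code itself are hypothetical.

# Pseudo-code for a backward taint analysis. It is never run by Python;
# it is placed in the prompt for the model to simulate.
BACKWARD_TAINT_PSEUDOCODE = """
function backward_taint(crash_site, crash_value):
    worklist = [(crash_site, crash_value)]
    blamed = []
    while worklist is not empty:
        (stmt, value) = worklist.pop()
        def_stmt = find_definition_of(value, before=stmt)
        if def_stmt is none:
            blamed.append(enclosing_function(stmt))
        else:
            for src in source_operands(def_stmt):
                worklist.append((def_stmt, src))
    return blamed
"""


def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., to GPT-4-32k).
    Plug in a real client here; the signature is an assumption."""
    raise NotImplementedError


def execute_pseudocode(source_code: str, crash_report: str) -> str:
    """Prompt the model to simulate the pseudo-code step by step."""
    prompt = (
        "Simulate the following pseudo-code on the given source code and "
        "crash report. Show every step of the execution trace, then name "
        "the blamed function.\n\n"
        f"Pseudo-code:\n{BACKWARD_TAINT_PSEUDOCODE}\n"
        f"Source code:\n{source_code}\n\nCrash report:\n{crash_report}\n"
    )
    return call_llm(prompt)


def verify_trace(source_code: str, trace: str) -> bool:
    """Oracle-free verification: ask the model to re-check its own trace
    against the pseudo-code and answer YES or NO."""
    prompt = (
        "Check whether the execution trace below is consistent with the "
        "pseudo-code and the source code. Answer YES or NO, then explain "
        "any inconsistency.\n\n"
        f"Pseudo-code:\n{BACKWARD_TAINT_PSEUDOCODE}\n"
        f"Source code:\n{source_code}\n\nTrace:\n{trace}\n"
    )
    return call_llm(prompt).strip().upper().startswith("YES")


def triage_crash(source_code: str, crash_report: str, max_attempts: int = 3) -> str:
    """Retry execution until the verification step accepts the trace."""
    trace = ""
    for _ in range(max_attempts):
        trace = execute_pseudocode(source_code, crash_report)
        if verify_trace(source_code, trace):
            break
    return trace
```

The verification prompt plays the role of the oracle-free check described in the abstract; in the paper's experiments, that check is what lifts the blamed-function accuracy from 28.2% to 81.2%.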
Related papers
- SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs [77.79172008184415]
SpecTool is a new benchmark to identify error patterns in LLM output on tool-use tasks.
We show that even the most prominent LLMs exhibit these error patterns in their outputs.
Researchers can use the analysis and insights from SpecTool to guide their error mitigation strategies.
arXiv Detail & Related papers (2024-11-20T18:56:22Z)
- Easing Maintenance of Academic Static Analyzers [0.0]
Mopsa is a static analysis platform that aims at being sound.
This article documents the tools and techniques we have come up with to simplify the maintenance of Mopsa since 2017.
arXiv Detail & Related papers (2024-07-17T11:29:21Z)
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models (LLMs) to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
We propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions containing only the essential information.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
- Customizing Static Analysis using Codesearch [1.7205106391379021]
A commonly used language to describe a range of static analysis applications is Datalog.
We aim to make building custom static analysis tools much easier for developers, while at the same time providing a familiar framework for application security and static analysis experts.
Our approach introduces a language called StarLang, a variant of Datalog which only includes programs with a fast runtime.
arXiv Detail & Related papers (2024-04-19T09:50:02Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Leveraging Large Language Models for Automated Proof Synthesis in Rust [6.202137610101939]
Large Language Models (LLMs) have shown success in code analysis and synthesis.
We present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus.
Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis.
arXiv Detail & Related papers (2023-11-07T05:47:47Z)
- DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks [112.66827096358857]
We introduce DyVal, a protocol for dynamic evaluation of large language models (LLMs).
Based on our framework, we build graph-informed DyVal by leveraging the structural advantage of directed acyclic graphs.
We evaluate various LLMs ranging from Flan-T5-large to GPT-3.5-Turbo and GPT-4.
arXiv Detail & Related papers (2023-09-29T12:04:14Z)
- The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models [18.026567399243]
Large Language Models (LLMs) offer a promising alternative to static analysis.
In this paper, we take a deep dive into the open space of LLM-assisted static analysis.
We develop LLift, a fully automated framework that interfaces with both a static analysis tool and an LLM.
arXiv Detail & Related papers (2023-08-01T02:57:43Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
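As a rough illustration of AST-based checking (a toy sketch, not the paper's framework), Python's standard ast module can already flag syntax errors and crudely approximate undefined-name errors in a generated completion:

```python
import ast
import builtins


def static_errors(completion: str) -> list[str]:
    """Toy AST-based check on a generated Python snippet: report syntax
    errors and names that are read but never bound (a rough, scope-unaware
    proxy for undefined-name errors)."""
    try:
        tree = ast.parse(completion)
    except SyntaxError as exc:
        return [f"SyntaxError: {exc.msg} at line {exc.lineno}"]

    bound = set(dir(builtins))
    loaded = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.append(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
    return [f"possibly undefined name: {name}" for name in loaded if name not in bound]


print(static_errors("def f(x):\n    return x + unknow_var\n"))
# -> ['possibly undefined name: unknow_var']
```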
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- Malware Classification Using Static Disassembly and Machine Learning [1.5469452301122177]
We propose four easy-to-extract and small-scale features, including sizes and permissions of Windows PE sections, content, and import libraries, to classify malware families.
Compared with detailed behavior-related features like API sequences, the proposed features provide macroscopic information about malware.
We show that the proposed features, together with a classical machine learning algorithm (Random Forest), achieve very good accuracy of 99.40%.
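For flavor, a hypothetical scikit-learn sketch of such a pipeline (the feature columns, values, and family labels below are invented, not the paper's dataset):

```python
# Hypothetical sketch: classify malware families from a few small static
# PE features with a Random Forest. Features and labels are invented.
from sklearn.ensemble import RandomForestClassifier

# Each row: [.text size, .data size, writable-section count, imported-library count]
X = [
    [40960, 8192, 1, 12],
    [12288, 2048, 3, 4],
    [65536, 4096, 1, 20],
    [8192, 1024, 2, 3],
]
y = ["trojan", "worm", "trojan", "worm"]  # toy family labels

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[30720, 4096, 2, 8]]))
```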
arXiv Detail & Related papers (2021-12-10T18:14:47Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)