E&V: Prompting Large Language Models to Perform Static Analysis by
Pseudo-code Execution and Verification
- URL: http://arxiv.org/abs/2312.08477v1
- Date: Wed, 13 Dec 2023 19:31:00 GMT
- Authors: Yu Hao, Weiteng Chen, Ziqiao Zhou, Weidong Cui
- Abstract summary: Large Language Models (LLMs) offer new capabilities for software engineering tasks.
E&V employs LLMs to simulate the execution of pseudo-code, effectively conducting static analysis encoded in the pseudo-code with minimal human effort.
E&V includes a verification process for pseudo-code execution without needing an external oracle.
- Score: 7.745665775992235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Static analysis, the process of examining code without executing it, is
crucial for identifying software issues. Yet, static analysis is hampered by
its complexity and the need for customization for different targets.
Traditional static analysis tools require extensive human effort and are often
limited to specific target programs and programming languages. Recent
advancements in Large Language Models (LLMs), such as GPT-4 and Llama, offer
new capabilities for software engineering tasks. However, their application in
static analysis, especially in understanding complex code structures, remains
under-explored. This paper introduces a novel approach named E&V, which
leverages LLMs to perform static analysis. Specifically, E&V employs LLMs to
simulate the execution of pseudo-code, effectively conducting static analysis
encoded in the pseudo-code with minimal human effort, thereby improving the
accuracy of results. E&V includes a verification process for pseudo-code
execution without needing an external oracle. This process allows E&V to
mitigate hallucinations of LLMs and enhance the accuracy of static analysis
results. We have implemented E&V in a prototype tool designed for triaging
crashes through backward taint analysis. This prototype, paired with GPT-4-32k,
has been applied to triage 170 recently fixed Linux kernel bugs across seven
bug categories. Our experiments demonstrate that the prototype correctly
identifies the blamed function in 81.2% of the cases. Additionally, we observe
that our novel verification process significantly improves the accuracy,
increasing it from 28.2% to 81.2%.
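
To make the idea of backward taint analysis for crash triage concrete, here is a minimal Python sketch. It is not the authors' prototype and does not involve an LLM or the verification step: in E&V the LLM simulates pseudo-code describing such an analysis, whereas this toy version simply runs it. The `Assign` record, the `backward_taint` function, and the example trace (including all function and variable names) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    """One assignment in a toy, already-linearized execution trace (hypothetical format)."""
    func: str             # function that contains the statement
    target: str           # variable being written
    sources: tuple = ()   # variables read to compute the target


def backward_taint(trace, crash_var):
    """Walk the trace backwards from the crashing variable.

    Returns (blamed_function, remaining_tainted_variables). The "blamed"
    function is the earliest one in the trace that wrote a value the
    crashing variable depends on.
    """
    tainted = {crash_var}
    blamed = None
    for stmt in reversed(trace):
        if stmt.target in tainted:
            # The taint moves from the written variable to the variables it was read from.
            tainted.discard(stmt.target)
            tainted.update(stmt.sources)
            blamed = stmt.func
    return blamed, tainted


# Toy scenario: a crash occurs when `p` is dereferenced in `do_io`.
trace = [
    Assign("alloc_buf", "buf"),              # buf = <constant>, e.g. NULL on failure
    Assign("setup", "ctx_buf", ("buf",)),    # ctx_buf = buf
    Assign("unrelated", "length"),           # not connected to the crash
    Assign("do_io", "p", ("ctx_buf",)),      # p = ctx_buf; *p crashes
]

print(backward_taint(trace, "p"))  # ('alloc_buf', set())
```

The sketch follows the data dependency chain `p <- ctx_buf <- buf` and blames `alloc_buf`, the function where the bad value originates. A real kernel analysis would also have to handle control flow, aliasing, and indirect calls, which this example deliberately omits.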
Related papers
- Easing Maintenance of Academic Static Analyzers [0.0]
Mopsa is a static analysis platform that aims at being sound.
This article documents the tools and techniques we have come up with to simplify the maintenance of Mopsa since 2017.
arXiv Detail & Related papers (2024-07-17T11:29:21Z)
- Benchmarking LLMs for Optimization Modeling and Enhancing Reasoning via Reverse Socratic Synthesis [60.23133327001978]
Large language models (LLMs) have exhibited their problem-solving ability in mathematical reasoning.
We propose E-OPT, a benchmark for end-to-end optimization problem-solving with human-readable inputs and outputs.
arXiv Detail & Related papers (2024-07-13T13:27:57Z)
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
- Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Large Language Models (LLMs) are routinely used in retrieval-augmented applications to orchestrate tasks and process inputs from users and other sources.
This opens the door to prompt injection attacks, where the LLM receives and acts upon instructions from supposedly data-only sources, thus deviating from the user's original instructions.
We define this as task drift, and we propose to catch it by scanning and analyzing the LLM's activations.
We show that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions, without being trained on any of these attacks.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
- Customizing Static Analysis using Codesearch [1.7205106391379021]
A commonly used language to describe a range of static analysis applications is Datalog.
We aim to make building custom static analysis tools much easier for developers, while at the same time providing a familiar framework for application security and static analysis experts.
Our approach introduces a language called StarLang, a variant of Datalog which only includes programs with a fast runtime.
arXiv Detail & Related papers (2024-04-19T09:50:02Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Leveraging Large Language Models for Automated Proof Synthesis in Rust [6.202137610101939]
Large Language Models (LLMs) have shown success in code analysis and synthesis.
We present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus.
Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis.
arXiv Detail & Related papers (2023-11-07T05:47:47Z)
- DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks [112.66827096358857]
We introduce DyVal, a protocol for dynamic evaluation of large language models (LLMs).
Based on our framework, we build graph-informed DyVal by leveraging the structural advantage of directed acyclic graphs.
We evaluate various LLMs ranging from Flan-T5-large to GPT-3.5-Turbo and GPT-4.
arXiv Detail & Related papers (2023-09-29T12:04:14Z)
- The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models [18.026567399243]
Large Language Models (LLMs) offer a promising alternative to static analysis.
In this paper, we take a deep dive into the open space of LLM-assisted static analysis.
We develop LLift, a fully automated framework that interfaces with both a static analysis tool and an LLM.
arXiv Detail & Related papers (2023-08-01T02:57:43Z)
- Malware Classification Using Static Disassembly and Machine Learning [1.5469452301122177]
We propose four easy-to-extract and small-scale features, including sizes and permissions of Windows PE sections, content, and import libraries, to classify malware families.
Compared with detailed behavior-related features like API sequences, proposed features provide macroscopic information about malware.
We show that the proposed features, together with a classical machine learning algorithm (Random Forest), achieve very good accuracy of 99.40%.
arXiv Detail & Related papers (2021-12-10T18:14:47Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.