KNighter: Transforming Static Analysis with LLM-Synthesized Checkers
- URL: http://arxiv.org/abs/2503.09002v1
- Date: Wed, 12 Mar 2025 02:30:19 GMT
- Title: KNighter: Transforming Static Analysis with LLM-Synthesized Checkers
- Authors: Chenyuan Yang, Zijie Zhao, Zichen Xie, Haoyu Li, Lingming Zhang,
- Abstract summary: KNighter generates specialized static analyzers guided by historical patch knowledge.<n>To date, KNighter-synthesized checkers have discovered 70 new bugs/vulnerabilities in the Linux kernel.
- Score: 14.02595288424478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Static analysis is a powerful technique for bug detection in critical systems like operating system kernels. However, designing and implementing static analyzers is challenging, time-consuming, and typically limited to predefined bug patterns. While large language models (LLMs) have shown promise for static analysis, directly applying them to scan large codebases remains impractical due to computational constraints and contextual limitations. We present KNighter, the first approach that unlocks practical LLM-based static analysis by automatically synthesizing static analyzers from historical bug patterns. Rather than using LLMs to directly analyze massive codebases, our key insight is leveraging LLMs to generate specialized static analyzers guided by historical patch knowledge. KNighter implements this vision through a multi-stage synthesis pipeline that validates checker correctness against original patches and employs an automated refinement process to iteratively reduce false positives. Our evaluation on the Linux kernel demonstrates that KNighter generates high-precision checkers capable of detecting diverse bug patterns overlooked by existing human-written analyzers. To date, KNighter-synthesized checkers have discovered 70 new bugs/vulnerabilities in the Linux kernel, with 56 confirmed and 41 already fixed. 11 of these findings have been assigned CVE numbers. This work establishes an entirely new paradigm for scalable, reliable, and traceable LLM-based static analysis for real-world systems via checker synthesis.
Related papers
- The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs [17.497629884237647]
BugLens is a post-refinement framework that significantly improves static analysis precision.
It raises precision from 0.10 (raw) and 0.50 (semi-automated refinement) to 0.72, substantially reducing false positives.
Our results suggest that a structured LLM-based workflow can meaningfully enhance the effectiveness of static analysis tools.
arXiv Detail & Related papers (2025-04-16T02:17:06Z) - Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [60.881609323604685]
Large Language Models (LLMs) accessed via black-box APIs introduce a trust challenge.
Users pay for services based on advertised model capabilities.
providers may covertly substitute the specified model with a cheaper, lower-quality alternative to reduce operational costs.
This lack of transparency undermines fairness, erodes trust, and complicates reliable benchmarking.
arXiv Detail & Related papers (2025-04-07T03:57:41Z) - Learning on LLM Output Signatures for gray-box LLM Behavior Analysis [52.81120759532526]
Large Language Models (LLMs) have achieved widespread adoption, yet our understanding of their behavior remains limited.
We develop a transformer-based approach to process that theoretically guarantees approximation of existing techniques.
Our approach achieves superior performance on hallucination and data contamination detection in gray-box settings.
arXiv Detail & Related papers (2025-03-18T09:04:37Z) - LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Static Analysis [13.993290878789779]
We propose a compositional neuro-symbolic approach for compilation-free, customizable static analysis with reduced hallucinations.
It attains 66.27% precision and 78.57% recall in taint vulnerability detection, surpassing an industrial approach in F1 score by 0.20.
arXiv Detail & Related papers (2024-12-18T23:14:59Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing [63.20133320524577]
We show that editing a small subset of parameters can effectively modulate specific behaviors of large language models (LLMs)<n>Our approach achieves reductions of up to 90.0% in toxicity on the RealToxicityPrompts dataset and 49.2% on ToxiGen.
arXiv Detail & Related papers (2024-07-11T17:52:03Z) - Get my drift? Catching LLM Task Drift with Activation Deltas [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.<n>We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.<n>We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - E&V: Prompting Large Language Models to Perform Static Analysis by
Pseudo-code Execution and Verification [7.745665775992235]
Large Language Models (LLMs) offer new capabilities for software engineering tasks.
LLMs simulate the execution of pseudo-code, effectively conducting static analysis encoded in the pseudo-code with minimal human effort.
E&V includes a verification process for pseudo-code execution without needing an external oracle.
arXiv Detail & Related papers (2023-12-13T19:31:00Z) - Leveraging Large Language Models for Automated Proof Synthesis in Rust [6.202137610101939]
Large Language Models (LLMs) have shown success in code analysis and synthesis.
We present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus.
Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis.
arXiv Detail & Related papers (2023-11-07T05:47:47Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - The Hitchhiker's Guide to Program Analysis: A Journey with Large
Language Models [18.026567399243]
Large Language Models (LLMs) offer a promising alternative to static analysis.
In this paper, we take a deep dive into the open space of LLM-assisted static analysis.
We develop LLift, a fully automated framework that interfaces with both a static analysis tool and an LLM.
arXiv Detail & Related papers (2023-08-01T02:57:43Z) - D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using
Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.