KernelGPT: Enhanced Kernel Fuzzing via Large Language Models
- URL: http://arxiv.org/abs/2401.00563v2
- Date: Mon, 25 Nov 2024 01:10:44 GMT
- Title: KernelGPT: Enhanced Kernel Fuzzing via Large Language Models
- Authors: Chenyuan Yang, Zijie Zhao, Lingming Zhang
- Abstract summary: We propose KernelGPT, the first approach to automatically synthesizing syscall specifications via Large Language Models (LLMs).
Our results demonstrate that KernelGPT can generate more new and valid specifications and achieve higher coverage than state-of-the-art techniques.
- Score: 8.77369393651381
- Abstract: Bugs in operating system kernels can affect billions of devices and users all over the world. As a result, a large body of research has been focused on kernel fuzzing, i.e., automatically generating syscall (system call) sequences to detect potential kernel bugs or vulnerabilities. Kernel fuzzing aims to generate valid syscall sequences guided by syscall specifications that define both the syntax and semantics of syscalls. While there has been existing work trying to automate syscall specification generation, this remains largely manual work, and a large number of important syscalls are still uncovered. In this paper, we propose KernelGPT, the first approach to automatically synthesizing syscall specifications via Large Language Models (LLMs) for enhanced kernel fuzzing. Our key insight is that LLMs have seen massive kernel code, documentation, and use cases during pre-training, and thus can automatically distill the necessary information for making valid syscalls. More specifically, KernelGPT leverages an iterative approach to automatically infer the specifications, and further debug and repair them based on the validation feedback. Our results demonstrate that KernelGPT can generate more new and valid specifications and achieve higher coverage than state-of-the-art techniques. So far, by using newly generated specifications, KernelGPT has already detected 24 new unique bugs in the Linux kernel, with 12 fixed and 11 assigned CVE numbers. Moreover, a number of specifications generated by KernelGPT have already been merged into the kernel fuzzer Syzkaller, following a request from its development team.
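The abstract describes a generate-validate-repair loop: prompt an LLM to draft a syscall specification, compile it for validation, and feed the error messages back to the model for repair. The Python sketch below illustrates that loop under stated assumptions; the `llm_complete` placeholder, the draft file location, and the use of syzkaller's `make generate` target as the validation step are illustrative assumptions, not KernelGPT's actual implementation.

```python
# Illustrative sketch of an iterative "generate, validate, repair" loop for
# syscall specifications. All helpers here are hypothetical placeholders;
# this is not KernelGPT's actual code.
import subprocess
from pathlib import Path

MAX_REPAIR_ROUNDS = 5

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to any LLM API; plug in a real client here."""
    raise NotImplementedError

def validate_spec(spec_text: str, syzkaller_dir: Path) -> str:
    """Write the candidate syzlang spec into syzkaller's description tree and
    try to regenerate; return the compiler's error output ('' on success).
    The draft file name and the `make generate` step are assumptions."""
    draft = syzkaller_dir / "sys" / "linux" / "kernelgpt_draft.txt"
    draft.write_text(spec_text)
    result = subprocess.run(["make", "generate"], cwd=syzkaller_dir,
                            capture_output=True, text=True)
    return "" if result.returncode == 0 else result.stderr

def synthesize_spec(driver_source: str, syzkaller_dir: Path) -> str | None:
    # Step 1: draft an initial specification from the driver source.
    spec = llm_complete(
        "Write a syzkaller (syzlang) specification for the ioctl interface "
        "of this Linux driver:\n" + driver_source
    )
    # Step 2: repair iteratively based on validation feedback.
    for _ in range(MAX_REPAIR_ROUNDS):
        errors = validate_spec(spec, syzkaller_dir)
        if not errors:
            return spec  # the specification compiles cleanly
        spec = llm_complete(
            "This syzlang specification fails to compile.\n"
            f"Errors:\n{errors}\nSpecification:\n{spec}\n"
            "Return a corrected specification."
        )
    return None  # give up after repeated failures
```

The actual system's prompting, driver analysis, and repair strategies are more involved than this minimal cycle; the sketch only mirrors the iteration described in the abstract.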
Related papers
- G-Fuzz: A Directed Fuzzing Framework for gVisor [48.85077340822625]
G-Fuzz is a directed fuzzing framework for gVisor.
G-Fuzz has been deployed in industry and has detected multiple serious vulnerabilities.
arXiv Detail & Related papers (2024-09-20T01:00:22Z) - KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z) - Explore as a Storm, Exploit as a Raindrop: On the Benefit of Fine-Tuning Kernel Schedulers with Coordinate Descent [48.791943145735]
We show the potential to reduce Ansor's search time while enhancing kernel quality.
We apply this approach to the first 300 kernels that Ansor generates.
This result has been replicated in 20 well-known deep-learning models.
arXiv Detail & Related papers (2024-06-28T16:34:22Z) - Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines [12.11705128358537]
We propose a new class of positive definite kernels based on the spectral truncation.
We show that the spectral truncation is a governing factor leading to performance enhancement.
We also propose a deep learning perspective to increase the representation capacity of spectral truncation kernels.
arXiv Detail & Related papers (2024-05-28T04:47:12Z) - Optimal Kernel Tuning Parameter Prediction using Deep Sequence Models [0.44998333629984877]
We propose a methodology that uses deep sequence-to-sequence models to predict the optimal tuning parameters governing compute kernels.
The proposed algorithm can achieve more than 90% accuracy on various convolutional kernels in MIOpen, the AMD machine learning primitives library.
arXiv Detail & Related papers (2024-04-15T22:25:54Z) - RLTrace: Synthesizing High-Quality System Call Traces for OS Fuzz Testing [10.644829779197341]
We propose a deep reinforcement learning-based solution, called RLTrace, to synthesize diverse and comprehensive system call traces as the seed to fuzz OS kernels.
During model training, the deep learning model interacts with OS kernels and infers optimal system call traces.
Our evaluation shows that RLTrace outperforms other seed generators by producing more comprehensive system call traces.
arXiv Detail & Related papers (2023-10-04T06:46:00Z) - Model-based Script Synthesis for Fuzzing [10.739464605434977]
Existing approaches fuzz the kernel by modeling syscall sequences from traces or static analysis of system code.
We propose WinkFuzz, an approach to learn and mutate traced syscall sequences in order to reach different kernel states.
arXiv Detail & Related papers (2023-08-08T08:07:50Z) - Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning that tackles catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression (a minimal sketch of this building block appears after this list).
Variational random features are used to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z) - Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z) - Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection [76.1522587605852]
Isolation Distributional Kernel (IDK) is a new way to measure the similarity between two distributions.
We demonstrate IDK's efficacy and efficiency as a new tool for kernel based anomaly detection for both point and group anomalies.
arXiv Detail & Related papers (2020-09-24T12:25:43Z) - Towards automated kernel selection in machine learning systems: A SYCL case study [0.0]
We present initial results using machine learning to select kernels in a case study deploying high performance SYCL kernels in libraries.
By combining auto-tuning and machine learning, these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
arXiv Detail & Related papers (2020-03-15T11:23:36Z)
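As noted in the Kernel Continual Learning entry above, task-specific classifiers are fit with kernel ridge regression over a small episodic memory. The sketch below shows that generic building block with a fixed RBF kernel standing in for the paper's learned, data-driven kernel; all names and parameter choices are illustrative, not the paper's implementation.

```python
# Illustrative per-task kernel ridge regression classifier over an episodic
# memory buffer. The fixed RBF kernel stands in for the paper's learned
# data-driven kernel (variational random features).
import numpy as np

def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

class TaskKRRClassifier:
    def __init__(self, X_mem: np.ndarray, Y_mem: np.ndarray,
                 lam: float = 1e-3, gamma: float = 1.0):
        # Closed-form kernel ridge solution: alpha = (K + lam*I)^-1 * Y
        self.X_mem, self.gamma = X_mem, gamma
        K = rbf_kernel(X_mem, X_mem, gamma)
        self.alpha = np.linalg.solve(K + lam * np.eye(len(X_mem)), Y_mem)

    def predict(self, X: np.ndarray) -> np.ndarray:
        scores = rbf_kernel(X, self.X_mem, self.gamma) @ self.alpha
        return scores.argmax(axis=1)  # Y_mem is one-hot; pick the top class

# Toy usage: one classifier per task, trained only on that task's memory.
rng = np.random.default_rng(0)
X_mem = rng.normal(size=(20, 5))
Y_mem = np.eye(3)[rng.integers(0, 3, size=20)]  # one-hot labels, 3 classes
print(TaskKRRClassifier(X_mem, Y_mem).predict(rng.normal(size=(4, 5))))
```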