KernelGPT: Enhanced Kernel Fuzzing via Large Language Models
- URL: http://arxiv.org/abs/2401.00563v1
- Date: Sun, 31 Dec 2023 18:47:33 GMT
- Title: KernelGPT: Enhanced Kernel Fuzzing via Large Language Models
- Authors: Chenyuan Yang, Zijie Zhao, Lingming Zhang
- Abstract summary: We propose KernelGPT, the first approach to automatically inferring Syzkaller specifications via Large Language Models.
Our preliminary results demonstrate that KernelGPT can help Syzkaller achieve higher coverage and find multiple previously unknown bugs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bugs in operating system kernels can affect billions of devices and users all
over the world. As a result, a large body of research has been focused on
kernel fuzzing, i.e., automatically generating syscall (system call) sequences
to detect potential kernel bugs or vulnerabilities. Syzkaller, one of the most
widely studied kernel fuzzers, aims to generate valid syscall sequences based
on predefined specifications written in syzlang, a domain-specific language for
defining syscalls, their arguments, and the relationships between them. While
there has been existing work trying to automate Syzkaller specification
generation, this still remains largely manual work and a large number of
important syscalls are still uncovered. In this paper, we propose KernelGPT,
the first approach to automatically inferring Syzkaller specifications via
Large Language Models (LLMs) for enhanced kernel fuzzing. Our basic insight is
that LLMs have seen massive kernel code, documentation, and use cases during
pre-training, and thus can automatically distill the necessary information for
making valid syscalls. More specifically, KernelGPT leverages an iterative
approach to automatically infer all the necessary specification components, and
further leverages the validation feedback to repair/refine the initial
specifications. Our preliminary results demonstrate that KernelGPT can help
Syzkaller achieve higher coverage and find multiple previously unknown bugs.
Moreover, we also received a request from the Syzkaller team to upstream
specifications inferred by KernelGPT.
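The abstract describes inferring an initial specification and then repairing it using validation feedback. The Python sketch below shows a generic generate-validate-repair loop of that kind. It assumes the LLM call and the specification checker are passed in as plain functions. `refine_spec`, `toy_propose`, `toy_validate`, `ioctl$FOO_CMD` and `foo_arg` are hypothetical illustrations. They are not KernelGPT's implementation, its prompts, or Syzkaller's actual validator, and the spec strings are only syzlang-flavored.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RefinementResult:
    spec: Optional[str]
    valid: bool
    rounds: int
    errors: List[str] = field(default_factory=list)


def refine_spec(
    propose: Callable[[Optional[str], List[str]], str],
    validate: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> RefinementResult:
    """Infer a spec, then repeatedly repair it using validator feedback."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    spec: Optional[str] = None
    errors: List[str] = []
    for rnd in range(1, max_rounds + 1):
        # Round 1 sees no previous spec; later rounds see the last spec plus its errors.
        spec = propose(spec, errors)
        errors = validate(spec)
        if not errors:
            return RefinementResult(spec, True, rnd)
    return RefinementResult(spec, False, max_rounds, errors)


# Toy stand-ins (hypothetical): not KernelGPT's prompts or Syzkaller's real checker.
def toy_validate(spec: str) -> List[str]:
    """Report every ptr[dir, T] argument whose type T has no struct definition."""
    used = set(re.findall(r"ptr\[\w+, (\w+)\]", spec))
    defined = set(re.findall(r"^(\w+) \{", spec, flags=re.MULTILINE))
    return [f"unknown type {t}" for t in sorted(used - defined)]


def toy_propose(prev: Optional[str], errors: List[str]) -> str:
    """Stand-in for an LLM: emit a syscall first, then define types named in errors."""
    if prev is None:
        return "ioctl$FOO_CMD(fd fd_foo, cmd const[FOO_CMD], arg ptr[in, foo_arg])"
    fixes = [f"{e.split()[-1]} {{\n\tflags\tint32\n}}" for e in errors]
    return "\n".join([prev] + fixes)


result = refine_spec(toy_propose, toy_validate)
print(result.valid, result.rounds)  # True 2
```

The first proposal references `foo_arg` without defining it. The toy validator reports that, and the second round appends a definition, so the loop succeeds on round 2. Passing `propose` and `validate` in as functions keeps the loop independent of any particular model or checker.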
Related papers
- G-Fuzz: A Directed Fuzzing Framework for gVisor
G-Fuzz is a directed fuzzing framework for gVisor.
G-Fuzz has been deployed in industry and has detected multiple serious vulnerabilities.
(2024-09-20)
- Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
We introduce Cognitive Kernel, an open-source agent system towards the goal of generalist autopilots.
Unlike copilot systems, which primarily rely on users to provide essential state information, autopilot systems must complete tasks independently.
To achieve this, an autopilot system should be capable of understanding user intents, actively gathering necessary information from various real-world sources, and making wise decisions.
(2024-09-16)
- KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
(2024-07-02)
- Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines
We propose a new class of positive definite kernels based on spectral truncation.
We show that it is a governing factor leading to performance enhancement.
We also propose a deep learning perspective to increase the representation capacity of spectral truncation kernels.
(2024-05-28)
- Optimal Kernel Tuning Parameter Prediction using Deep Sequence Models
We propose a methodology that uses deep sequence-to-sequence models to predict the optimal tuning parameters governing compute kernels.
The proposed algorithm can achieve more than 90% accuracy on various convolutional kernels in MIOpen, the AMD machine learning primitives library.
(2024-04-15)
- RLTrace: Synthesizing High-Quality System Call Traces for OS Fuzz Testing
We propose a deep reinforcement learning-based solution, called RLTrace, to synthesize diverse and comprehensive system call traces as the seed to fuzz OS kernels.
During model training, the deep learning model interacts with OS kernels and infers optimal system call traces.
Our evaluation shows that RLTrace outperforms other seed generators by producing more comprehensive system call traces.
(2023-10-04)
- Kernel Continual Learning
Kernel continual learning is a simple but effective variant of continual learning that tackles catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression.
Variational random features are used to learn a data-driven kernel for each task.
(2021-07-12)
- Kernel Identification Through Transformers
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
(2021-06-15)
- Isolation Distributional Kernel: A New Tool for Point & Group Anomaly Detection
Isolation Distributional Kernel (IDK) is a new way to measure the similarity between two distributions.
We demonstrate IDK's efficacy and efficiency as a new tool for kernel based anomaly detection for both point and group anomalies.
(2020-09-24)
- Performance portability through machine learning guided kernel selection in SYCL libraries
General-purpose compute libraries must be able to cater to all inputs and parameters provided by a user.
Machine learning methods can be used to mitigate both of these problems.
Tuning the process for new hardware or problems does not require any developer effort or expertise.
(2020-08-30)
- Towards automated kernel selection in machine learning systems: A SYCL case study
We present initial results using machine learning to select kernels in a case study deploying high performance SYCL kernels in libraries.
By combining auto-tuning and machine learning these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
(2020-03-15)