Model-Based Script Synthesis for Fuzzing
- URL: http://arxiv.org/abs/2308.04115v1
- Date: Tue, 8 Aug 2023 08:07:50 GMT
- Title: Model-Based Script Synthesis for Fuzzing
- Authors: Zian Liu, Chao Chen, Muhammad Ejaz Ahmed, Jun Zhang, Dongxi Liu
- Abstract summary: Existing approaches fuzz the kernel by modeling syscall sequences from traces or static analysis of system code.
We propose WinkFuzz, an approach to learn and mutate traced syscall sequences in order to reach different kernel states.
- Score: 10.739464605434977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel fuzzing is important for finding critical kernel vulnerabilities.
Closed-source (e.g., Windows) operating system kernel fuzzing is even more
challenging due to the lack of source code. Existing approaches fuzz the kernel
by modeling syscall sequences from traces or static analysis of system code.
However, a common limitation is that they do not learn and mutate the syscall
sequences to reach different kernel states, which can potentially result in
more bugs or crashes.
In this paper, we propose WinkFuzz, an approach to learn and mutate traced
syscall sequences in order to reach different kernel states. WinkFuzz learns
syscall dependencies from the trace, identifies potential syscalls in the trace
that can have dependent subsequent syscalls, and applies the dependencies to
insert more syscalls into the trace while preserving those dependencies. Then
WinkFuzz fuzzes the synthesized new syscall sequence to find system crashes.
We applied WinkFuzz to four seed applications and found a total increase in
syscall number of 70.8%, with a success rate of 61%, within three insert
levels. The average time for tracing, dependency analysis, recovering model
script, and synthesizing script was 600, 39, 34, and 129 seconds respectively.
The instant fuzzing rate is 3742 syscall executions per second. However, the
average fuzz efficiency dropped to 155 syscall executions per second when the
initializing time, waiting time, and other factors were taken into account. We
fuzzed each seed application for 24 seconds and, on average, obtained 12.25
crashes within that time frame.
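
The dependency-learning and insertion steps described in the abstract can be illustrated with a minimal Python sketch. This is not WinkFuzz's implementation: the `Call` record, the handle-ID trace format, the `terminal` set, and all function names are assumptions made for illustration. A dependency here simply means "a later call consumes a handle that an earlier call produced".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Call:
    """One traced syscall (hypothetical record format, not WinkFuzz's)."""
    name: str
    uses: tuple = ()             # handle IDs this call consumes
    makes: Optional[str] = None  # handle ID this call produces, if any


def learn_dependencies(trace):
    """Map each producer syscall name to the names of calls seen consuming its handle."""
    producer_of = {}  # handle ID -> name of the call that produced it
    deps = {}
    for call in trace:
        for handle in call.uses:
            if handle in producer_of:
                deps.setdefault(producer_of[handle], set()).add(call.name)
        if call.makes is not None:
            producer_of[call.makes] = call.name
    return deps


def synthesize(trace, deps, terminal=frozenset({"NtClose"})):
    """Insert learned dependent calls right after each producer.

    Calls named in `terminal` (assumed to invalidate a handle) are never
    inserted, so the original later uses of the handle stay valid.
    """
    out = []
    for call in trace:
        out.append(call)
        if call.makes is None:
            continue
        for consumer in sorted(deps.get(call.name, ())):
            if consumer not in terminal:
                out.append(Call(consumer, uses=(call.makes,)))
    return out


def preserves_dependencies(trace, terminal=frozenset({"NtClose"})):
    """Check that every handle is used only after creation and before a terminal call."""
    live = set()
    for call in trace:
        if any(handle not in live for handle in call.uses):
            return False
        if call.name in terminal:
            live.difference_update(call.uses)
        if call.makes is not None:
            live.add(call.makes)
    return True


# Toy seed trace: two create/use/close episodes on different handles.
seed = [
    Call("NtCreateFile", makes="h1"),
    Call("NtWriteFile", uses=("h1",)),
    Call("NtClose", uses=("h1",)),
    Call("NtCreateFile", makes="h2"),
    Call("NtReadFile", uses=("h2",)),
    Call("NtClose", uses=("h2",)),
]
deps = learn_dependencies(seed)
grown = synthesize(seed, deps)
print(len(seed), "->", len(grown), "calls; valid:", preserves_dependencies(grown))
```

In this toy trace, the read learned from the second episode is inserted into the first and the write learned from the first is inserted into the second, so the sequence grows from 6 to 10 calls without using any handle before creation or after close. Running `synthesize` again on its own output loosely corresponds to a further insert level, although the paper's definition of insert levels and success rate may differ from this simplified version.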
Related papers
- kGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z)
- Explore as a Storm, Exploit as a Raindrop: On the Benefit of Fine-Tuning Kernel Schedulers with Coordinate Descent [48.791943145735]
We show the potential to reduce Ansor's search time while enhancing kernel quality.
We apply this approach to the first 300 kernels that Ansor generates.
This result has been replicated in 20 well-known deep-learning models.
arXiv Detail & Related papers (2024-06-28T16:34:22Z)
- Making 'syscall' a Privilege not a Right [4.674007120771649]
nexpoline is a secure syscall interception mechanism combining Memory Protection Keys (MPK) and Seccomp or Syscall User Dispatch (SUD).
It offers better efficiency than secure interception techniques like ptrace, as nexpoline can intercept syscalls through binary rewriting securely.
Notably, it operates without kernel modifications, making it viable on current Linux systems without needing root privileges.
arXiv Detail & Related papers (2024-06-11T16:33:56Z)
- RelayAttention for Efficient Large Language Model Serving with Long System Prompts [59.50256661158862]
This paper aims to improve the efficiency of LLM services that involve long system prompts.
Handling these system prompts requires heavily redundant memory accesses in existing causal attention algorithms.
We propose RelayAttention, an attention algorithm that allows reading hidden states from DRAM exactly once for a batch of input tokens.
arXiv Detail & Related papers (2024-02-22T18:58:28Z)
- KernelGPT: Enhanced Kernel Fuzzing via Large Language Models [8.77369393651381]
We propose KernelGPT, the first approach to automatically synthesizing syscall specifications via Large Language Models (LLMs).
Our results demonstrate that KernelGPT can generate more new and valid specifications and achieve higher coverage than state-of-the-art techniques.
arXiv Detail & Related papers (2023-12-31T18:47:33Z)
- DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training [82.06732962485754]
FlashAttention effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU.
We introduce DISTFLASHATTN, a memory-efficient attention mechanism optimized for long-context LLMs training.
It achieves 1.67x and 1.26 - 1.88x speedup compared to recent Ring Attention and DeepSpeed-Ulysses.
arXiv Detail & Related papers (2023-10-05T03:47:57Z)
- RLTrace: Synthesizing High-Quality System Call Traces for OS Fuzz Testing [10.644829779197341]
We propose a deep reinforcement learning-based solution, called RLTrace, to synthesize diverse and comprehensive system call traces as the seed to fuzz OS kernels.
During model training, the deep learning model interacts with OS kernels and infers optimal system call traces.
Our evaluation shows that RLTrace outperforms other seed generators by producing more comprehensive system call traces.
arXiv Detail & Related papers (2023-10-04T06:46:00Z)
- SYSPART: Automated Temporal System Call Filtering for Binaries [4.445982681030902]
Recent approaches automatically identify the system calls required by programs to block unneeded ones.
SYSPART is an automatic system-call filtering system designed for binary-only server programs.
arXiv Detail & Related papers (2023-09-10T23:57:07Z)
- Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning to tackle catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression.
Variational random features are used to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z)
- Latency-Aware Differentiable Neural Architecture Search [113.35689580508343]
Differentiable neural architecture search methods became popular in recent years, mainly due to their low search costs and flexibility in designing the search space.
However, these methods have difficulty optimizing the network, so the searched network is often unfriendly to hardware.
This paper addresses the problem by adding a differentiable latency loss term to the optimization, so that the search process can trade off accuracy against latency with a balancing coefficient.
arXiv Detail & Related papers (2020-01-17T15:55:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.