Following Dragons: Code Review-Guided Fuzzing
- URL: http://arxiv.org/abs/2602.10487v1
- Date: Wed, 11 Feb 2026 03:46:57 GMT
- Authors: Viet Hoang Luu, Amirmohammad Pasdar, Wachiraphan Charoenwet, Toby Murray, Shaanan Cohney, Van-Thuan Pham
- Abstract summary: EyeQ is a system that leverages developer intelligence from code reviews to guide fuzzing. We first validate EyeQ through a human-guided feasibility study on a security-focused dataset of PHP code reviews. EyeQ significantly improves vulnerability discovery over standard fuzzing configurations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern fuzzers scale to large, real-world software but often fail to exercise the program states developers consider most fragile or security-critical. Such states are typically deep in the execution space, gated by preconditions, or overshadowed by lower-value paths that consume limited fuzzing budgets. Meanwhile, developers routinely surface risk-relevant insights during code review, yet this information is largely ignored by automated testing tools. We present EyeQ, a system that leverages developer intelligence from code reviews to guide fuzzing. EyeQ extracts security-relevant signals from review discussions, localizes the implicated program regions, and translates these insights into annotation-based guidance for fuzzing. The approach operates atop existing annotation-aware fuzzing, requiring no changes to program semantics or developer workflows. We first validate EyeQ through a human-guided feasibility study on a security-focused dataset of PHP code reviews, establishing a strong baseline for review-guided fuzzing. We then automate the workflow using a large language model with carefully designed prompts. EyeQ significantly improves vulnerability discovery over standard fuzzing configurations, uncovering more than 40 previously unknown bugs in the security-critical PHP codebase.
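The three-stage pipeline the abstract describes — extract security-relevant signals from review discussions, localize the implicated program regions, translate the result into annotation-based guidance — can be illustrated with a minimal sketch. The keyword filter and the `(file, line) -> weight` annotation format below are illustrative assumptions, not the paper's actual design: EyeQ uses human analysts and later a carefully prompted LLM for extraction, and feeds an existing annotation-aware fuzzer.

```python
import re
from dataclasses import dataclass

# Illustrative signal vocabulary; EyeQ's real extraction is LLM-driven,
# not a fixed keyword list.
SECURITY_KEYWORDS = {"overflow", "bounds", "sanitize", "injection",
                     "use-after-free", "race", "validate", "untrusted"}

@dataclass
class ReviewComment:
    file: str   # file the review comment is attached to
    line: int   # line number the comment targets
    text: str   # free-form reviewer remark

def extract_signals(comments):
    """Stage 1: keep only comments mentioning a security-relevant term."""
    hits = []
    for c in comments:
        words = set(re.findall(r"[a-z-]+", c.text.lower()))
        if words & SECURITY_KEYWORDS:
            hits.append(c)
    return hits

def to_annotations(signals):
    """Stages 2-3: localize each signal to its (file, line) region and
    translate repeated flags into weights an annotation-aware fuzzer
    could use to bias its scheduling toward those regions."""
    weights = {}
    for s in signals:
        key = (s.file, s.line)
        weights[key] = weights.get(key, 0) + 1
    return weights
```

For example, a reviewer remark like "please sanitize this untrusted input" on `ext/standard/string.c:120` would survive extraction and yield a weight-1 annotation on that region, while a "nit: rename this variable" comment would be dropped.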
Related papers
- A Systematic Literature Review of Code Hallucinations in LLMs: Characterization, Mitigation Methods, Challenges, and Future Directions for Reliable AI [54.34738767990601]
As Large Language Models become increasingly integrated into software engineering tasks, understanding and mitigating hallucination in code becomes essential. We provide a systematic review of hallucination phenomena in code-oriented LLMs from four key perspectives.
arXiv Detail & Related papers (2025-11-02T02:58:41Z)
- Enhancing Code Review through Fuzzing and Likely Invariants [13.727241655311664]
We present FuzzSight, a framework that leverages likely invariants from non-crashing fuzzing inputs to highlight behavioral differences across program versions. In our evaluation, FuzzSight flagged 75% of regression bugs and up to 80% of vulnerabilities uncovered by 24-hour fuzzing.
arXiv Detail & Related papers (2025-10-17T10:30:22Z)
- Evaluating Language Model Reasoning about Confidential Information [95.64687778185703]
We study whether language models exhibit contextual robustness, or the capability to adhere to context-dependent safety specifications. We develop a benchmark (PasswordEval) that measures whether language models can correctly determine when a user request is authorized. We find that current open- and closed-source models struggle with this seemingly simple task, and that, perhaps surprisingly, reasoning capabilities do not generally improve performance.
arXiv Detail & Related papers (2025-08-27T15:39:46Z)
- UQ: Assessing Language Models on Unsolved Questions [149.46593270027697]
We introduce UQ, a testbed of 500 challenging, diverse questions sourced from Stack Exchange. UQ is difficult and realistic by construction: unsolved questions are often hard and naturally arise when humans seek answers. The top model passes UQ-validation on only 15% of questions, and preliminary human verification has already identified correct answers.
arXiv Detail & Related papers (2025-08-25T01:07:59Z)
- Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs [129.79394562739705]
Large language models (LLMs) exhibit impressive fluency, but often produce critical errors known as "hallucinations". We propose RAUQ (Recurrent Attention-based Uncertainty Quantification), an unsupervised approach that leverages intrinsic attention patterns in transformers to detect hallucinations efficiently. Experiments across 4 LLMs and 12 question answering, summarization, and translation tasks demonstrate that RAUQ yields excellent results.
arXiv Detail & Related papers (2025-05-26T14:28:37Z)
- A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs [71.97006967209539]
Large Language Models (LLMs) have the tendency to hallucinate, i.e., to sporadically generate false or fabricated information. Uncertainty quantification (UQ) provides a framework for assessing the reliability of model outputs. We pre-train a collection of UQ heads for popular LLM series, including Mistral, Llama, and Gemma 2.
arXiv Detail & Related papers (2025-05-13T03:30:26Z)
- CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph [29.490817477791357]
We propose an automated fuzz testing method driven by a code knowledge graph and powered by an intelligent agent system. The code knowledge graph is constructed through interprocedural program analysis, where each node in the graph represents a code entity. CKGFuzzer achieved an average improvement of 8.73% in code coverage compared to state-of-the-art techniques.
arXiv Detail & Related papers (2024-11-18T12:41:16Z)
- Fixing Security Vulnerabilities with AI in OSS-Fuzz [9.730566646484304]
OSS-Fuzz is the most significant and widely used infrastructure for continuous validation of open source systems.
We customise the well-known AutoCodeRover agent for fixing security vulnerabilities.
Our experience with OSS-Fuzz vulnerability data shows that LLM agent autonomy is useful for successful security patching.
arXiv Detail & Related papers (2024-11-03T16:20:32Z)
- Pipe-Cleaner: Flexible Fuzzing Using Security Policies [0.07499722271664144]
Pipe-Cleaner is a system for detecting and analyzing C code vulnerabilities.
It is based on flexible developer-designed security policies enforced by a tag-based runtime reference monitor.
We demonstrate the potential of this approach on several heap-related security vulnerabilities.
arXiv Detail & Related papers (2024-10-31T23:35:22Z)
- PrescientFuzz: A more effective exploration approach for grey-box fuzzing [0.45053464397400894]
We produce an augmented version of LibAFL's fuzzbench fuzzer, called PrescientFuzz, that makes use of semantic information from the target program's control flow graph (CFG). We develop an input corpus scheduler that prioritises the selection of inputs for mutation based on the proximity of their execution path to uncovered edges.
arXiv Detail & Related papers (2024-04-29T17:21:18Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security
Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.