Beyond Imprecise Distance Metrics: LLM-Predicted Target Call Stacks for Directed Greybox Fuzzing
- URL: http://arxiv.org/abs/2510.23101v1
- Date: Mon, 27 Oct 2025 08:17:03 GMT
- Title: Beyond Imprecise Distance Metrics: LLM-Predicted Target Call Stacks for Directed Greybox Fuzzing
- Authors: Yifan Zhang, Xin Zhang,
- Abstract summary: Directed greybox fuzzing (DGF) aims to efficiently trigger bugs at specific target locations.<n>Existing DGF approaches suffer from imprecise probability calculations.<n>We propose to replace static analysis-based distance metrics with precise call stack representations.
- Score: 11.825548125173022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Directed greybox fuzzing (DGF) aims to efficiently trigger bugs at specific target locations by prioritizing seeds whose execution paths are more likely to mutate into triggering target bugs. However, existing DGF approaches suffer from imprecise probability calculations due to their reliance on complex distance metrics derived from static analysis. The over-approximations inherent in static analysis cause a large number of irrelevant execution paths to be mistakenly considered to potentially mutate into triggering target bugs, significantly reducing fuzzing efficiency. We propose to replace static analysis-based distance metrics with precise call stack representations. Call stacks represent precise control flows, thereby avoiding false information in static analysis. We leverage large language models (LLMs) to predict vulnerability-triggering call stacks for guiding seed prioritization. Our approach constructs call graphs through static analysis to identify methods that can potentially reach target locations, then utilizes LLMs to predict the most likely call stack sequence that triggers the vulnerability. Seeds whose execution paths have higher overlap with the predicted call stack are prioritized for mutation. This is the first work to integrate LLMs into the core seed prioritization mechanism of DGF. We implement our approach and evaluate it against several state-of-the-art fuzzers. On a suite of real-world programs, our approach triggers vulnerabilities $1.86\times$ to $3.09\times$ faster compared to baselines. In addition, our approach identifies 10 new vulnerabilities and 2 incomplete fixes in the latest versions of programs used in our controlled experiments through directed patch testing, with 10 assigned CVE IDs.
Related papers
- Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads [104.9566359759396]
We propose a lightweight alternative for step-level reasoning verification based on data-driven uncertainty scores.<n>Our findings suggest that the internal states of LLMs encode their uncertainty and can serve as reliable signals for reasoning verification.
arXiv Detail & Related papers (2025-11-09T03:38:29Z) - Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback [0.0]
I present a hybrid fuzzing framework that integrates static and dynamic analysis with Large Language Model (LLM)-guided input mutation and semantic feedback.<n>Our method achieves faster time-to-first-bug, higher semantic diversity, and a competitive number of unique bugs compared to state-of-the-art fuzzers.
arXiv Detail & Related papers (2025-11-06T02:38:24Z) - RePaCA: Leveraging Reasoning Large Language Models for Static Automated Patch Correctness Assessment [0.0]
We introduce RePaCA, a novel static APCA technique that leverages Large Language Models (LLMs) specialized in thinking tasks.<n>Our approach achieves state-of-the-art performance, with 83.1% accuracy and an 84.8% F1-score.
arXiv Detail & Related papers (2025-07-30T11:21:09Z) - Resolving Indirect Calls in Binary Code via Cross-Reference Augmented Graph Neural Networks [14.474710100574866]
We introduce CupidCall, a novel approach for resolving indirect calls using graph neural networks.<n>We leverage advanced compiler-level type analysis to generate high-quality callsite-callee training pairs.<n>We further design a graph neural model that leverages augmented CFGs and relational graph convolutions for accurate target prediction.
arXiv Detail & Related papers (2025-07-24T20:54:41Z) - The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs [17.497629884237647]
BugLens is a post-refinement framework that significantly enhances static analysis precision for bug detection.<n>LLMs demonstrate promising code understanding capabilities, but their direct application to program analysis remains unreliable.<n>LLMs guide LLMs through structured reasoning steps to assess security impact and validate constraints from the source code.
arXiv Detail & Related papers (2025-04-16T02:17:06Z) - KNighter: Transforming Static Analysis with LLM-Synthesized Checkers [20.368621145118468]
KNighter generates high-precision checkers capable of detecting diverse bug patterns.<n>To date, KNighter-synthesized checkers have discovered 92 new, critical, long-latent bugs in the Linux kernel.
arXiv Detail & Related papers (2025-03-12T02:30:19Z) - Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z) - Combined Static Analysis and Machine Learning Prediction for Application Debloating [2.010931857032585]
We develop a framework, Predictive Debloat with Static Guarantees (PDSG)
PDSG predicts the dynamic callee set emanating from a callsite, and to resolve mispredictions, it employs a lightweight audit based on static invariants of call chains.
It achieves the highest gadget reductions among similar techniques on SPEC CPU 2017, reducing 82.5% of the total gadgets on average.
arXiv Detail & Related papers (2024-03-30T00:14:17Z) - Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Model (LLM)
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by $6.34%$, $9.56%$, and $5.46%$ on the GSM8K, AQuA, and StrategyQA.
arXiv Detail & Related papers (2023-05-01T02:37:59Z) - A Large-scale Multiple-objective Method for Black-box Attack against
Object Detection [70.00150794625053]
We propose to minimize the true positive rate and maximize the false positive rate, which can encourage more false positive objects to block the generation of new true positive bounding boxes.
We extend the standard Genetic Algorithm with Random Subset selection and Divide-and-Conquer, called GARSDC, which significantly improves the efficiency.
Compared with the state-of-art attack methods, GARSDC decreases by an average 12.0 in the mAP and queries by about 1000 times in extensive experiments.
arXiv Detail & Related papers (2022-09-16T08:36:42Z) - Continual Test-Time Domain Adaptation [94.51284735268597]
Test-time domain adaptation aims to adapt a source pre-trained model to a target domain without using any source data.
CoTTA is easy to implement and can be readily incorporated in off-the-shelf pre-trained models.
arXiv Detail & Related papers (2022-03-25T11:42:02Z) - D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using
Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.