Related papers: From Coverage to Causes: Data-Centric Fuzzing for JavaScript Engines

From Coverage to Causes: Data-Centric Fuzzing for JavaScript Engines

URL: http://arxiv.org/abs/2512.18102v1
Date: Fri, 19 Dec 2025 22:15:53 GMT
Title: From Coverage to Causes: Data-Centric Fuzzing for JavaScript Engines
Authors: Kishan Kumar Ganguly, Tim Menzies,
Abstract summary: Exhaustive fuzzing of modern JavaScript engines is infeasible due to the vast number of program states and execution paths.<n>This work introduces feature-guided fuzzing, an automated data-driven approach that replaces coverage with data-directed inference.
Score: 4.282746516699565
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Context: Exhaustive fuzzing of modern JavaScript engines is infeasible due to the vast number of program states and execution paths. Coverage-guided fuzzers waste effort on low-risk inputs, often ignoring vulnerability-triggering ones that do not increase coverage. Existing heuristics proposed to mitigate this require expert effort, are brittle, and hard to adapt. Objective: We propose a data-centric, LLM-boosted alternative that learns from historical vulnerabilities to automatically identify minimal static (code) and dynamic (runtime) features for detecting high-risk inputs. Method: Guided by historical V8 bugs, iterative prompting generated 115 static and 49 dynamic features, with the latter requiring only five trace flags, minimizing instrumentation cost. After feature selection, 41 features remained to train an XGBoost model to predict high-risk inputs during fuzzing. Results: Combining static and dynamic features yields over 85% precision and under 1% false alarms. Only 25% of these features are needed for comparable performance, showing that most of the search space is irrelevant. Conclusion: This work introduces feature-guided fuzzing, an automated data-driven approach that replaces coverage with data-directed inference, guiding fuzzers toward high-risk states for faster, targeted, and reproducible vulnerability discovery. To support open science, all scripts and data are available at https://github.com/KKGanguly/DataCentricFuzzJS .

Related papers

Rust and Go directed fuzzing with LibAFL-DiFuzz [0.0]
We present a novel approach to directed fuzzing tailored specifically for Rust and Go applications.<n>Our implemented fuzzing tools, based on LibAFL-DiFuzz backend, demonstrate competitive advantages.
arXiv Detail & Related papers (2026-01-30T09:52:50Z)
Enhancing Fuzz Testing Efficiency through Automated Fuzz Target Generation [0.0]
We introduce an approach to improving fuzz target generation through static analysis of library source code.<n>Our findings are demonstrated through the application of this approach to the generation of fuzz targets for C/C++ libraries.
arXiv Detail & Related papers (2026-01-17T09:08:11Z)
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models [50.21378052667732]
We conduct an in-depth analysis of dLLM vulnerabilities to jailbreak attacks across two distinct dimensions: intra-step and inter-step dynamics.<n>We propose DiffuGuard, a training-free defense framework that addresses vulnerabilities through a dual-stage approach.
arXiv Detail & Related papers (2025-09-29T05:17:10Z)
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents [52.92354372596197]
Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities.<n>This interaction also introduces the risk of prompt injection attacks, where malicious inputs from external sources can mislead the agent's behavior.<n>We propose a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control and data-level constraints.
arXiv Detail & Related papers (2025-06-13T05:01:09Z)
Poster: Machine Learning for Vulnerability Detection as Target Oracle in Automated Fuzz Driver Generation [0.0]
In vulnerability detection, machine learning has been used as an effective static analysis technique, although it suffers from a significant rate of false positives.<n>We propose an automated fuzz driver generation workflow composed of: (1) identifying a likely vulnerable function by leveraging a machine learning for vulnerability detection model as a target oracle, (2) automatically generating fuzz drivers, and (3) fuzzing the target function to find bugs which could confirm the vulnerability inferred by the target oracle.
arXiv Detail & Related papers (2025-05-02T09:02:36Z)
CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph [29.490817477791357]
We propose an automated fuzz testing method driven by a code knowledge graph and powered by an intelligent agent system.<n>The code knowledge graph is constructed through interprocedural program analysis, where each node in the graph represents a code entity.<n> CKGFuzzer achieved an average improvement of 8.73% in code coverage compared to state-of-the-art techniques.
arXiv Detail & Related papers (2024-11-18T12:41:16Z)
Camouflage is all you need: Evaluating and Enhancing Language Model Robustness Against Camouflage Adversarial Attacks [53.87300498478744]
Adversarial attacks represent a substantial challenge in Natural Language Processing (NLP) This study undertakes a systematic exploration of this challenge in two distinct phases: vulnerability evaluation and resilience enhancement. Results suggest a trade-off between performance and robustness, with some models maintaining similar performance while gaining robustness.
arXiv Detail & Related papers (2024-02-15T10:58:22Z)
Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
Enhancing Cross-Dataset Performance of Distracted Driving Detection With Score Softmax Classifier And Dynamic Gaussian Smoothing Supervision [6.891556476231427]
Deep neural networks enable real-time monitoring of in-vehicle drivers, facilitating the timely prediction of distractions, fatigue, and potential hazards.<n>Recent research has exposed unreliable cross-dataset driver behavior recognition due to a limited number of data samples and background noise.<n>We propose a Score-Softmax classifier, which reduces the model overconfidence by enhancing category independence.
arXiv Detail & Related papers (2023-10-08T15:28:01Z)
Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection [12.529028629599349]
We propose a novel benchmarking methodology to help researchers better evaluate the true capabilities and limits of ML4VD techniques. Using six ML4VD techniques and two datasets, we find (a) that state-of-the-art models severely overfit to unrelated features for predicting the vulnerabilities in the testing data, and (b) that the performance gained by data augmentation does not generalize beyond the specific augmentations applied during training.
arXiv Detail & Related papers (2023-06-28T08:41:39Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
MINIMAL: Mining Models for Data Free Universal Adversarial Triggers [57.14359126600029]
We present a novel data-free approach, MINIMAL, to mine input-agnostic adversarial triggers from NLP models. We reduce the accuracy of Stanford Sentiment Treebank's positive class from 93.6% to 9.6%. For the Stanford Natural Language Inference (SNLI), our single-word trigger reduces the accuracy of the entailment class from 90.95% to less than 0.6%.
arXiv Detail & Related papers (2021-09-25T17:24:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.