Does Programming Language Matter? An Empirical Study of Fuzzing Bug Detection
- URL: http://arxiv.org/abs/2602.05312v1
- Date: Thu, 05 Feb 2026 05:08:51 GMT
- Title: Does Programming Language Matter? An Empirical Study of Fuzzing Bug Detection
- Authors: Tatsuya Shirai, Olivier Nourry, Yutaro Kashiwa, Kenji Fujiwara, Hajimu Iida
- Abstract summary: This study conducts a large-scale cross-language analysis to examine how fuzzing bug characteristics and detection efficiency differ among languages. We analyze 61,444 fuzzing bugs and 999,248 builds from 559 OSS-Fuzz projects categorized by primary language. Our findings reveal that (i) C++ and Rust exhibit higher fuzzing bug detection frequencies, (ii) Rust and Python show low vulnerability ratios but tend to expose more critical vulnerabilities, (iii) crash types vary across languages and unreproducible bugs are more frequent in Go but rare in Rust, and (iv) Python attains higher patch coverage but suffers from longer time-to-detection.
- Score: 0.0761187029699472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fuzzing has become a popular technique for automatically detecting vulnerabilities and bugs by generating unexpected inputs. In recent years, the fuzzing process has been integrated into continuous integration workflows (i.e., continuous fuzzing), enabling short and frequent testing cycles. Despite its widespread adoption, prior research has not examined whether the effectiveness of continuous fuzzing varies across programming languages. This study conducts a large-scale cross-language analysis to examine how fuzzing bug characteristics and detection efficiency differ among languages. We analyze 61,444 fuzzing bugs and 999,248 builds from 559 OSS-Fuzz projects categorized by primary language. Our findings reveal that (i) C++ and Rust exhibit higher fuzzing bug detection frequencies, (ii) Rust and Python show low vulnerability ratios but tend to expose more critical vulnerabilities, (iii) crash types vary across languages and unreproducible bugs are more frequent in Go but rare in Rust, and (iv) Python attains higher patch coverage but suffers from longer time-to-detection. These results demonstrate that fuzzing behavior and effectiveness are strongly shaped by language design, providing insights for language-aware fuzzing strategies and tool development.
Related papers
- Rust and Go directed fuzzing with LibAFL-DiFuzz [0.0]
We present a novel approach to directed fuzzing tailored specifically for Rust and Go applications. Our implemented fuzzing tools, based on the LibAFL-DiFuzz backend, demonstrate competitive advantages.
arXiv Detail & Related papers (2026-01-30T09:52:50Z) - Discovering 100+ Compiler Defects in 72 Hours via LLM-Driven Semantic Logic Recomposition [15.27741331581011]
We propose FeatureFuzz, a compiler fuzzer that combines features to generate programs. Over 24-hour campaigns, FeatureFuzz uncovered 167 unique crashes, which is 2.78x more than the second-best fuzzer. Through a 72-hour fuzzing campaign, FeatureFuzz identified 113 bugs in GCC and LLVM, 97 of which have already been confirmed by compiler developers.
arXiv Detail & Related papers (2026-01-18T11:33:45Z) - Large-Scale Empirical Analysis of Continuous Fuzzing: Insights from 1 Million Fuzzing Sessions [0.07183613290627339]
This study aims to elucidate the role of continuous fuzzing in vulnerability detection. We collect issue reports, coverage reports, and fuzzing logs from OSS-Fuzz, an online service provided by Google. We reveal that a substantial number of fuzzing bugs exist prior to the integration of continuous fuzzing.
arXiv Detail & Related papers (2025-10-18T10:13:19Z) - Enhancing Code Review through Fuzzing and Likely Invariants [13.727241655311664]
We present FuzzSight, a framework that leverages likely invariants from non-crashing fuzzing inputs to highlight behavioral differences across program versions. In our evaluation, FuzzSight flagged 75% of regression bugs and up to 80% of vulnerabilities uncovered by 24-hour fuzzing.
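The abstract does not spell out which invariant templates FuzzSight mines, but the core idea, learning likely invariants from non-crashing runs and flagging executions of a new version that violate them, can be sketched in a few lines of Python. The function names and the simple min/max range template below are illustrative assumptions, not the paper's actual framework:

```python
def mine_range_invariants(fn, inputs):
    """Mine a likely output-range invariant of `fn` from non-crashing inputs."""
    lo = hi = None
    for x in inputs:
        try:
            y = fn(x)
        except Exception:
            continue                      # ignore crashing inputs
        lo = y if lo is None else min(lo, y)
        hi = y if hi is None else max(hi, y)
    return lo, hi

def violates(fn, inputs, invariant):
    """Return the inputs on which a new version breaks the mined invariant."""
    lo, hi = invariant
    bad = []
    for x in inputs:
        try:
            y = fn(x)
        except Exception:
            continue
        if not (lo <= y <= hi):
            bad.append(x)
    return bad
```

For example, mining `abs` over a small input set yields the range invariant `(0, 5)`; a changed version that returns its argument unmodified then violates the invariant on every negative input, surfacing the behavioral difference without any crash.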
arXiv Detail & Related papers (2025-10-17T10:30:22Z) - Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study [59.30098850050971]
This work evaluates LLM prompting-based detection across eight non-English languages. We show that while zero-shot and few-shot prompting lag behind fine-tuned encoder models on most of the real-world evaluation sets, they achieve better generalization on functional tests for hate speech detection.
arXiv Detail & Related papers (2025-05-09T16:00:01Z) - FuzzCoder: Byte-level Fuzzing Test via Large Language Model [46.18191648883695]
We propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks.
FuzzCoder can predict mutation locations and strategies in input files to trigger abnormal behaviors of the program.
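FuzzCoder's fine-tuned model is not reproduced here, but the byte-level mutation loop such a model would plug into can be sketched in plain Python. The random flip/insert/delete strategy below is a baseline stand-in for the learned mutation policy, and the `mutate`/`fuzz` names are illustrative assumptions:

```python
import random

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: bit flip, insert, or delete."""
    buf = bytearray(data)
    op = rng.choice(["flip", "insert", "delete"]) if buf else "insert"
    if op == "flip":
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)            # flip one bit in place
    elif op == "insert":
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    else:  # delete
        del buf[rng.randrange(len(buf))]
    return bytes(buf)

def fuzz(target, seed: bytes, iterations: int = 1000, rng=None):
    """Feed mutated inputs to `target`; collect inputs that raise exceptions."""
    rng = rng or random.Random(0)
    crashes = []
    data = seed
    for _ in range(iterations):
        data = mutate(data, rng)
        try:
            target(data)
        except Exception:
            crashes.append(data)
            data = seed                            # restart from the seed
    return crashes
```

A learned model like FuzzCoder would replace the uniform `rng.choice` and `rng.randrange` calls with predicted mutation locations and strategies, concentrating mutations where they are most likely to trigger abnormal behavior.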
arXiv Detail & Related papers (2024-09-03T14:40:31Z) - Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping [60.458273797431836]
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models.
We find that this approach does not work well on non-English tasks.
Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
arXiv Detail & Related papers (2024-07-15T15:14:01Z) - Fuzz4All: Universal Fuzzing with Large Language Models [26.559019384475846]
This paper presents Fuzz4All, the first fuzzer that is universal in the sense that it can target many different input languages and many different features of these languages. We evaluate Fuzz4All on nine systems under test that take in six different languages (C, C++, Go, SMT2, Java, and Python) as inputs. The evaluation shows, across all six languages, that universal fuzzing achieves higher coverage than existing, language-specific fuzzers.
arXiv Detail & Related papers (2023-08-09T07:36:21Z) - A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
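A minimal version of such AST-based static checking can be sketched with Python's built-in `ast` module; the helper names below are illustrative, not the paper's actual framework, and parsability is only the simplest of the static errors a full framework would quantify:

```python
import ast

def check_completion(code: str):
    """Return a SyntaxError description for a completion, or None if it parses."""
    try:
        ast.parse(code)
        return None
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"

def error_rate(completions):
    """Fraction of completions that fail the static (parse) check."""
    failures = [c for c in completions if check_completion(c) is not None]
    return len(failures) / len(completions)
```

Because the check never executes the completion, it can be run cheaply over large batches of model outputs, which is the advantage of static evaluation the paper argues for.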
arXiv Detail & Related papers (2023-06-05T19:23:34Z) - Infrared: A Meta Bug Detector [10.541969253100815]
We propose a new approach, called meta bug detection, which offers three crucial advantages over existing learning-based bug detectors.
Our evaluation shows our meta bug detector (MBD) is effective in catching a variety of bugs including null pointer dereference, array index out-of-bound, file handle leak, and even data races in concurrent programs.
arXiv Detail & Related papers (2022-09-18T09:08:51Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.