Enhancing Code Review through Fuzzing and Likely Invariants
- URL: http://arxiv.org/abs/2510.15512v1
- Date: Fri, 17 Oct 2025 10:30:22 GMT
- Title: Enhancing Code Review through Fuzzing and Likely Invariants
- Authors: Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude
- Abstract summary: We present FuzzSight, a framework that leverages likely invariants from non-crashing fuzzing inputs to highlight behavioral differences across program versions. In our evaluation, FuzzSight flagged 75% of regression bugs and up to 80% of vulnerabilities uncovered by 24-hour fuzzing.
- Score: 13.727241655311664
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Many software projects employ manual code review to gatekeep defects and vulnerabilities in the code before integration. However, reviewers often work under time pressure and rely primarily on static inspection, leaving the dynamic aspects of the program unexplored. Dynamic analyses could reveal such behaviors, but they are rarely integrated into reviews. Among them, fuzzing is typically applied later to uncover crashing bugs. Yet its ability to exercise code with diverse inputs makes it promising for exposing non-crashing, but unexpected, behaviors earlier. Still, without suitable mechanisms to analyze program behaviors, the rich data produced during fuzzing remains inaccessible to reviewers, limiting its practical value in this context. We hypothesize that unexpected variations in program behaviors could signify potential bugs. The impact of code changes can be automatically captured at runtime. Representing program behavior as likely invariants, dynamic properties consistently observed at specific program points, can provide practical signals of behavioral changes. Such signals offer a way to distinguish between intended changes and unexpected behavioral shifts from code changes. We present FuzzSight, a framework that leverages likely invariants from non-crashing fuzzing inputs to highlight behavioral differences across program versions. By surfacing such differences, it provides insights into which code blocks may need closer attention. In our evaluation, FuzzSight flagged 75% of regression bugs and up to 80% of vulnerabilities uncovered by 24-hour fuzzing. It also outperformed SAST in identifying buggy code blocks, achieving ten times higher detection rates with fewer false alarms. In summary, FuzzSight demonstrates the potential and value of leveraging fuzzing and invariant analysis for early-stage code review, bridging static inspection with dynamic behavioral insights.
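The abstract's core idea, inferring likely invariants (such as value ranges observed at a program point) from non-crashing fuzzing inputs and diffing them across versions, can be illustrated with a minimal sketch. This is not FuzzSight's actual implementation; the program points, instrumented functions, and inputs below are all hypothetical:

```python
# Illustrative sketch of likely-invariant diffing (NOT FuzzSight's actual
# algorithm): infer simple range invariants at each program point from
# non-crashing inputs, then compare the invariants across two versions.

def infer_range_invariants(program, inputs):
    """Run `program` on each input and record, per program point,
    the min/max value observed there (a range 'likely invariant')."""
    bounds = {}
    for x in inputs:
        for point, value in program(x):  # program yields (point, value) pairs
            lo, hi = bounds.get(point, (value, value))
            bounds[point] = (min(lo, value), max(hi, value))
    return bounds

def diff_invariants(old, new):
    """Report program points whose observed value range changed."""
    return {p: (old.get(p), new.get(p))
            for p in old.keys() | new.keys()
            if old.get(p) != new.get(p)}

# Hypothetical program versions, instrumented to emit (point, value) pairs.
def v1(x):
    yield ("buf_len", min(x, 255))   # v1 clamps the length

def v2(x):
    yield ("buf_len", x)             # v2 drops the clamp: behavior shifts

inputs = [0, 17, 128, 255, 300, 1024]  # non-crashing fuzzing inputs
report = diff_invariants(infer_range_invariants(v1, inputs),
                         infer_range_invariants(v2, inputs))
print(report)  # "buf_len" range grew from (0, 255) to (0, 1024)
```

A widened invariant like this does not prove a bug, but it is exactly the kind of unexpected behavioral shift the abstract argues a reviewer should look at more closely.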
Related papers
- Following Dragons: Code Review-Guided Fuzzing [7.963548447895452]
EyeQ is a system that leverages developer intelligence from code reviews to guide fuzzing. We first validate EyeQ through a human-guided feasibility study on a security-focused dataset of PHP code reviews. EyeQ significantly improves vulnerability discovery over standard fuzzing configurations.
arXiv Detail & Related papers (2026-02-11T03:46:57Z) - Does Programming Language Matter? An Empirical Study of Fuzzing Bug Detection [0.0761187029699472]
This study conducts a large-scale cross-language analysis to examine how fuzzing bug characteristics and detection efficiency differ among languages. We analyze 61,444 fuzzing bugs and 999,248 builds from 559 OSS-Fuzz projects categorized by primary language. Our findings reveal that (i) C++ and Rust exhibit higher fuzzing bug detection frequencies, (ii) Rust and Python show low vulnerability ratios but tend to expose more critical vulnerabilities, (iii) crash types vary across languages and unreproducible bugs are more frequent in Go but rare in Rust, and (iv) Python attains higher patch coverage but suffers
arXiv Detail & Related papers (2026-02-05T05:08:51Z) - Adapting Language Balance in Code-Switching Speech [60.296574524609575]
Large foundational models still struggle against code-switching test cases. We use differentiable surrogates to mitigate context bias during generation. Experiments with Arabic and Chinese-English showed that the models are able to predict the switching places more accurately.
arXiv Detail & Related papers (2025-10-21T15:23:55Z) - Probing Pre-trained Language Models on Code Changes: Insights from ReDef, a High-Confidence Just-in-Time Defect Prediction Dataset [0.0]
We present ReDef, a high-confidence benchmark of function-level modifications curated from 22 large-scale C/C++ projects. Defective cases are anchored by revert commits, while clean cases are validated through post-hoc history checks. This pipeline yields 3,164 defective and 10,268 clean modifications, offering substantially more reliable labels than prior resources.
arXiv Detail & Related papers (2025-09-11T07:07:11Z) - A Comparative Study of Fuzzers and Static Analysis Tools for Finding Memory Unsafety in C and C++ [24.60320701097142]
We present an empirical analysis of five static analyzers and 13 fuzzers, applied to over 100 known security vulnerabilities in C/C++ programs. We find that both techniques discover different types of bugs, but there are clear winners for each.
arXiv Detail & Related papers (2025-05-28T07:22:29Z) - Pipe-Cleaner: Flexible Fuzzing Using Security Policies [0.07499722271664144]
Pipe-Cleaner is a system for detecting and analyzing C code vulnerabilities.
It is based on flexible developer-designed security policies enforced by a tag-based runtime reference monitor.
We demonstrate the potential of this approach on several heap-related security vulnerabilities.
arXiv Detail & Related papers (2024-10-31T23:35:22Z) - Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z) - FuzzCoder: Byte-level Fuzzing Test via Large Language Model [46.18191648883695]
We propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks.
FuzzCoder can predict mutation locations and strategies in input files to trigger abnormal behaviors of the program.
arXiv Detail & Related papers (2024-09-03T14:40:31Z) - SoK: Prudent Evaluation Practices for Fuzzing [21.113311952857778]
We systematically analyze the evaluation of 150 fuzzing papers published between 2018 and 2023.
We study how existing guidelines are implemented and observe potential shortcomings and pitfalls.
For example, when investigating reported bugs, we find a surprising disregard of the existing guidelines regarding statistical tests and systematic errors in fuzzing evaluations.
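One guideline the SoK refers to is comparing fuzzers over repeated runs with a rank-based statistical test rather than a single run. As an illustration only (the coverage numbers and the normal-approximation p-value below are made up and simplified, with no tie correction), a stdlib-only Mann-Whitney U comparison might look like:

```python
# Illustrative, stdlib-only Mann-Whitney U test for comparing two sets of
# repeated fuzzing runs (e.g., branches covered per run). Rank-based tests
# like this are what fuzzing-evaluation guidelines recommend over single runs.
from itertools import chain
from statistics import NormalDist

def mann_whitney_u(a, b):
    """U statistic for sample `a`, using average ranks for ties."""
    combined = sorted(chain(a, b))
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        i = j
    r_a = sum(ranks[x] for x in a)
    return r_a - len(a) * (len(a) + 1) / 2

def normal_p_value(u, n_a, n_b):
    """Two-sided p-value via normal approximation (no tie correction)."""
    mu = n_a * n_b / 2
    sigma = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))

fuzzer_a = [1043, 1051, 1048, 1060, 1039]  # branches covered per run (made up)
fuzzer_b = [990, 1002, 998, 1005, 995]
u = mann_whitney_u(fuzzer_a, fuzzer_b)
p = normal_p_value(u, len(fuzzer_a), len(fuzzer_b))
print(u, round(p, 4))
```

In practice one would use an established implementation (e.g., `scipy.stats.mannwhitneyu`) and also report effect sizes, as the guidelines the SoK audits suggest.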
arXiv Detail & Related papers (2024-05-16T16:10:41Z) - Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase [10.961209762486684]
Code revert prediction aims to forecast or predict the likelihood of code changes being reverted or rolled back in software development.
Previous methods for code defect detection relied on independent features but ignored relationships between code scripts.
This paper presents a systematic empirical study for code revert prediction that integrates the code import graph with code features.
arXiv Detail & Related papers (2024-03-14T15:54:29Z) - Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z) - Detecting Rewards Deterioration in Episodic Reinforcement Learning [63.49923393311052]
In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible.
We consider an episodic framework, where the rewards within each episode are not independent, nor identically-distributed, nor Markov.
We define the mean-shift in a way corresponding to deterioration of a temporal signal (such as the rewards), and derive a test for this problem with optimal statistical power.
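The paper derives a test with optimal statistical power for non-i.i.d. episodic rewards; that derivation is not reproduced here. As a much cruder illustration of the underlying idea, detecting a downward mean-shift in a temporal reward signal, one can compare a recent window against a reference window (windows, data, and the threshold `k` below are all arbitrary):

```python
# Generic downward mean-shift check on a reward stream (NOT the paper's
# optimal-power test, which handles dependent, non-identically-distributed
# episodic rewards): alarm if the recent mean drops more than k
# reference-standard-deviations below the reference mean.
from statistics import mean, stdev

def mean_shift_alarm(reference, recent, k=2.0):
    base, spread = mean(reference), stdev(reference)
    return mean(recent) < base - k * spread

rewards_ok = [10.2, 9.8, 10.1, 9.9, 10.0, 10.3]   # healthy agent (made up)
rewards_bad = [7.1, 6.8, 7.0, 6.9, 7.2, 7.0]      # deteriorated agent (made up)
print(mean_shift_alarm(rewards_ok[:3], rewards_ok[3:]))  # no alarm: False
print(mean_shift_alarm(rewards_ok, rewards_bad))         # alarm: True
```

The paper's contribution is precisely that such naive window comparisons are miscalibrated when rewards within an episode are dependent, which motivates its tailored test.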
arXiv Detail & Related papers (2020-10-22T12:45:55Z) - Deep Just-In-Time Inconsistency Detection Between Comments and Source Code [51.00904399653609]
In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code.
We develop a deep-learning approach that learns to correlate a comment with code changes.
We show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system.
arXiv Detail & Related papers (2020-10-04T16:49:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.