Automated Code Repair for C/C++ Static Analysis Alerts
- URL: http://arxiv.org/abs/2508.02820v1
- Date: Mon, 04 Aug 2025 18:44:50 GMT
- Title: Automated Code Repair for C/C++ Static Analysis Alerts
- Authors: David Svoboda, Lori Flynn, William Klieber, Michael Duggan, Nicholas Reimer, Joseph Sible,
- Abstract summary: Static analysis (SA) tools produce many diagnostic alerts indicating that source code in C or C++ may be defective. This paper details the design, development, and performance testing of an APR tool we built that repairs C/C++ code associated with 3 categories of alerts produced by multiple SA tools.
- Score: 1.260797434681533
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: (Note: This work is a preprint.) Static analysis (SA) tools produce many diagnostic alerts indicating that source code in C or C++ may be defective and potentially vulnerable to security exploits. Many of these alerts are false positives. Identifying the true-positive alerts and repairing the defects in the associated code are huge efforts that automated program repair (APR) tools can help with. Our experience showed that APR can significantly reduce both the number of SA alerts and the manual effort analysts spend reviewing code. This engineering experience paper details the design, development, and performance testing of an APR tool we built that repairs C/C++ code associated with 3 categories of alerts produced by multiple SA tools. Its repairs are simple and local. Furthermore, our findings convinced the maintainers of the CERT Coding Standards to re-assess and update the metrics used to assess when violations of guidelines are detectable or repairable. We discuss engineering design choices made to support the goals of trustworthiness and acceptability to developers. Our APR tool repaired 8718 of the 9234 alerts produced by one SA tool on one codebase. It can repair 3 flaw categories. For 2 flaw categories, 2 SA tools, and 2 codebases, our tool repaired or dismissed as false positives over 80% of alerts, on average. Tests showed that repairs did not appreciably degrade the performance of the code or cause new alerts to appear (with the possible exception of sqlite3.c). This paper describes unique contributions that include a new empirical analysis of SA data, our selection method for flaw categories to repair, publication of our APR tool, and a dataset of SA alerts from open-source SA tools run on open-source codebases. It discusses positive and negative results and lessons learned.
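To make the "simple and local" repair style concrete, below is a minimal C sketch of one plausible repair for a null-pointer-dereference alert, a category many SA tools report. The example is illustrative only: the function names are hypothetical, and the abstract does not state which 3 flaw categories the tool actually handles.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Before repair: many SA tools flag the unchecked use of malloc's
 * result as a possible null-pointer dereference. */
char *copy_label_before(const char *src) {
    char *buf = malloc(strlen(src) + 1);
    strcpy(buf, src);               /* alert: buf may be NULL here */
    return buf;
}

/* After repair: a minimal, local guard is inserted immediately after
 * the allocation; the surrounding logic is otherwise untouched. */
char *copy_label_after(const char *src) {
    char *buf = malloc(strlen(src) + 1);
    if (buf == NULL) {              /* guard inserted by the (hypothetical) repair */
        return NULL;
    }
    strcpy(buf, src);
    return buf;
}

int main(void) {
    char *s = copy_label_after("example");
    if (s != NULL) {
        printf("%s\n", s);
        free(s);
    }
    return 0;
}
```

A repair of this shape is small enough for a developer to review and accept at a glance, which matches the paper's stated goals of trustworthiness and acceptability; whether the real tool emits exactly this form of guard is an assumption here.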
Related papers
- A Large-Scale Collection Of (Non-)Actionable Static Code Analysis Reports [0.05599792629509228]
Static Code Analysis (SCA) tools often generate an overwhelming number of warnings, many of which are non-actionable.
This overload of alerts leads to "alert fatigue", a phenomenon where developers become desensitized to warnings, potentially overlooking critical issues and ultimately hindering productivity and code quality.
We introduce a novel methodology for collecting and categorizing SCA warnings, effectively distinguishing actionable from non-actionable ones.
We generate a large-scale dataset of over 1 million entries of Java source code warnings, named NASCAR: (Non-)Actionable Static Code Analysis Reports.
arXiv Detail & Related papers (2025-11-13T13:59:21Z)
- On the need to perform comprehensive evaluations of automated program repair benchmarks: Sorald case study [4.968268396950843]
Automated program repair (APR) tools aim to improve code quality by automatically addressing violations detected by static analysis profilers.
Previous research tends to evaluate APR tools only for their ability to clear violations.
This study evaluates Sorald, a state-of-the-art APR tool, as a proof of concept.
arXiv Detail & Related papers (2025-08-21T00:12:14Z)
- Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs.
This paper investigates how programmers use Large Language Models to complement their own skills.
The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z)
- Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [84.30534714651093]
We present an innovative APR tool for Dafny, a verification-aware programming language.
We localize faults through a series of steps, which include using Hoare Logic to determine the state of each statement within the program.
We evaluate our approach using DafnyBench, a benchmark of real-world Dafny programs.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
- ToolFuzz -- Automated Agent Tool Testing [5.174808367448261]
ToolFuzz is designed to discover two types of errors: (1) user queries leading to tool runtime errors and (2) user queries that lead to incorrect agent responses.
We show that ToolFuzz identifies 20x more erroneous inputs than prompt-engineering approaches.
arXiv Detail & Related papers (2025-03-06T14:29:52Z)
- Closing the Gap: A User Study on the Real-world Usefulness of AI-powered Vulnerability Detection & Repair in the IDE [5.824774194964031]
We evaluate DeepVulGuard, a vulnerability detection and fix tool, with professional software developers on real projects that they own.
DeepVulGuard scans code for vulnerabilities, suggests fixes, and provides natural-language explanations for alerts and fixes through a chat interface.
We find that although state-of-the-art AI-powered detection and fix tools show promise, they are not yet practical for real-world use due to a high rate of false positives and non-applicable fixes.
arXiv Detail & Related papers (2024-12-18T20:19:56Z)
- RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
- ACWRecommender: A Tool for Validating Actionable Warnings with Weak Supervision [10.040337069728569]
Static analysis tools have gained popularity among developers for finding potential bugs, but their widespread adoption is hindered by high false alarm rates.
Previous studies proposed the concept of actionable warnings and applied machine-learning methods to distinguish actionable warnings from false alarms.
We propose a two-stage framework called ACWRecommender to automatically identify actionable warnings and recommend those with a high probability of being real bugs.
arXiv Detail & Related papers (2023-09-18T12:35:28Z)
- Learning to Reduce False Positives in Analytic Bug Detectors [12.733531603080674]
We propose a Transformer-based learning approach to identify false positive bug warnings.
We demonstrate that our models can improve the precision of static analysis by 17.5%.
arXiv Detail & Related papers (2022-03-08T04:26:26Z)
- Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use a critic to check the fixer's output on real bad inputs and add good (fixed) outputs to the training data; we also train a breaker to turn good code into realistic bad code, yielding additional paired examples.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
arXiv Detail & Related papers (2021-06-11T20:31:04Z)
- Assessing Validity of Static Analysis Warnings using Ensemble Learning [4.05739885420409]
Static Analysis (SA) tools are used to identify potential weaknesses in code and fix them in advance, while the code is being developed.
These rule-based static analysis tools generally report many false warnings along with the actual ones.
We propose a Machine Learning (ML)-based learning process that uses source code, historic commit data, and classifier ensembles to prioritize true warnings.
arXiv Detail & Related papers (2021-04-21T19:39:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and accepts no responsibility for any consequences of its use.