Assessing Validity of Static Analysis Warnings using Ensemble Learning
- URL: http://arxiv.org/abs/2104.11593v1
- Date: Wed, 21 Apr 2021 19:39:20 GMT
- Title: Assessing Validity of Static Analysis Warnings using Ensemble Learning
- Authors: Anshul Tanwar, Hariharan Manikandan, Krishna Sundaresan, Prasanna
Ganesan, Sathish Kumar Chandrasekaran, Sriram Ravi
- Abstract summary: Static Analysis (SA) tools are used to identify potential weaknesses in code and fix them in advance, while the code is being developed.
These rule-based static analysis tools generally report many false warnings alongside the genuine ones.
We propose a Machine Learning (ML)-based learning process that uses source code, historic commit data, and classifier ensembles to prioritize the true warnings.
- Score: 4.05739885420409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Static Analysis (SA) tools are used to identify potential weaknesses in code
and fix them in advance, while the code is being developed. In legacy codebases
with high complexity, these rule-based static analysis tools generally report
many false warnings alongside the genuine ones. Although SA tools uncover many
hidden bugs, those bugs are easily lost in the volume of spurious warnings
reported. Developers spend considerable time and effort identifying the true
warnings. Beyond hurting developer productivity, this challenge also causes
true bugs to be missed. To address this problem, we propose a Machine
Learning (ML)-based learning process that uses source code, historic commit
data, and classifier ensembles to prioritize the true warnings in a given
list of warnings. The tool is integrated into the development workflow to
filter out false warnings and prioritize actual bugs. We evaluated our
approach on networking C code, drawing from a large pool of static analysis
warnings reported by the tools. Over time, developers have addressed these
warnings, labelling them as authentic bugs or false alerts. The ML model
is trained with full supervision over the code features. Our results confirm
that applying deep learning over traditional static analysis reports is a
promising approach for drastically reducing false positive rates.
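To make the classifier-ensemble idea concrete, here is a minimal sketch using scikit-learn. It is not the authors' pipeline: the features (stand-ins for code and commit-history signals), the synthetic data, and the model choices are all hypothetical.

```python
# Minimal sketch of warning prioritization with a soft-voting ensemble.
# Features, data, and models are hypothetical stand-ins, not the paper's setup.
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# One row per warning: e.g. file churn, author count, cyclomatic complexity,
# warning-rule frequency (placeholders for real code/commit features).
X = rng.random((500, 4))
y = rng.integers(0, 2, size=500)  # 1 = true warning, 0 = false alarm
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft voting averages the base models' predicted probabilities, so each
# warning gets a single P(true warning) score to rank the triage queue by.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
priority = ensemble.predict_proba(X_te)[:, 1]  # higher = inspect first
```

Ranking warnings by the ensemble's probability, rather than hard-filtering them, keeps borderline warnings visible while still pushing likely false alarms to the bottom of the queue.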
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- FineWAVE: Fine-Grained Warning Verification of Bugs for Automated Static Analysis Tools [18.927121513404924]
Automated Static Analysis Tools (ASATs) have evolved over time to assist in detecting bugs.
Previous research efforts have explored learning-based methods to validate the reported warnings.
We propose FineWAVE, a learning-based approach that verifies bug-sensitive warnings at a fine-grained granularity.
arXiv Detail & Related papers (2024-03-24T06:21:35Z)
- Quieting the Static: A Study of Static Analysis Alert Suppressions [7.324969824727792]
We examine 1,425 open-source Java-based projects that use FindBugs or SpotBugs, focusing on their warning-suppressing configurations and source code annotations.
We find that although most warnings are suppressed, only a small portion of them are frequently suppressed.
The findings underscore the need for better communication and education around the use of static analysis tools.
arXiv Detail & Related papers (2023-11-13T17:16:25Z)
- ACWRecommender: A Tool for Validating Actionable Warnings with Weak Supervision [10.040337069728569]
Static analysis tools have gained popularity among developers for finding potential bugs, but their widespread adoption is hindered by the high false alarm rates.
Previous studies proposed the concept of actionable warnings and applied machine-learning methods to distinguish actionable warnings from false alarms.
We propose a two-stage framework called ACWRecommender to automatically identify actionable warnings and recommend those with a high probability of being real bugs.
arXiv Detail & Related papers (2023-09-18T12:35:28Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised contrastive learning strategy that places benign clones closer in the representation space while moving deviants further apart (a minimal loss-function sketch appears after this list).
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Tracking the Evolution of Static Code Warnings: the State-of-the-Art and a Better Approach [18.350023994564904]
Static bug detection tools help developers detect problems in the code, including bad programming practices and potential defects.
Recent efforts to integrate static bug detectors in modern software development, such as in code review and continuous integration, are shown to better motivate developers to fix the reported warnings on the fly.
arXiv Detail & Related papers (2022-10-06T03:02:32Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- Learning to Reduce False Positives in Analytic Bug Detectors [12.733531603080674]
We propose a Transformer-based learning approach to identify false positive bug warnings (a minimal classification sketch appears after this list).
We demonstrate that our models can improve the precision of static analysis by 17.5%.
arXiv Detail & Related papers (2022-03-08T04:26:26Z)
- Sample-Efficient Safety Assurances using Conformal Prediction [57.92013073974406]
Early warning systems can provide alerts when an unsafe situation is imminent.
To reliably improve safety, these warning systems should have a provable false negative rate.
We present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics (a minimal threshold-calibration sketch appears after this list).
arXiv Detail & Related papers (2021-09-28T23:00:30Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures through parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program (a minimal message-passing sketch appears after this list).
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential-analysis-based approach to label issues reported by static analysis tools (a minimal labeling sketch appears after this list).
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
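For CONCORD's clone-aware objective above, here is a minimal sketch of an InfoNCE-style contrastive loss in PyTorch. It illustrates the general technique under assumed inputs (precomputed code embeddings); it is not the paper's implementation.

```python
# Minimal sketch of a clone-aware contrastive loss: pull an anchor's benign
# clone closer, push a buggy "deviant" away. Inputs are assumed to be
# precomputed code embeddings; this is not CONCORD's actual implementation.
import torch
import torch.nn.functional as F

def clone_contrastive_loss(anchor, clone, deviant, temperature=0.07):
    anchor, clone, deviant = (F.normalize(t, dim=-1)
                              for t in (anchor, clone, deviant))
    pos = (anchor * clone).sum(-1) / temperature    # similarity to clone
    neg = (anchor * deviant).sum(-1) / temperature  # similarity to deviant
    logits = torch.stack([pos, neg], dim=-1)
    # Index 0 (the clone) is the correct match for every anchor.
    targets = torch.zeros(anchor.size(0), dtype=torch.long)
    return F.cross_entropy(logits, targets)

# Toy usage: a batch of 8 anchors with 128-d embeddings.
a, c, d = (torch.randn(8, 128) for _ in range(3))
loss = clone_contrastive_loss(a, c, d)  # small when clones score higher
```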
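For the Transformer-based false positive identification above, a minimal sketch of the general recipe follows. The checkpoint (microsoft/codebert-base), the warning/context formatting, and the example inputs are assumptions, not the paper's exact setup.

```python
# Minimal sketch: score a warning plus its code context with a pretrained
# Transformer. Checkpoint and inputs are illustrative assumptions; the
# classification head below is untrained and would need fine-tuning on
# labeled warnings before its scores mean anything.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)  # 0 = false alarm, 1 = real bug

warning = "NULL_DEREFERENCE: `buf` may be null when passed to memcpy"
context = "if (buf) len = strlen(buf);\nmemcpy(dst, buf, len);"
inputs = tok(warning, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    p_real_bug = model(**inputs).logits.softmax(-1)[0, 1].item()
```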
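For the conformal prediction entry above, here is a minimal split-conformal sketch of how an alert threshold can be calibrated so the false negative rate is bounded. The danger scores are synthetic, and this shows the generic technique rather than the paper's robot-dynamics framework.

```python
# Minimal split-conformal sketch: choose an alert threshold tau from danger
# scores of held-out truly-unsafe states, so that a new unsafe state falls
# below tau (a missed alert) with probability at most alpha, assuming
# exchangeability. Scores here are synthetic placeholders.
import numpy as np

def calibrate_threshold(unsafe_scores, alpha=0.05):
    n = len(unsafe_scores)
    k = int(np.floor(alpha * (n + 1)))  # rank such that P(miss) <= k/(n+1)
    if k == 0:
        return -np.inf  # too few calibration points: always alert
    return np.sort(unsafe_scores)[k - 1]  # k-th smallest calibration score

rng = np.random.default_rng(0)
cal_scores = rng.normal(2.0, 1.0, size=1000)  # hypothetical danger scores
tau = calibrate_threshold(cal_scores, alpha=0.05)
alert = lambda score: score >= tau  # fire whenever a new state looks unsafe
```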
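For the GNN-over-code-graphs entry above, the following is a minimal single-layer message-passing sketch in plain PyTorch. The graph, dimensions, and sum aggregation are illustrative choices, not the paper's architecture.

```python
# Minimal sketch of one message-passing layer over a code graph whose nodes
# might be AST/CFG elements and whose edges encode code relations. This is a
# generic GNN layer, not the paper's disaggregated-graph architecture.
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, x, edge_index):
        # x: (num_nodes, dim); edge_index: (2, num_edges) src -> dst pairs.
        src, dst = edge_index
        agg = torch.zeros_like(x).index_add_(0, dst, x[src])  # sum neighbors
        return torch.relu(self.lin(torch.cat([x, agg], dim=-1)))

x = torch.randn(5, 16)                              # 5 nodes, 16-d features
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])  # a small chain graph
h = GraphLayer(16)(x, edges)                        # updated node states
# Mean-pooling h (h.mean(0)) would feed a vulnerable/benign classifier head.
```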
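And for D2A's differential labeling above, here is a minimal sketch of the before/after-commit idea. run_analyzer is a hypothetical stand-in for invoking a real tool such as Infer, and the labeling heuristic is a simplification of the paper's pipeline.

```python
# Minimal sketch of differential labeling: warnings that a static analyzer
# reports before a bug-fixing commit but not after it are labeled likely
# real bugs; warnings that survive the fix are labeled likely false alarms.
# run_analyzer is a hypothetical placeholder, not a real API.

def run_analyzer(commit_sha):
    """Placeholder: return a set of (file, rule, fingerprint) warnings."""
    raise NotImplementedError  # e.g. check out commit_sha and run the tool

def label_warnings(before_sha, after_sha):
    before = run_analyzer(before_sha)
    after = run_analyzer(after_sha)
    fixed = before - after       # disappeared with the fix: likely real bugs
    persistent = before & after  # unaffected by the fix: likely false alarms
    return {**{w: 1 for w in fixed}, **{w: 0 for w in persistent}}
```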
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.