Assessing Validity of Static Analysis Warnings using Ensemble Learning
- URL: http://arxiv.org/abs/2104.11593v1
- Date: Wed, 21 Apr 2021 19:39:20 GMT
- Title: Assessing Validity of Static Analysis Warnings using Ensemble Learning
- Authors: Anshul Tanwar, Hariharan Manikandan, Krishna Sundaresan, Prasanna
Ganesan, Sathish Kumar Chandrasekaran, Sriram Ravi
- Abstract summary: Static Analysis (SA) tools are used to identify potential weaknesses in code and fix them in advance, while the code is being developed.
These rule-based static analysis tools generally report many false warnings alongside the genuine ones.
We propose a Machine Learning (ML)-based learning process that uses source code, historic commit data, and classifier ensembles to prioritize the true warnings.
- Score: 4.05739885420409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Static Analysis (SA) tools are used to identify potential weaknesses in code
and fix them in advance, while the code is being developed. In legacy codebases
with high complexity, these rule-based static analysis tools generally report
many false warnings alongside the genuine ones. Although SA tools uncover many
hidden bugs, those bugs are easily lost in the volume of spurious warnings
reported. Developers spend considerable time and effort identifying the true
warnings. Beyond hurting developer productivity, this challenge also causes
true bugs to be missed. To address this problem, we propose a Machine
Learning (ML)-based learning process that uses source code, historic commit
data, and classifier ensembles to prioritize the true warnings in a given
list of warnings. The tool is integrated into the development workflow to
filter out false warnings and prioritize actual bugs. We evaluated our
approach on networking C code, drawing from a large pool of static analysis
warnings reported by the tools. Over time, developers have addressed these
warnings, labelling them as authentic bugs or false alerts. The ML model
is trained with full supervision over the code features. Our results confirm
that applying deep learning over traditional static analysis reports is a
promising approach for drastically reducing false positive rates.
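To make the classifier-ensemble idea concrete, here is a minimal sketch using scikit-learn. It is not the authors' pipeline: the features (stand-ins for code and commit-history signals), the synthetic data, and the model choices are all hypothetical.

```python
# Minimal sketch of warning prioritization with a soft-voting ensemble.
# Features, data, and models are hypothetical stand-ins, not the paper's setup.
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# One row per warning: e.g. file churn, author count, cyclomatic complexity,
# warning-rule frequency (placeholders for real code/commit features).
X = rng.random((500, 4))
y = rng.integers(0, 2, size=500)  # 1 = true warning, 0 = false alarm
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft voting averages the base models' predicted probabilities, so each
# warning gets a single P(true warning) score to rank the triage queue by.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
priority = ensemble.predict_proba(X_te)[:, 1]  # higher = inspect first
```

Ranking warnings by the ensemble's probability, rather than hard-filtering them, keeps borderline warnings visible while still pushing likely false alarms to the bottom of the queue.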
Related papers
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- FineWAVE: Fine-Grained Warning Verification of Bugs for Automated Static Analysis Tools [18.927121513404924]
Automated Static Analysis Tools (ASATs) have evolved over time to assist in detecting bugs.
Previous research efforts have explored learning-based methods to validate the reported warnings.
We propose FineWAVE, a learning-based approach that verifies bug-sensitive warnings at a fine-grained granularity.
arXiv Detail & Related papers (2024-03-24T06:21:35Z)
- Quieting the Static: A Study of Static Analysis Alert Suppressions [7.324969824727792]
We examine 1,425 open-source Java-based projects that use FindBugs or SpotBugs, focusing on their warning-suppressing configurations and source code annotations.
We find that although most warnings are suppressed, only a small portion of them are frequently suppressed.
The findings underscore the need for better communication and education around the use of static analysis tools.
arXiv Detail & Related papers (2023-11-13T17:16:25Z)
- ACWRecommender: A Tool for Validating Actionable Warnings with Weak Supervision [10.040337069728569]
Static analysis tools have gained popularity among developers for finding potential bugs, but their widespread adoption is hindered by the high false alarm rates.
Previous studies proposed the concept of actionable warnings and applied machine-learning methods to distinguish actionable warnings from false alarms.
We propose a two-stage framework called ACWRecommender to automatically identify actionable warnings and recommend those with a high probability of being real bugs.
arXiv Detail & Related papers (2023-09-18T12:35:28Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised contrastive learning strategy that places benign clones closer in the representation space while moving deviants further apart (a minimal loss-function sketch appears after this list).
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Tracking the Evolution of Static Code Warnings: the State-of-the-Art and a Better Approach [18.350023994564904]
Static bug detection tools help developers detect problems in the code, including bad programming practices and potential defects.
Recent efforts to integrate static bug detectors in modern software development, such as in code review and continuous integration, are shown to better motivate developers to fix the reported warnings on the fly.
arXiv Detail & Related papers (2022-10-06T03:02:32Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- Learning to Reduce False Positives in Analytic Bug Detectors [12.733531603080674]
We propose a Transformer-based learning approach to identify false positive bug warnings (a minimal classification sketch appears after this list).
We demonstrate that our models can improve the precision of static analysis by 17.5%.
arXiv Detail & Related papers (2022-03-08T04:26:26Z)
- Sample-Efficient Safety Assurances using Conformal Prediction [57.92013073974406]
Early warning systems can provide alerts when an unsafe situation is imminent.
To reliably improve safety, these warning systems should have a provable false negative rate.
We present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics (a minimal threshold-calibration sketch appears after this list).
arXiv Detail & Related papers (2021-09-28T23:00:30Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures through parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program (a minimal message-passing sketch appears after this list).
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential-analysis-based approach to label issues reported by static analysis tools (a minimal labeling sketch appears after this list).
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
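For CONCORD's clone-aware objective above, here is a minimal sketch of an InfoNCE-style contrastive loss in PyTorch. It illustrates the general technique under assumed inputs (precomputed code embeddings); it is not the paper's implementation.

```python
# Minimal sketch of a clone-aware contrastive loss: pull an anchor's benign
# clone closer, push a buggy "deviant" away. Inputs are assumed to be
# precomputed code embeddings; this is not CONCORD's actual implementation.
import torch
import torch.nn.functional as F

def clone_contrastive_loss(anchor, clone, deviant, temperature=0.07):
    anchor, clone, deviant = (F.normalize(t, dim=-1)
                              for t in (anchor, clone, deviant))
    pos = (anchor * clone).sum(-1) / temperature    # similarity to clone
    neg = (anchor * deviant).sum(-1) / temperature  # similarity to deviant
    logits = torch.stack([pos, neg], dim=-1)
    # Index 0 (the clone) is the correct match for every anchor.
    targets = torch.zeros(anchor.size(0), dtype=torch.long)
    return F.cross_entropy(logits, targets)

# Toy usage: a batch of 8 anchors with 128-d embeddings.
a, c, d = (torch.randn(8, 128) for _ in range(3))
loss = clone_contrastive_loss(a, c, d)  # small when clones score higher
```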
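For the Transformer-based false positive identification above, a minimal sketch of the general recipe follows. The checkpoint (microsoft/codebert-base), the warning/context formatting, and the example inputs are assumptions, not the paper's exact setup.

```python
# Minimal sketch: score a warning plus its code context with a pretrained
# Transformer. Checkpoint and inputs are illustrative assumptions; the
# classification head below is untrained and would need fine-tuning on
# labeled warnings before its scores mean anything.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)  # 0 = false alarm, 1 = real bug

warning = "NULL_DEREFERENCE: `buf` may be null when passed to memcpy"
context = "if (buf) len = strlen(buf);\nmemcpy(dst, buf, len);"
inputs = tok(warning, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    p_real_bug = model(**inputs).logits.softmax(-1)[0, 1].item()
```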
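For the conformal prediction entry above, here is a minimal split-conformal sketch of how an alert threshold can be calibrated so the false negative rate is bounded. The danger scores are synthetic, and this shows the generic technique rather than the paper's robot-dynamics framework.

```python
# Minimal split-conformal sketch: choose an alert threshold tau from danger
# scores of held-out truly-unsafe states, so that a new unsafe state falls
# below tau (a missed alert) with probability at most alpha, assuming
# exchangeability. Scores here are synthetic placeholders.
import numpy as np

def calibrate_threshold(unsafe_scores, alpha=0.05):
    n = len(unsafe_scores)
    k = int(np.floor(alpha * (n + 1)))  # rank such that P(miss) <= k/(n+1)
    if k == 0:
        return -np.inf  # too few calibration points: always alert
    return np.sort(unsafe_scores)[k - 1]  # k-th smallest calibration score

rng = np.random.default_rng(0)
cal_scores = rng.normal(2.0, 1.0, size=1000)  # hypothetical danger scores
tau = calibrate_threshold(cal_scores, alpha=0.05)
alert = lambda score: score >= tau  # fire whenever a new state looks unsafe
```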
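For the GNN-over-code-graphs entry above, the following is a minimal single-layer message-passing sketch in plain PyTorch. The graph, dimensions, and sum aggregation are illustrative choices, not the paper's architecture.

```python
# Minimal sketch of one message-passing layer over a code graph whose nodes
# might be AST/CFG elements and whose edges encode code relations. This is a
# generic GNN layer, not the paper's disaggregated-graph architecture.
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, x, edge_index):
        # x: (num_nodes, dim); edge_index: (2, num_edges) src -> dst pairs.
        src, dst = edge_index
        agg = torch.zeros_like(x).index_add_(0, dst, x[src])  # sum neighbors
        return torch.relu(self.lin(torch.cat([x, agg], dim=-1)))

x = torch.randn(5, 16)                              # 5 nodes, 16-d features
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])  # a small chain graph
h = GraphLayer(16)(x, edges)                        # updated node states
# Mean-pooling h (h.mean(0)) would feed a vulnerable/benign classifier head.
```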
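And for D2A's differential labeling above, here is a minimal sketch of the before/after-commit idea. run_analyzer is a hypothetical stand-in for invoking a real tool such as Infer, and the labeling heuristic is a simplification of the paper's pipeline.

```python
# Minimal sketch of differential labeling: warnings that a static analyzer
# reports before a bug-fixing commit but not after it are labeled likely
# real bugs; warnings that survive the fix are labeled likely false alarms.
# run_analyzer is a hypothetical placeholder, not a real API.

def run_analyzer(commit_sha):
    """Placeholder: return a set of (file, rule, fingerprint) warnings."""
    raise NotImplementedError  # e.g. check out commit_sha and run the tool

def label_warnings(before_sha, after_sha):
    before = run_analyzer(before_sha)
    after = run_analyzer(after_sha)
    fixed = before - after       # disappeared with the fix: likely real bugs
    persistent = before & after  # unaffected by the fix: likely false alarms
    return {**{w: 1 for w in fixed}, **{w: 0 for w in persistent}}
```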
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.