Supporting Error Chains in Static Analysis for Precise Evaluation
Results and Enhanced Usability
- URL: http://arxiv.org/abs/2403.07808v1
- Date: Tue, 12 Mar 2024 16:46:29 GMT
- Title: Supporting Error Chains in Static Analysis for Precise Evaluation
Results and Enhanced Usability
- Authors: Anna-Katharina Wickert and Michael Schlichtig and Marvin Vogel and
Lukas Winter and Mira Mezini and Eric Bodden
- Abstract summary: Static analyses tend to report where a vulnerability manifests rather than the fix location.
This can cause presumed false positives or imprecise results.
We designed an adaptation of an existing static analysis algorithm that can distinguish between a manifestation and a fix location.
- Score: 2.8557828838739527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: Static analyses are well-established to aid in understanding bugs or
vulnerabilities during the development process or in large-scale studies. A low
false-positive rate is essential for adoption in practice and for precise
results of empirical studies. Unfortunately, static analyses tend to report
where a vulnerability manifests rather than the fix location. This can cause
presumed false positives or imprecise results. Method: To address this problem,
we designed an adaptation of an existing static analysis algorithm that can
distinguish between a manifestation and a fix location, and reports error chains.
An error chain represents at least two interconnected errors that occur
successively, thus building the connection between the fix and manifestation
location. We used our tool CogniCryptSUBS for a case study on 471 GitHub
repositories, a performance benchmark to compare different analysis
configurations, and conducted an expert interview. Result: We found that 50 %
of the projects with a report had at least one error chain. Our runtime
benchmark demonstrated that our improvement caused only a minimal runtime
overhead of less than 4 %. The results of our expert interview indicate that
with our adapted version, participants require fewer executions of the analysis.
Conclusion: Our results indicate that error chains occur frequently in
real-world projects, and ignoring them can lead to imprecise evaluation
results. The runtime benchmark indicates that our tool is a feasible and
efficient solution for detecting error chains in real-world projects. Further,
our results hint that the usability of static analyses may benefit from
supporting error chains.
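To make the gap between manifestation and fix location concrete, below is a minimal, hypothetical Java sketch in the spirit of the crypto-API misuses that CogniCryptSUBS targets. The class name, the hard-coded key, and the rule wording in the comments are illustrative assumptions, not code or rules taken from the paper.

    // Hypothetical example (not from the paper): a two-error chain in JCA usage.
    import java.nio.charset.StandardCharsets;
    import javax.crypto.Cipher;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class ErrorChainExample {

        public static byte[] encrypt(byte[] plaintext, byte[] iv) throws Exception {
            // FIX LOCATION (first error in the chain): the key is derived from a
            // hard-coded string, so it never satisfies the "securely generated key"
            // predicate that later API calls expect. Fixing this line resolves the chain.
            byte[] keyBytes = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
            SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

            // MANIFESTATION LOCATION (second, dependent error): an analysis that only
            // reports here flags Cipher.init because the key lacks the required
            // predicate. Viewed in isolation this call looks correct, so the warning
            // can be dismissed as a false positive instead of being fixed above.
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(plaintext);
        }
    }

An error-chain-aware report connects both locations, pointing the developer to the key construction rather than to the Cipher call where the problem surfaces.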
Related papers
- REDO: Execution-Free Runtime Error Detection for COding Agents [3.9903610503301072]
Execution-Free Runtime Error Detection for COding Agents (REDO) is a method for detecting runtime errors that integrates static analysis tools.
We demonstrate that REDO outperforms current state-of-the-art methods by achieving an 11.0% higher accuracy and a 9.1% higher weighted F1 score.
arXiv Detail & Related papers (2024-10-10T18:06:29Z)
Smart Contract Vulnerability Detection based on Static Analysis and Multi-Objective Search [3.297959314391795]
This paper introduces a method for detecting vulnerabilities in smart contracts using static analysis and a multi-objective optimization algorithm.
We focus on four types of vulnerabilities: reentrancy, call stack overflow, integer overflow, and timestamp dependencies.
We validate our approach using an open-source dataset collected from Etherscan, containing 6,693 smart contracts.
arXiv Detail & Related papers (2024-09-30T23:28:17Z)
Understanding and Detecting Annotation-Induced Faults of Static Analyzers [4.824956210843882]
This paper presents the first comprehensive study of annotation-induced faults (AIF).
We analyzed 246 issues in six open-source and popular static analyzers (i.e., PMD, SpotBugs, CheckStyle, Infer, SonarQube, and Soot).
arXiv Detail & Related papers (2024-02-22T08:09:01Z)
E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification [7.745665775992235]
Large Language Models (LLMs) offer new capabilities for software engineering tasks.
LLMs simulate the execution of pseudo-code, effectively conducting static analysis encoded in the pseudo-code with minimal human effort.
E&V includes a verification process for pseudo-code execution without needing an external oracle.
arXiv Detail & Related papers (2023-12-13T19:31:00Z)
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP.
We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
arXiv Detail & Related papers (2023-06-07T17:47:03Z)
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
Learning to Reduce False Positives in Analytic Bug Detectors [12.733531603080674]
We propose a Transformer-based learning approach to identify false positive bug warnings.
We demonstrate that our models can improve the precision of static analysis by 17.5%.
arXiv Detail & Related papers (2022-03-08T04:26:26Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds the threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
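As a rough, self-contained illustration of the thresholding idea summarized above, the following Java sketch makes several assumptions: the names (AtcSketch, learnThreshold, predictAccuracy) are invented for this example, and matching the threshold to the source accuracy is one simple way to instantiate ATC, not necessarily the paper's exact procedure.

    import java.util.Arrays;

    public class AtcSketch {

        // Pick a threshold so that the fraction of held-out source confidences at or
        // above it roughly matches the measured source accuracy (a simple ATC variant).
        static double learnThreshold(double[] sourceConfidence, boolean[] sourceCorrect) {
            double accuracy = 0.0;
            for (boolean correct : sourceCorrect) {
                accuracy += correct ? 1.0 : 0.0;
            }
            accuracy /= sourceCorrect.length;

            double[] sorted = sourceConfidence.clone();
            Arrays.sort(sorted);
            // Index chosen so that roughly `accuracy` of the source examples lie at or above it.
            int index = (int) Math.floor((1.0 - accuracy) * sorted.length);
            index = Math.min(Math.max(index, 0), sorted.length - 1);
            return sorted[index];
        }

        // Predicted target accuracy = fraction of unlabeled target examples whose
        // confidence is at or above the learned threshold.
        static double predictAccuracy(double[] targetConfidence, double threshold) {
            long above = Arrays.stream(targetConfidence).filter(c -> c >= threshold).count();
            return (double) above / targetConfidence.length;
        }

        public static void main(String[] args) {
            double[] sourceConfidence = {0.95, 0.90, 0.80, 0.60, 0.55};
            boolean[] sourceCorrect = {true, true, true, false, false}; // source accuracy = 0.6
            double threshold = learnThreshold(sourceConfidence, sourceCorrect);

            double[] targetConfidence = {0.92, 0.70, 0.65, 0.40};       // unlabeled target data
            System.out.printf("threshold=%.2f, predicted target accuracy=%.2f%n",
                    threshold, predictAccuracy(targetConfidence, threshold));
        }
    }

On the toy numbers in main, this prints a threshold of 0.80 and a predicted target accuracy of 0.25, i.e., the fraction of target confidences at or above the threshold.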
Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
Reducing Confusion in Active Learning for Part-Of-Speech Tagging [100.08742107682264]
Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost.
We study the problem of selecting instances which maximally reduce the confusion between particular pairs of output tags.
Our proposed AL strategy outperforms other AL strategies by a significant margin.
arXiv Detail & Related papers (2020-11-02T06:24:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.