Supporting Error Chains in Static Analysis for Precise Evaluation
Results and Enhanced Usability
- URL: http://arxiv.org/abs/2403.07808v1
- Date: Tue, 12 Mar 2024 16:46:29 GMT
- Title: Supporting Error Chains in Static Analysis for Precise Evaluation
Results and Enhanced Usability
- Authors: Anna-Katharina Wickert and Michael Schlichtig and Marvin Vogel and
Lukas Winter and Mira Mezini and Eric Bodden
- Abstract summary: Static analyses tend to report where a vulnerability manifests rather than the fix location.
This can cause presumed false positives or imprecise results.
We designed an adaptation of an existing static analysis algorithm that can distinguish between a manifestation and a fix location.
- Score: 2.8557828838739527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: Static analyses are well-established to aid in understanding bugs or
vulnerabilities during the development process or in large-scale studies. A low
false-positive rate is essential for adoption in practice and for precise
results of empirical studies. Unfortunately, static analyses tend to report
where a vulnerability manifests rather than the fix location. This can cause
presumed false positives or imprecise results. Method: To address this problem,
we designed an adaptation of an existing static analysis algorithm that can
distinguish between a manifestation and a fix location, and reports error chains.
An error chain represents at least two interconnected errors that occur
successively, thus building the connection between the fix and manifestation
location. We used our tool CogniCryptSUBS for a case study on 471 GitHub
repositories, a performance benchmark to compare different analysis
configurations, and conducted an expert interview. Result: We found that 50 %
of the projects with a report had at least one error chain. Our runtime
benchmark demonstrated that our improvement caused only a minimal runtime
overhead of less than 4 %. The results of our expert interview indicate that
with our adapted version, participants require fewer executions of the analysis.
Conclusion: Our results indicate that error chains occur frequently in
real-world projects, and ignoring them can lead to imprecise evaluation
results. The runtime benchmark indicates that our tool is a feasible and
efficient solution for detecting error chains in real-world projects. Further,
our results hint that the usability of static analyses may benefit from
supporting error chains.
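To make the gap between manifestation and fix location concrete, below is a minimal, hypothetical Java sketch in the spirit of the crypto-API misuses that CogniCryptSUBS targets. The class name, the hard-coded key, and the rule wording in the comments are illustrative assumptions, not code or rules taken from the paper.

    // Hypothetical example (not from the paper): a two-error chain in JCA usage.
    import java.nio.charset.StandardCharsets;
    import javax.crypto.Cipher;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class ErrorChainExample {

        public static byte[] encrypt(byte[] plaintext, byte[] iv) throws Exception {
            // FIX LOCATION (first error in the chain): the key is derived from a
            // hard-coded string, so it never satisfies the "securely generated key"
            // predicate that later API calls expect. Fixing this line resolves the chain.
            byte[] keyBytes = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
            SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

            // MANIFESTATION LOCATION (second, dependent error): an analysis that only
            // reports here flags Cipher.init because the key lacks the required
            // predicate. Viewed in isolation this call looks correct, so the warning
            // can be dismissed as a false positive instead of being fixed above.
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(plaintext);
        }
    }

An error-chain-aware report connects both locations, pointing the developer to the key construction rather than to the Cipher call where the problem surfaces.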
Related papers
- REDO: Execution-Free Runtime Error Detection for COding Agents [3.9903610503301072]
Execution-Free Runtime Error Detection for COding Agents (REDO) is a method for detecting runtime errors that integrates static analysis tools.
We demonstrate that REDO outperforms current state-of-the-art methods by achieving an 11.0% higher accuracy and a 9.1% higher weighted F1 score.
arXiv Detail & Related papers (2024-10-10T18:06:29Z)
Smart Contract Vulnerability Detection based on Static Analysis and Multi-Objective Search [3.297959314391795]
This paper introduces a method for detecting vulnerabilities in smart contracts using static analysis and a multi-objective optimization algorithm.
We focus on four types of vulnerabilities: reentrancy, call stack overflow, integer overflow, and timestamp dependencies.
We validate our approach using an open-source dataset collected from Etherscan, containing 6,693 smart contracts.
arXiv Detail & Related papers (2024-09-30T23:28:17Z)
Understanding and Detecting Annotation-Induced Faults of Static Analyzers [4.824956210843882]
This paper presents the first comprehensive study of annotation-induced faults (AIF).
We analyzed 246 issues in six open-source and popular static analyzers (i.e., PMD, SpotBugs, CheckStyle, Infer, SonarQube, and Soot).
arXiv Detail & Related papers (2024-02-22T08:09:01Z)
E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification [7.745665775992235]
Large Language Models (LLMs) offer new capabilities for software engineering tasks.
LLMs simulate the execution of pseudo-code, effectively conducting static analysis encoded in the pseudo-code with minimal human effort.
E&V includes a verification process for pseudo-code execution without needing an external oracle.
arXiv Detail & Related papers (2023-12-13T19:31:00Z)
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP.
We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
arXiv Detail & Related papers (2023-06-07T17:47:03Z)
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
Learning to Reduce False Positives in Analytic Bug Detectors [12.733531603080674]
We propose a Transformer-based learning approach to identify false positive bug warnings.
We demonstrate that our models can improve the precision of static analysis by 17.5%.
arXiv Detail & Related papers (2022-03-08T04:26:26Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds the threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
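As a rough, self-contained illustration of the thresholding idea summarized above, the following Java sketch makes several assumptions: the names (AtcSketch, learnThreshold, predictAccuracy) are invented for this example, and matching the threshold to the source accuracy is one simple way to instantiate ATC, not necessarily the paper's exact procedure.

    import java.util.Arrays;

    public class AtcSketch {

        // Pick a threshold so that the fraction of held-out source confidences at or
        // above it roughly matches the measured source accuracy (a simple ATC variant).
        static double learnThreshold(double[] sourceConfidence, boolean[] sourceCorrect) {
            double accuracy = 0.0;
            for (boolean correct : sourceCorrect) {
                accuracy += correct ? 1.0 : 0.0;
            }
            accuracy /= sourceCorrect.length;

            double[] sorted = sourceConfidence.clone();
            Arrays.sort(sorted);
            // Index chosen so that roughly `accuracy` of the source examples lie at or above it.
            int index = (int) Math.floor((1.0 - accuracy) * sorted.length);
            index = Math.min(Math.max(index, 0), sorted.length - 1);
            return sorted[index];
        }

        // Predicted target accuracy = fraction of unlabeled target examples whose
        // confidence is at or above the learned threshold.
        static double predictAccuracy(double[] targetConfidence, double threshold) {
            long above = Arrays.stream(targetConfidence).filter(c -> c >= threshold).count();
            return (double) above / targetConfidence.length;
        }

        public static void main(String[] args) {
            double[] sourceConfidence = {0.95, 0.90, 0.80, 0.60, 0.55};
            boolean[] sourceCorrect = {true, true, true, false, false}; // source accuracy = 0.6
            double threshold = learnThreshold(sourceConfidence, sourceCorrect);

            double[] targetConfidence = {0.92, 0.70, 0.65, 0.40};       // unlabeled target data
            System.out.printf("threshold=%.2f, predicted target accuracy=%.2f%n",
                    threshold, predictAccuracy(targetConfidence, threshold));
        }
    }

On the toy numbers in main, this prints a threshold of 0.80 and a predicted target accuracy of 0.25, i.e., the fraction of target confidences at or above the threshold.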
Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
Reducing Confusion in Active Learning for Part-Of-Speech Tagging [100.08742107682264]
Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost.
We study the problem of selecting instances which maximally reduce the confusion between particular pairs of output tags.
Our proposed AL strategy outperforms other AL strategies by a significant margin.
arXiv Detail & Related papers (2020-11-02T06:24:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.