Checkification: A Practical Approach for Testing Static Analysis Truths
- URL: http://arxiv.org/abs/2501.12093v1
- Date: Tue, 21 Jan 2025 12:38:04 GMT
- Title: Checkification: A Practical Approach for Testing Static Analysis Truths
- Authors: Daniela Ferreiro, Ignacio Casso, Jose F. Morales, Pedro López-García, Manuel V. Hermenegildo
- Abstract summary: We propose a method for testing abstract interpretation-based static analyzers.
The main advantage of our approach lies in its simplicity, which stems directly from framing it within the Ciao assertion-based validation framework.
We have applied our approach to the CiaoPP static analyzer, resulting in the identification of many bugs with reasonable overhead.
- Abstract: Static analysis is an essential component of many modern software development tools. Unfortunately, the ever-increasing complexity of static analyzers makes their coding error-prone. Even analysis tools based on rigorous mathematical techniques, such as abstract interpretation, are not immune to bugs. Ensuring the correctness and reliability of software analyzers is critical if they are to be inserted in production compilers and development environments. While compiler validation has seen notable success, formal validation of static analysis tools remains relatively unexplored. In this paper, we propose a method for testing abstract interpretation-based static analyzers. Broadly, it consists in checking, over a suite of benchmarks, that the properties inferred statically are satisfied dynamically. The main advantage of our approach lies in its simplicity, which stems directly from framing it within the Ciao assertion-based validation framework, and its blended static/dynamic assertion checking approach. We demonstrate that in this setting, the analysis can be tested with little effort by combining the following components already present in the framework: 1) the static analyzer, which outputs its results as the original program source with assertions interspersed; 2) the assertion run-time checking mechanism, which instruments a program to ensure that no assertion is violated at run time; 3) the random test case generator, which generates random test cases satisfying the properties present in assertion preconditions; and 4) the unit-test framework, which executes those test cases. We have applied our approach to the CiaoPP static analyzer, resulting in the identification of many bugs with reasonable overhead. Most of these bugs have been either fixed or confirmed, helping us detect a range of errors not only related to analysis soundness but also within other aspects of the framework.
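To make the four-component pipeline concrete, here is a minimal sketch in Ciao, assuming standard Ciao assertion syntax; the predicate `len/2`, the specific properties, and the manually flipped assertion are illustrative placeholders, not actual CiaoPP output:

```prolog
:- module(len_example, [len/2], [assertions, regtypes, rtchecks]).

% (1) The analyzer outputs its results as 'true' assertions
%     interspersed with the source (illustrative, not verbatim):
:- true pred len(L, N) : list(L) => num(N).

% (2) Checkification flips the status from 'true' to 'check', so the
%     run-time checking mechanism (the rtchecks package) instruments
%     the program and reports any violation during execution:
:- check pred len(L, N) : list(L) => num(N).

len([], 0).
len([_|T], N) :- len(T, M), N is M + 1.

% (3)+(4) The random test case generator produces goals satisfying
% the precondition (here, list(L)), and the unit-test framework runs
% them; a run-time check violation flags a potential analysis bug.
```

In the approach described above, the status flip and the test execution happen automatically over a benchmark suite, so a failing run-time check signals either an unsound analysis result or a bug elsewhere in the framework.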
Related papers
- Examining False Positives under Inference Scaling for Mathematical Reasoning [59.19191774050967]
This paper systematically examines the prevalence of false positive solutions in mathematical problem solving for language models.
We explore how false positives influence the inference time scaling behavior of language models.
arXiv Detail & Related papers (2025-02-10T07:49:35Z)
- StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs [78.84060166851805]
StructTest is a novel benchmark that evaluates large language models on their ability to produce structured outputs.
We demonstrate that StructTest serves as a good proxy for general reasoning abilities.
arXiv Detail & Related papers (2024-12-23T22:08:40Z)
- Scaling Symbolic Execution to Large Software Systems [0.0]
Symbolic execution is a popular static analysis technique used both in program verification and in bug detection software.
We focus on an error finding framework called the Clang Static Analyzer, and the infrastructure built around it named CodeChecker.
arXiv Detail & Related papers (2024-08-04T02:54:58Z)
- SLIFER: Investigating Performance and Robustness of Malware Detection Pipelines [12.940071285118451]
Academic work focuses on combining static and dynamic analysis within a single model or an ensemble of models.
In this paper, we investigate the properties of malware detectors built with multiple and different types of analysis.
To the best of our knowledge, we are the first to investigate the properties of sequential malware detectors, shedding light on their behavior in real production environments.
arXiv Detail & Related papers (2024-05-23T12:06:10Z)
- Customizing Static Analysis using Codesearch [1.7205106391379021]
Datalog is a language commonly used to describe a range of static analysis applications.
We aim to make building custom static analysis tools much easier for developers, while at the same time providing a familiar framework for application security and static analysis experts.
Our approach introduces a language called StarLang, a variant of Datalog which only includes programs with a fast runtime.
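For readers unfamiliar with the Datalog style referenced above, a canonical static-analysis use is transitive data-flow reachability; the following is a generic sketch in plain Datalog/Prolog syntax (StarLang's concrete syntax is not given in the summary, and these facts are illustrative):

```prolog
% Facts extracted from the program under analysis:
% edge(X, Y) means data flows from X to Y in one step.
edge(source, a).
edge(a, b).
edge(b, sink).

% flow(X, Y) holds if data can reach Y from X.
flow(X, Y) :- edge(X, Y).
flow(X, Z) :- edge(X, Y), flow(Y, Z).

% A query such as ?- flow(source, sink). flags a tainted path.
```

Restricting the language so that every program has a fast runtime, as the summary says StarLang does, keeps recursive queries like flow/2 tractable on large codebases.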
arXiv Detail & Related papers (2024-04-19T09:50:02Z)
- Supporting Error Chains in Static Analysis for Precise Evaluation Results and Enhanced Usability [2.8557828838739527]
Static analyses tend to report where a vulnerability manifests rather than the fix location.
This can cause presumed false positives or imprecise results.
We designed an adaptation of an existing static analysis algorithm that can distinguish between a manifestation and a fix location.
arXiv Detail & Related papers (2024-03-12T16:46:29Z)
- Understanding and Detecting Annotation-Induced Faults of Static Analyzers [4.824956210843882]
This paper presents the first comprehensive study of annotation-induced faults (AIF).
We analyzed 246 issues in six popular open-source static analyzers (PMD, SpotBugs, CheckStyle, Infer, SonarQube, and Soot).
arXiv Detail & Related papers (2024-02-22T08:09:01Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
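As one concrete reading of "estimating the characteristics and value of each test item," psychometric adaptive testing is commonly built on item response theory; the two-parameter logistic model below is an assumed example, not a model named in the summary:

```latex
% Probability that a model with ability \theta answers item i
% correctly, where a_i is the item's discrimination and b_i its
% difficulty (both estimated from response data):
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

Items are then selected dynamically to be maximally informative about the current ability estimate, rather than administered as a fixed static benchmark.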
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis-based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
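The differential-labeling idea can be sketched with a pair of Datalog-style rules; the predicate names and facts below are hypothetical illustrations of labeling based on analyzer runs before and after a bug-fixing commit:

```prolog
% reported(Version, Issue): issue emitted by the static analyzer
% on the code before/after a bug-fixing commit.
reported(before, overflow_at_foo).
reported(before, dead_store_at_bar).
reported(after,  dead_store_at_bar).

% Issues that vanish after the fix are labeled likely true positives;
% issues that survive the fix are labeled likely false positives.
likely_true_positive(I)  :- reported(before, I), \+ reported(after, I).
likely_false_positive(I) :- reported(before, I), reported(after, I).
```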
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
- The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions [93.62888099134028]
We find that the performance of state-of-the-art models on Natural Language Inference (NLI) and Reading Comprehension (RC) analysis/stress sets can be highly unstable.
This raises three questions, including: (1) How will the instability affect the reliability of conclusions drawn from these analysis sets?
We give both theoretical explanations and empirical evidence regarding the source of the instability.
arXiv Detail & Related papers (2020-04-28T15:41:12Z)