Customizing Static Analysis using Codesearch
- URL: http://arxiv.org/abs/2404.12747v1
- Date: Fri, 19 Apr 2024 09:50:02 GMT
- Title: Customizing Static Analysis using Codesearch
- Authors: Avi Hayoun, Veselin Raychev, Jack Hair,
- Abstract summary: A commonly used language to describe a range of static analysis applications is Datalog.
We aim to make building custom static analysis tools much easier for developers, while at the same time providing a familiar framework for application security and static analysis experts.
Our approach introduces a language called StarLang, a variant of Datalog which only includes programs with a fast runtime.
- Score: 1.7205106391379021
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Static analysis is a growing application of software engineering, leading to a range of essential security tools, bug-finding tools, as well as software verification. Recent years have seen an increase in universal static analysis tools that validate a range of properties and allow customizing parts of the scanner to validate additional properties or "static analysis rules". A commonly used language to describe a range of static analysis applications is Datalog. Unfortunately, the language is still non-trivial to use, leading to analyses that are difficult to implement in a precise but performant way. In this work, we aim to make building custom static analysis tools much easier for developers, while at the same time providing a familiar framework for application security and static analysis experts. Our approach introduces a language called StarLang, a variant of Datalog which only includes programs with a fast runtime, by means of a low-complexity decision procedure.
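The abstract describes static analysis rules written in Datalog. StarLang itself is not shown here, so the following is a minimal Python sketch of how a single Datalog-style rule — `tainted(Y) :- tainted(X), flows(X, Y).` — could be evaluated bottom-up to a fixpoint. All relation and node names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not actual StarLang) of a Datalog-style taint rule:
#   tainted(Y) :- tainted(X), flows(X, Y).
# evaluated naively until no new facts are derived.

def taint_fixpoint(sources, flows):
    """Apply the rule repeatedly until the set of tainted facts stabilizes."""
    tainted = set(sources)
    changed = True
    while changed:
        changed = False
        for x, y in flows:
            if x in tainted and y not in tainted:
                tainted.add(y)
                changed = True
    return tainted

# Hypothetical data-flow edges: user_input -> query -> db_exec
flows = {("user_input", "query"), ("query", "db_exec"), ("config", "logger")}
print(sorted(taint_fixpoint({"user_input"}, flows)))
# -> ['db_exec', 'query', 'user_input']
```

Real Datalog engines use semi-naive evaluation and indexing; the point here is only that a rule set like this has a simple bottom-up semantics with a guaranteed fixpoint.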
Related papers
- Scaling Symbolic Execution to Large Software Systems [0.0]
Symbolic execution is a popular static analysis technique used both in program verification and in bug detection software.
We focus on an error finding framework called the Clang Static Analyzer, and the infrastructure built around it named CodeChecker.
arXiv Detail & Related papers (2024-08-04T02:54:58Z)
- Easing Maintenance of Academic Static Analyzers [0.0]
Mopsa is a static analysis platform that aims at being sound.
This article documents the tools and techniques we have come up with to simplify the maintenance of Mopsa since 2017.
arXiv Detail & Related papers (2024-07-17T11:29:21Z)
- Efficacy of static analysis tools for software defect detection on open-source projects [0.0]
The study used popular analysis tools such as SonarQube, PMD, Checkstyle, and FindBugs to perform the comparison.
The study results show that SonarQube performs considerably better than all the other tools in terms of defect detection.
arXiv Detail & Related papers (2024-05-20T19:05:32Z)
- Integrating Static Code Analysis Toolchains [0.8246494848934447]
State of the art toolchains support features for either test execution and build automation or traceability between tests, requirements and design information.
Our approach combines all those features and extends traceability to the source code level, incorporating static code analysis.
arXiv Detail & Related papers (2024-03-09T18:59:50Z)
- E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification [7.745665775992235]
Large Language Models (LLMs) offer new capabilities for software engineering tasks.
LLMs simulate the execution of pseudo-code, effectively conducting static analysis encoded in the pseudo-code with minimal human effort.
E&V includes a verification process for pseudo-code execution without needing an external oracle.
arXiv Detail & Related papers (2023-12-13T19:31:00Z)
- Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM, then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
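As a rough illustration of the idea above — detecting static errors in generated Python via abstract syntax trees — here is a flow-insensitive sketch using Python's `ast` module. The function name and the two error kinds (syntax errors, undefined names) are assumptions for illustration; the paper's actual framework and error taxonomy differ.

```python
import ast
import builtins

def static_errors(code):
    """Report syntax errors and names that are loaded but never bound.

    A rough, flow-insensitive approximation: collect every binding
    (assignments, function args, def/class names, imports), then flag
    any loaded name outside that set and the builtins.
    """
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error at line {exc.lineno}"]
    bound = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
    return [f"undefined name: {node.id}"
            for node in ast.walk(tree)
            if isinstance(node, ast.Name)
            and isinstance(node.ctx, ast.Load)
            and node.id not in bound]

print(static_errors("def f(x):\n    return x + y\n"))
# -> ['undefined name: y']
```

Because no execution is needed, a check like this scales to large batches of model completions, which is the appeal of static evaluation over execution-based benchmarks.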
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- Rissanen Data Analysis: Examining Dataset Characteristics via Description Length [78.42578316883271]
We introduce a method to determine if a certain capability helps to achieve an accurate model of given data.
Since minimum program length is uncomputable, we estimate the labels' minimum description length (MDL) as a proxy.
We call the method Rissanen Data Analysis (RDA) after the father of MDL.
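The MDL proxy described above can be made concrete with a toy calculation: the description length of the labels under a model is the sum of `-log2 p(true label)` over examples, so anything that raises those probabilities shortens the code. The function name and the probability values below are illustrative assumptions, not from the paper.

```python
import math

def description_length_bits(probs):
    """Codelength of the labels under a model: sum of -log2 p(true label)."""
    return sum(-math.log2(p) for p in probs)

# If granting the model some capability raises its probabilities on the
# true labels, the labels compress better -> the capability helps.
without_capability = description_length_bits([0.5, 0.5, 0.5, 0.5])
with_capability = description_length_bits([0.8, 0.9, 0.8, 0.9])
print(without_capability, round(with_capability, 2))  # prints 4.0 0.95
```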
arXiv Detail & Related papers (2021-03-05T18:58:32Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
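The core of differential labeling can be sketched in a few lines: run the analyzer before and after a bug-fixing commit and compare the reports. This is a simplified sketch inspired by the description above, not D2A's actual pipeline; the issue fingerprints and label names are hypothetical.

```python
# Hedged sketch of differential labeling: issues flagged before a
# bug-fixing commit but gone after it were likely real bugs; issues
# that persist across the fix are likely spurious.

def label_issues(before, after):
    """before/after: sets of issue fingerprints (checker name, location)."""
    return {
        # gone after the fix: the fix likely addressed a real bug
        "likely_true_positive": before - after,
        # still reported after the fix: likely a false alarm
        "likely_false_positive": before & after,
    }

before = {("BUFFER_OVERFLOW", "foo.c:42"), ("NULL_DEREF", "bar.c:7")}
after = {("NULL_DEREF", "bar.c:7")}
result = label_issues(before, after)
print(sorted(result["likely_true_positive"]))
```

Applied across many fix commits, set differences like this yield labeled examples at scale without manual triage, which is what makes the resulting dataset usable for training.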
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.