SAT-Based Extraction of Behavioural Models for Java Libraries with
Collections
- URL: http://arxiv.org/abs/2205.15270v2
- Date: Wed, 8 Nov 2023 11:58:20 GMT
- Title: SAT-Based Extraction of Behavioural Models for Java Libraries with
Collections
- Authors: Larisa Safina and Simon Bliudze
- Abstract summary: Behavioural models are a valuable tool for software verification, testing, monitoring, publishing etc.
They are rarely provided by the software developers and have to be extracted either from the source or from the compiled code.
Most of these approaches rely on the analysis of the compiled bytecode.
We are looking to extract behavioural models in the form of Finite State Machines (FSMs) from the Java source code to ensure that the obtained FSMs can be easily understood by the software developers.
- Score: 0.087024326813104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Behavioural models are a valuable tool for software verification, testing,
monitoring, publishing etc. However, they are rarely provided by the software
developers and have to be extracted either from the source or from the compiled
code. In the context of Java programs, a number of approaches exist for
building behavioural models. Most of these approaches rely on the analysis of
the compiled bytecode. Instead, we are looking to extract behavioural models in
the form of Finite State Machines (FSMs) from the Java source code to ensure
that the obtained FSMs can be easily understood by the software developers and,
if necessary, updated or integrated into the original source code, e.g. in the
form of annotations. Modern software systems are huge, rely on external
libraries and interact with their environment. Hence, extracting useful
behavioural models requires abstraction. In this paper, we present an initial
approach to this problem by focusing on the extraction of FSMs modelling
library APIs. We focus on the analysis of Java code involving the use of
collections. To this end, we encode the operational semantics of collection
operations using patterns of Boolean predicates. These patterns are
instantiated based on the analysis of the source code of API implementation
methods to form an encoding of the possible FSM transitions. A SAT solver is
then used to determine the enabledness conditions (guards) of these
transitions.
Related papers
- Runtime Verification on Abstract Finite State Models [0.0]
We show how to extract finite state models from a run of a multi-threaded Java program and carry out runtime verification of correctness properties.
The main contribution of this paper is in showing how runtime verification can be made efficient through online property checking on property-preserving abstract models.
arXiv Detail & Related papers (2024-06-18T15:32:31Z) - LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated from Stitch.
We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Guiding Language Models of Code with Global Context using Monitors [17.05416012014561]
Language models of code (LMs) work well when the surrounding code provides sufficient context.
LMs suffer from limited awareness of such global context and end up hallucinating.
We propose monitor-guided decoding (MGD) where a monitor uses static analysis to guide the decoding.
arXiv Detail & Related papers (2023-06-19T08:13:50Z) - CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLM models and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z) - When Language Model Meets Private Library [25.610036042971043]
In practice, it is common for programmers to write code using private libraries.
This is a challenge for language models since they have never seen private APIs during training.
We propose a novel framework with two modules: the APIRetriever finds useful APIs, and then the APICoder generates code using these APIs.
arXiv Detail & Related papers (2022-10-31T11:42:06Z) - DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix which is deterministic and easier for model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z) - AstBERT: Enabling Language Model for Code Understanding with Abstract
Syntax Tree [3.1087379479634927]
We propose the AstBERT model, a pre-trained language model aiming to better understand the programming language (PL) using the abstract syntax tree (AST)
Specifically, we collect a colossal amount of source codes (both java and python) from GitHub, in which information of the source codes can be interpreted and integrated.
Experiment results show that our AstBERT model achieves state-of-the-art performance on both downstream tasks.
arXiv Detail & Related papers (2022-01-20T03:27:26Z) - Retrieve and Refine: Exemplar-based Neural Comment Generation [27.90756259321855]
Comments of similar code snippets are helpful for comment generation.
We design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input.
We evaluate our approach on a large-scale Java corpus, which contains about 2M samples.
arXiv Detail & Related papers (2020-10-09T09:33:10Z) - Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.