Large Language Models Based Automatic Synthesis of Software
Specifications
- URL: http://arxiv.org/abs/2304.09181v1
- Date: Tue, 18 Apr 2023 01:22:44 GMT
- Title: Large Language Models Based Automatic Synthesis of Software
Specifications
- Authors: Shantanu Mandal, Adhrik Chethan, Vahid Janfaza, S M Farabi Mahmud,
Todd A Anderson, Javier Turek, Jesmin Jahan Tithi, Abdullah Muzahid
- Abstract summary: SpecSyn is a framework that automatically synthesizes software specifications from natural language sources.
Our approach formulates software specification synthesis as a sequence-to-sequence learning problem.
- Score: 3.081650600579376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software configurations play a crucial role in determining the behavior of
software systems. In order to ensure safe and error-free operation, it is
necessary to identify the correct configurations, along with their valid bounds
and rules, which are commonly referred to as software specifications. As
software systems grow in complexity and scale, the number of configurations and
associated specifications required to ensure the correct operation can become
large and prohibitively difficult to manipulate manually. Due to the fast pace
of software development, it is often the case that correct software
specifications are not thoroughly checked or validated within the software
itself. Rather, they are frequently discussed and documented in a variety of
external sources, including software manuals, code comments, and online
discussion forums. Therefore, it is hard for the system administrator to know
the correct specifications of configurations due to the lack of clarity,
organization, and a centralized unified source to look at. To address this
challenge, we propose SpecSyn, a framework that leverages a state-of-the-art
large language model to automatically synthesize software specifications from
natural language sources. Our approach formulates software specification
synthesis as a sequence-to-sequence learning problem and investigates the
extraction of specifications from large contextual texts. This is the first
work that uses a large language model for end-to-end specification synthesis
from natural language texts. Empirical results demonstrate that our system
outperforms the prior state-of-the-art specification synthesis tool by 21% in
terms of F1 score and can find specifications from single as well as multiple
sentences.
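The abstract frames specification synthesis as a sequence-to-sequence learning problem: a natural-language sentence is the source sequence, and a serialized specification is the target sequence the model learns to emit. The sketch below illustrates that framing only; the field names, the tag-based serialization format, and the example sentence are illustrative assumptions, not the actual SpecSyn representation.

```python
# Hypothetical sketch of the sequence-to-sequence framing: a structured
# specification is flattened into a target token sequence that a seq2seq
# model could learn to produce from the source sentence. The tags and
# fields below are assumptions for illustration, not SpecSyn's format.

from dataclasses import dataclass


@dataclass
class Spec:
    parameter: str  # configuration parameter name
    lower: str      # valid lower bound
    upper: str      # valid upper bound


def serialize(spec: Spec) -> str:
    """Render a specification as a flat target token sequence."""
    return f"<param> {spec.parameter} <min> {spec.lower} <max> {spec.upper}"


def deserialize(seq: str) -> Spec:
    """Recover the structured specification from a decoded sequence."""
    fields, key = {}, None
    for tok in seq.split():
        if tok in ("<param>", "<min>", "<max>"):
            key = tok
            fields[key] = []
        else:
            fields[key].append(tok)
    join = lambda k: " ".join(fields.get(k, []))
    return Spec(join("<param>"), join("<min>"), join("<max>"))


# One training pair: a natural-language source sentence and its target
# sequence. A seq2seq model would be trained on many such pairs.
source = "The heap_size option must be between 64 and 4096 megabytes."
target = serialize(Spec("heap_size", "64", "4096"))
```

Serializing the structured output this way lets an off-the-shelf encoder-decoder model treat extraction as plain text generation, while `deserialize` maps decoded sequences back into checkable specifications.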
Related papers
- NoviCode: Generating Programs from Natural Language Utterances by Novices [59.71218039095155]
We present NoviCode, a novel NL Programming task which takes as input an API and a natural language description by a novice non-programmer.
We show that NoviCode is indeed a challenging task in the code synthesis domain, and that generating complex code from non-technical instructions goes beyond the current Text-to-Code paradigm.
arXiv Detail & Related papers (2024-07-15T11:26:03Z) - Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification [15.686651364655958]
AutoSpec is an automated approach to synthesize specifications for automated program verification.
It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof.
It can be successfully applied to verify the programs in a real-world X509-parser project.
arXiv Detail & Related papers (2024-03-31T18:15:49Z) - SpecGen: Automated Generation of Formal Program Specifications via Large Language Models [20.36964281778921]
SpecGen is a novel technique for formal program specification generation based on Large Language Models.
We evaluate SpecGen on two datasets, including the SV-COMP 279 benchmark and a manually constructed dataset.
arXiv Detail & Related papers (2024-01-16T20:13:50Z) - Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM, then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z) - A General Framework for Verification and Control of Dynamical Models via Certificate Synthesis [54.959571890098786]
We provide a framework to encode system specifications and define corresponding certificates.
We present an automated approach to formally synthesise controllers and certificates.
Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks.
arXiv Detail & Related papers (2023-09-12T09:37:26Z) - An xAI Approach for Data-to-Text Processing with ASP [39.58317527488534]
This paper presents a framework that is compliant with xAI requirements.
The text description is hierarchically organized, in a top-down structure where text is enriched with further details.
The generation of natural language descriptions' structure is also managed by logic rules.
arXiv Detail & Related papers (2023-08-30T09:09:09Z) - Learning-Based Automatic Synthesis of Software Code and Configuration [0.951828574518325]
Large scale automatic software generation and configuration is a very complex and challenging task.
In the first task, we propose to synthesize software automatically from input-output specifications.
For the second task, we propose to synthesize configurations of large scale software from different input files.
arXiv Detail & Related papers (2023-05-25T01:41:30Z) - A Conversational Paradigm for Program Synthesis [110.94409515865867]
We propose a conversational program synthesis approach via large language models.
We train a family of large language models, called CodeGen, on natural language and programming language data.
Our findings show the emergence of conversational capabilities and the effectiveness of the proposed conversational program synthesis paradigm.
arXiv Detail & Related papers (2022-03-25T06:55:15Z) - On the validity of pre-trained transformers for natural language
processing in the software engineering domain [78.32146765053318]
We compare BERT transformer models trained with software engineering data with transformers based on general domain data.
Our results show that for tasks that require understanding of the software engineering context, pre-training with software engineering data is valuable.
arXiv Detail & Related papers (2021-09-10T08:46:31Z) - Multi-modal Program Inference: a Marriage of Pre-trained Language Models
and Component-based Synthesis [15.427687814482724]
Multi-modal program synthesis refers to the task of synthesizing programs (code) from their specification given in different forms.
Examples provide a precise but incomplete specification, and natural language provides an ambiguous but more "complete" task description.
We use our combination approach to instantiate multi-modal synthesis systems for two programming domains.
arXiv Detail & Related papers (2021-09-03T16:12:04Z) - Leveraging Language to Learn Program Abstractions and Search Heuristics [66.28391181268645]
We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis.
When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization.
arXiv Detail & Related papers (2021-06-18T15:08:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.