Large Language Models Based Automatic Synthesis of Software
Specifications
- URL: http://arxiv.org/abs/2304.09181v1
- Date: Tue, 18 Apr 2023 01:22:44 GMT
- Title: Large Language Models Based Automatic Synthesis of Software
Specifications
- Authors: Shantanu Mandal, Adhrik Chethan, Vahid Janfaza, S M Farabi Mahmud,
Todd A Anderson, Javier Turek, Jesmin Jahan Tithi, Abdullah Muzahid
- Abstract summary: SpecSyn is a framework that automatically synthesizes software specifications from natural language sources.
Our approach formulates software specification synthesis as a sequence-to-sequence learning problem.
- Score: 3.081650600579376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software configurations play a crucial role in determining the behavior of
software systems. In order to ensure safe and error-free operation, it is
necessary to identify the correct configurations, along with their valid bounds
and rules, which are commonly referred to as software specifications. As
software systems grow in complexity and scale, the number of configurations and
associated specifications required to ensure the correct operation can become
large and prohibitively difficult to manipulate manually. Due to the fast pace
of software development, it is often the case that correct software
specifications are not thoroughly checked or validated within the software
itself. Rather, they are frequently discussed and documented in a variety of
external sources, including software manuals, code comments, and online
discussion forums. Therefore, it is hard for the system administrator to know
the correct specifications of configurations due to the lack of clarity,
organization, and a centralized unified source to look at. To address this
challenge, we propose SpecSyn, a framework that leverages a state-of-the-art
large language model to automatically synthesize software specifications from
natural language sources. Our approach formulates software specification
synthesis as a sequence-to-sequence learning problem and investigates the
extraction of specifications from large contextual texts. This is the first
work that uses a large language model for end-to-end specification synthesis
from natural language texts. Empirical results demonstrate that our system
outperforms the prior state-of-the-art specification synthesis tool by 21% in
terms of F1 score and can find specifications from single as well as multiple
sentences.
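The abstract frames specification synthesis as a sequence-to-sequence learning problem: a natural-language sentence is the source sequence, and a serialized specification is the target sequence the model learns to emit. The sketch below illustrates that framing only; the field names, the tag-based serialization format, and the example sentence are illustrative assumptions, not the actual SpecSyn representation.

```python
# Hypothetical sketch of the sequence-to-sequence framing: a structured
# specification is flattened into a target token sequence that a seq2seq
# model could learn to produce from the source sentence. The tags and
# fields below are assumptions for illustration, not SpecSyn's format.

from dataclasses import dataclass


@dataclass
class Spec:
    parameter: str  # configuration parameter name
    lower: str      # valid lower bound
    upper: str      # valid upper bound


def serialize(spec: Spec) -> str:
    """Render a specification as a flat target token sequence."""
    return f"<param> {spec.parameter} <min> {spec.lower} <max> {spec.upper}"


def deserialize(seq: str) -> Spec:
    """Recover the structured specification from a decoded sequence."""
    fields, key = {}, None
    for tok in seq.split():
        if tok in ("<param>", "<min>", "<max>"):
            key = tok
            fields[key] = []
        else:
            fields[key].append(tok)
    join = lambda k: " ".join(fields.get(k, []))
    return Spec(join("<param>"), join("<min>"), join("<max>"))


# One training pair: a natural-language source sentence and its target
# sequence. A seq2seq model would be trained on many such pairs.
source = "The heap_size option must be between 64 and 4096 megabytes."
target = serialize(Spec("heap_size", "64", "4096"))
```

Serializing the structured output this way lets an off-the-shelf encoder-decoder model treat extraction as plain text generation, while `deserialize` maps decoded sequences back into checkable specifications.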
Related papers
- NoviCode: Generating Programs from Natural Language Utterances by Novices [59.71218039095155]
We present NoviCode, a novel NL Programming task which takes as input an API and a natural language description by a novice non-programmer.
We show that NoviCode is indeed a challenging task in the code synthesis domain, and that generating complex code from non-technical instructions goes beyond the current Text-to-Code paradigm.
arXiv Detail & Related papers (2024-07-15T11:26:03Z) - Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification [15.686651364655958]
AutoSpec is an automated approach to synthesize specifications for automated program verification.
It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof.
It can be successfully applied to verify the programs in a real-world X509-parser project.
arXiv Detail & Related papers (2024-03-31T18:15:49Z) - SpecGen: Automated Generation of Formal Program Specifications via Large Language Models [20.36964281778921]
SpecGen is a novel technique for formal program specification generation based on Large Language Models.
We evaluate SpecGen on two datasets, including the SV-COMP 279 benchmark and a manually constructed dataset.
arXiv Detail & Related papers (2024-01-16T20:13:50Z) - Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM, then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z) - A General Framework for Verification and Control of Dynamical Models via Certificate Synthesis [54.959571890098786]
We provide a framework to encode system specifications and define corresponding certificates.
We present an automated approach to formally synthesise controllers and certificates.
Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks.
arXiv Detail & Related papers (2023-09-12T09:37:26Z) - An xAI Approach for Data-to-Text Processing with ASP [39.58317527488534]
This paper presents a framework that is compliant with xAI requirements.
The text description is hierarchically organized, in a top-down structure where text is enriched with further details.
The generation of natural language descriptions' structure is also managed by logic rules.
arXiv Detail & Related papers (2023-08-30T09:09:09Z) - Learning-Based Automatic Synthesis of Software Code and Configuration [0.951828574518325]
Large scale automatic software generation and configuration is a very complex and challenging task.
In the first task, we propose to synthesize software automatically from input-output specifications.
For the second task, we propose to synthesize configurations of large scale software from different input files.
arXiv Detail & Related papers (2023-05-25T01:41:30Z) - A Conversational Paradigm for Program Synthesis [110.94409515865867]
We propose a conversational program synthesis approach via large language models.
We train a family of large language models, called CodeGen, on natural language and programming language data.
Our findings show the emergence of conversational capabilities and the effectiveness of the proposed conversational program synthesis paradigm.
arXiv Detail & Related papers (2022-03-25T06:55:15Z) - On the validity of pre-trained transformers for natural language
processing in the software engineering domain [78.32146765053318]
We compare BERT transformer models trained with software engineering data with transformers based on general domain data.
Our results show that for tasks that require understanding of the software engineering context, pre-training with software engineering data is valuable.
arXiv Detail & Related papers (2021-09-10T08:46:31Z) - Multi-modal Program Inference: a Marriage of Pre-trained Language Models
and Component-based Synthesis [15.427687814482724]
Multi-modal program synthesis refers to the task of synthesizing programs (code) from their specification given in different forms.
Examples provide a precise but incomplete specification, and natural language provides an ambiguous but more "complete" task description.
We use our combination approach to instantiate multi-modal synthesis systems for two programming domains.
arXiv Detail & Related papers (2021-09-03T16:12:04Z) - Leveraging Language to Learn Program Abstractions and Search Heuristics [66.28391181268645]
We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis.
When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization.
arXiv Detail & Related papers (2021-06-18T15:08:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.