Impact of Large Language Models on Generating Software Specifications
- URL: http://arxiv.org/abs/2306.03324v2
- Date: Mon, 2 Oct 2023 19:34:23 GMT
- Title: Impact of Large Language Models on Generating Software Specifications
- Authors: Danning Xie, Byungwoo Yoo, Nan Jiang, Mijung Kim, Lin Tan, Xiangyu
Zhang, Judy S. Lee
- Abstract summary: Large Language Models (LLMs) have been successfully applied to numerous software engineering tasks.
We evaluate the capabilities of LLMs for generating software specifications from software comments or documentation.
- Score: 14.88090169737112
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Software specifications are essential for ensuring the reliability of
software systems. Existing specification extraction approaches, however, suffer
from limited generalizability and require manual efforts. The recent emergence
of Large Language Models (LLMs), which have been successfully applied to
numerous software engineering tasks, offers a promising avenue for automating
this process. In this paper, we conduct the first empirical study to evaluate
the capabilities of LLMs for generating software specifications from software
comments or documentation. We evaluate LLMs' performance with Few-Shot Learning
(FSL), which enables LLMs to generalize from a small number of examples, as
well as with different prompt construction strategies, and compare the performance of LLMs
with traditional approaches. Additionally, we conduct a comparative diagnosis
of the failure cases from both LLMs and traditional methods, identifying their
unique strengths and weaknesses. Lastly, we conduct extensive experiments on 15
state-of-the-art LLMs, evaluating their performance and cost-effectiveness for
generating software specifications.
Our results show that with FSL, LLMs outperform traditional methods (by
5.6%), and more sophisticated prompt construction strategies can further
widen this performance gap (by 5.1 to 10.0%). Yet, LLMs suffer from
unique challenges, such as ineffective prompts and a lack of domain
knowledge, which together account for 53 to 60% of LLM-unique failures. The
strong performance of open-source models (e.g., StarCoder) makes closed-source
models (e.g., GPT-3 Davinci) less desirable given their size and cost. Our study
offers valuable insights for future research to improve specification
generation.
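To make the evaluated setup concrete, below is a minimal sketch of few-shot prompt construction for this task: a handful of (comment, specification) pairs are prepended to the target comment before querying an LLM. The example comments, the JML-style specification format, and the `build_fsl_prompt` helper are illustrative assumptions, not the paper's actual prompts or dataset.

```python
# Minimal sketch of few-shot (FSL) prompt construction for comment-to-spec
# generation. The (comment, specification) pairs below are illustrative,
# not taken from the paper's dataset.

FEW_SHOT_EXAMPLES = [
    (
        "Returns the element at the specified position in this list. "
        "Throws IndexOutOfBoundsException if the index is out of range.",
        "requires 0 <= index && index < size(); "
        "ensures \\result == elements[index];",
    ),
    (
        "Appends the specified element to the end of this list.",
        "ensures size() == \\old(size()) + 1 && get(size() - 1) == e;",
    ),
]

def build_fsl_prompt(target_comment: str) -> str:
    """Prepend the few-shot examples, then ask for the target's specification."""
    parts = []
    for comment, spec in FEW_SHOT_EXAMPLES:
        parts.append(f"Comment: {comment}\nSpecification: {spec}\n")
    parts.append(f"Comment: {target_comment}\nSpecification:")
    return "\n".join(parts)

if __name__ == "__main__":
    prompt = build_fsl_prompt(
        "Removes the first occurrence of the specified element from this list, "
        "if it is present."
    )
    print(prompt)  # this prompt would be sent to the LLM under evaluation
```

The "prompt construction strategies" the paper compares would vary, for example, how the examples are selected and ordered; this sketch fixes them for simplicity.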
Related papers
- zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
Large language models (LLMs) are capable of zero-shot learning, which requires no training or fine-tuning.
We propose zsLLMCode, a novel approach that generates functional code embeddings using LLMs.
arXiv Detail & Related papers (2024-09-23T01:03:15Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Evaluating Language Models for Generating and Judging Programming Feedback [4.743413681603463]
Large language models (LLMs) have transformed research and practice in a wide range of domains.
We evaluate the efficiency of open-source LLMs in generating high-quality feedback for programming assignments.
arXiv Detail & Related papers (2024-07-05T21:44:11Z)
- On the Evaluation of Large Language Models in Unit Test Generation [16.447000441006814]
Unit testing is an essential activity in software development for verifying the correctness of software components.
The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation.
arXiv Detail & Related papers (2024-06-26T08:57:03Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the LLM decoding process with deliberative planning (see the sketch after this list).
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently.
Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting.
In this study, we propose an LLM-based Generative IoT (GIoT) system deployed in a local network setting.
arXiv Detail & Related papers (2024-06-14T19:24:00Z)
- Multitask-based Evaluation of Open-Source LLM on Software Vulnerability [2.7692028382314815]
This paper proposes a pipeline for quantitatively evaluating interactive Large Language Models (LLMs) using publicly available datasets.
We carry out an extensive technical evaluation of LLMs using Big-Vul covering four different common software vulnerability tasks.
We find that the existing state-of-the-art approaches and pre-trained Language Models (LMs) are generally superior to LLMs in software vulnerability detection.
arXiv Detail & Related papers (2024-04-02T15:52:05Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs).
As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z)
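As referenced in the Q* entry above, here is a minimal sketch of decoding-time deliberative planning in the spirit of that paper: partial reasoning traces are expanded step by step and ranked by f = g + h, where g aggregates the utility of the steps taken so far and h estimates the best achievable future reward. The `propose_steps`, `step_utility`, and `q_value` functions are illustrative stand-ins for the LLM sampler and the learned Q-value heuristic the paper trains, not the authors' implementation.

```python
import heapq

def propose_steps(trace):
    # Stand-in for sampling candidate next reasoning steps from an LLM.
    return [trace + [f"step{len(trace)}-{i}"] for i in range(2)]

def step_utility(trace):
    # Stand-in for the aggregated utility g of the steps taken so far.
    return -len(trace)  # toy choice: shorter traces score higher

def q_value(trace):
    # Stand-in for the learned heuristic h estimating future reward.
    return 0.0

def is_terminal(trace, max_steps=3):
    return len(trace) >= max_steps

def deliberative_decode(max_expansions=50):
    """Best-first search over partial reasoning traces, ranked by f = g + h."""
    frontier = [(-(step_utility([]) + q_value([])), [])]
    while frontier and max_expansions > 0:
        max_expansions -= 1
        neg_f, trace = heapq.heappop(frontier)  # pop the highest-f trace
        if is_terminal(trace):
            return trace  # best complete reasoning trace found
        for child in propose_steps(trace):
            f = step_utility(child) + q_value(child)
            heapq.heappush(frontier, (-f, child))
    return None

print(deliberative_decode())
```

The paper learns its Q-value heuristic offline; the stub here always returns zero, which reduces the search to utility-greedy expansion in this toy example.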