Using LLMs in Software Requirements Specifications: An Empirical Evaluation
- URL: http://arxiv.org/abs/2404.17842v1
- Date: Sat, 27 Apr 2024 09:37:00 GMT
- Title: Using LLMs in Software Requirements Specifications: An Empirical Evaluation
- Authors: Madhava Krishna, Bhagesh Gaur, Arsh Verma, Pankaj Jalote
- Abstract summary: We assess the performance of GPT-4 and CodeLlama in drafting a Software Requirements Specification (SRS).
Our results suggest that LLMs can match the output quality of an entry-level software engineer in generating an SRS.
We conclude that LLMs can be gainfully used by software engineers to increase productivity.
- Score: 0.2812395851874055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The creation of a Software Requirements Specification (SRS) document is important for any software development project. Given the recent prowess of Large Language Models (LLMs) in answering natural language queries and generating sophisticated textual outputs, our study explores their capability to produce accurate, coherent, and structured drafts of these documents to accelerate the software development lifecycle. We assess the performance of GPT-4 and CodeLlama in drafting an SRS for a university club management system and compare it against human benchmarks using eight distinct criteria. Our results suggest that LLMs can match the output quality of an entry-level software engineer in generating an SRS, delivering complete and consistent drafts. We also evaluate the capabilities of LLMs to identify and rectify problems in a given requirements document. Our experiments indicate that GPT-4 is capable of identifying issues and giving constructive feedback for rectifying them, while CodeLlama's results for validation were not as encouraging. We repeated the generation exercise for four distinct use cases to study the time saved by employing LLMs for SRS generation. The experiment demonstrates that LLMs may facilitate a significant reduction in development time for entry-level software engineers. Hence, we conclude that LLMs can be gainfully used by software engineers to increase productivity by saving time and effort in generating, validating, and rectifying software requirements.
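The generation workflow described in the abstract amounts to prompting a chat model, section by section, with a structured request for an SRS draft. Below is a minimal sketch of how such a draft could be produced, assuming the OpenAI Python client; the model name, system prompt, section list, and project brief are illustrative assumptions, not the paper's exact setup, and serving CodeLlama behind a compatible API is likewise only a possibility noted in a comment.

```python
# Minimal sketch: ask a chat LLM to draft an SRS for a university club
# management system. Model name, prompts, and section list are illustrative
# assumptions, not the paper's exact experimental setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SECTIONS = [
    "Introduction",
    "Overall Description",
    "Functional Requirements",
    "Non-Functional Requirements",
    "Use Cases",
]

project_brief = (
    "A university club management system that lets students create clubs, "
    "manage memberships, schedule events, and track budgets."
)

def draft_srs(brief: str) -> str:
    """Return an SRS draft, generating one section at a time."""
    draft = []
    for section in SECTIONS:
        response = client.chat.completions.create(
            model="gpt-4",  # CodeLlama could be served behind a compatible API instead
            messages=[
                {"role": "system",
                 "content": "You are a requirements engineer writing an IEEE-style SRS."},
                {"role": "user",
                 "content": f"Project: {brief}\n\nWrite the '{section}' section of the SRS."},
            ],
            temperature=0.2,  # keep the draft consistent across sections
        )
        draft.append(f"## {section}\n{response.choices[0].message.content}")
    return "\n\n".join(draft)

if __name__ == "__main__":
    print(draft_srs(project_brief))
```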
Related papers
- Studying and Benchmarking Large Language Models For Log Level Suggestion [49.176736212364496]
Large Language Models (LLMs) have become a focal point of research across various domains.
This paper investigates the impact of characteristics and learning paradigms on the performance of 12 open-source LLMs in log level suggestion.
arXiv Detail & Related papers (2024-10-11T03:52:17Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Requirements are All You Need: From Requirements to Code with LLMs [0.0]
Large language models (LLMs) can be applied to software engineering tasks.
This paper introduces a tailored LLM for automating the generation of code snippets from well-structured requirements documents.
We demonstrate the LLM's proficiency in comprehending intricate user requirements and producing robust design and code solutions.
arXiv Detail & Related papers (2024-06-14T14:57:35Z)
- PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking them at the sentence, semantic, and multi-language levels.
We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings.
We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z)
- Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming [12.355284125578342]
Large Language Models (LLMs) have become a focal point in modern software development.
LLMs offer the potential to significantly augment developer productivity by serving as intelligent, chat-driven programming assistants.
However, each system requires the LLM to be honed to its set of workspaces to ensure the best performance.
arXiv Detail & Related papers (2024-02-22T03:51:34Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Testing LLMs on Code Generation with Varying Levels of Prompt Specificity [0.0]
Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing.
The potential to transform natural language prompts into executable code promises a major shift in software development practices.
arXiv Detail & Related papers (2023-11-10T23:41:41Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- Towards an Understanding of Large Language Models in Software Engineering Tasks [29.30433406449331]
Large Language Models (LLMs) have drawn widespread attention and research due to their astounding performance in text generation and reasoning tasks.
The evaluation and optimization of LLMs in software engineering tasks, such as code generation, have become a research focus.
This paper comprehensively investigates and collates the research and products combining LLMs with software engineering.
arXiv Detail & Related papers (2023-08-22T12:37:29Z)
- Software Testing with Large Language Models: Survey, Landscape, and Vision [32.34617250991638]
Pre-trained large language models (LLMs) have emerged as a breakthrough technology in natural language processing and artificial intelligence.
This paper provides a comprehensive review of the utilization of LLMs in software testing.
arXiv Detail & Related papers (2023-07-14T08:26:12Z)