System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT
- URL: http://arxiv.org/abs/2412.03693v1
- Date: Wed, 04 Dec 2024 20:12:27 GMT
- Title: System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT
- Authors: Shreya Bhatia, Tarushi Gandhi, Dhruv Kumar, Pankaj Jalote
- Abstract summary: This paper explores the effectiveness of using Large Language Models (LLMs) to generate test case designs from Software Requirements Specification (SRS) documents.
About 87 percent of the generated test cases were valid, with the remaining 13 percent either not applicable or redundant.
- Score: 1.9282110216621835
- License:
- Abstract: System testing is essential in any software development project to ensure that the final products meet the requirements. Creating comprehensive test cases for system testing from requirements is often challenging and time-consuming. This paper explores the effectiveness of using Large Language Models (LLMs) to generate test case designs from Software Requirements Specification (SRS) documents. In this study, we collected the SRS documents of five software engineering projects containing functional and non-functional requirements, which were implemented, tested, and delivered by their respective developer teams. For generating test case designs, we used the ChatGPT-4o Turbo model. We employed prompt-chaining, starting with an initial context-setting prompt, followed by prompts to generate test cases for each use case. We assessed the quality of the generated test case designs through feedback from the same developer teams mentioned above. Our experiments show that about 87 percent of the generated test cases were valid, with the remaining 13 percent either not applicable or redundant. Notably, 15 percent of the valid test cases had not previously been considered by the developers in their testing. We also tasked ChatGPT with identifying redundant test cases, which were subsequently validated by the respective developers to identify false positives and to uncover any redundant test cases that may have been missed by the developers themselves. This study highlights the potential of leveraging LLMs for test generation from requirements specification documents and for assisting developers in quickly identifying and addressing redundancies, ultimately improving test suite quality and the efficiency of the testing process.
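The prompt-chaining workflow described in the abstract can be pictured as a short script: one context-setting prompt that carries the SRS, one follow-up prompt per use case, and a final prompt asking the model to flag redundant test cases. The sketch below is illustrative only; the client library, model string, prompt wording, and every function and parameter name are assumptions rather than the authors' actual prompts or code.

```python
# Minimal sketch of the prompt-chaining workflow described in the abstract.
# Assumptions (not taken from the paper): the OpenAI Python client, the model
# string, the prompt wording, and all helper/parameter names are placeholders.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4-turbo"  # placeholder; the paper reports using a ChatGPT-4o Turbo model


def ask(history: list[dict]) -> str:
    """Send the accumulated chat history, append the assistant's reply, and return it."""
    response = client.chat.completions.create(model=MODEL, messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply


def design_test_cases(srs_overview: str, use_cases: dict[str, str]) -> dict[str, str]:
    # Step 1: context-setting prompt carrying the SRS overview.
    history = [
        {"role": "system", "content": "You are a software tester designing system test cases."},
        {"role": "user", "content": "Here is the Software Requirements Specification overview:\n"
                                    f"{srs_overview}\nTest cases will be requested one use case at a time."},
    ]
    ask(history)

    # Step 2: one chained prompt per use case, reusing the shared history.
    designs = {}
    for name, text in use_cases.items():
        history.append({"role": "user",
                        "content": f"Design system test cases for use case '{name}':\n{text}"})
        designs[name] = ask(history)

    # Step 3: ask the model to flag redundant test cases across the whole
    # generated suite, mirroring the redundancy check described in the abstract.
    history.append({"role": "user",
                    "content": "Review all the test cases you generated and list any that are redundant."})
    designs["_redundancy_report"] = ask(history)
    return designs
```

In the study itself, both the generated designs and the redundancy report were validated by the developer teams rather than taken at face value.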
Related papers
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
- Historical Test-time Prompt Tuning for Vision Foundation Models [99.96912440427192]
HisTPT is a Historical Test-time Prompt Tuning technique that memorizes useful knowledge from previously seen test samples.
HisTPT achieves superior prompt tuning performance consistently while handling different visual recognition tasks.
arXiv Detail & Related papers (2024-10-27T06:03:15Z)
- Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests [4.574205608859157]
We introduce UTGen, which combines search-based software testing and large language models to enhance the understandability of automatically generated test cases.
We observe that participants working on assignments with UTGen test cases fix up to 33% more bugs and use up to 20% less time when compared to baseline test cases.
arXiv Detail & Related papers (2024-08-21T15:35:34Z)
- An approach for performance requirements verification and test environments generation [1.359087929215203]
We conducted a systematic mapping study on model-based performance testing.
We studied natural language software requirements specifications in order to understand which and how performance requirements are typically specified.
Since none of the identified model-based testing (MBT) techniques supported a major benefit of modeling, we developed the Performance Requirements verificatiOn and Test Environments generaTion approach.
arXiv Detail & Related papers (2024-02-29T19:59:26Z)
- Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects captured during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z)
- Automated Test Case Repair Using Language Models [0.5708902722746041]
Unrepaired broken test cases can degrade test suite quality and disrupt the software development process.
We present TaRGET, a novel approach leveraging pre-trained code language models for automated test case repair.
TaRGET treats test repair as a language translation task, employing a two-step process to fine-tune a language model.
arXiv Detail & Related papers (2024-01-12T18:56:57Z)
- Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency [45.6224547703717]
This study focuses on tests of silent sentence reading efficiency, used to assess students' reading ability over time.
We propose to fine-tune large language models (LLMs) to simulate how previous students would have responded to unseen items.
We show the generated tests closely correspond to the original test's difficulty and reliability based on crowdworker responses.
arXiv Detail & Related papers (2023-10-10T17:59:51Z)
- Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models [4.318319522015101]
Existing approaches produce test cases that are either simple (e.g., unit tests) or require precise specifications.
Most testing procedures still rely on test cases written by humans to form test suites.
We investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs.
arXiv Detail & Related papers (2023-10-10T05:30:12Z)
- A multi-case study of agile requirements engineering and the use of test cases as requirements [5.71126361766062]
Test cases are commonly viewed as requirements and detailed requirements are documented as test cases.
The use of test cases as requirements poses both benefits and challenges when eliciting, validating, verifying, and managing requirements.
The identified variants of the practice of using test cases as requirements can be used to perform in-depth investigations into agile requirements engineering.
arXiv Detail & Related papers (2023-08-22T19:13:45Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions against the generated test cases and then chooses the best solution (a simplified sketch of this selection step appears after this list).
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
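For the CodeT entry above, the selection step can be sketched as follows. This is a deliberately simplified illustration that keeps the candidate passing the most generated tests; the paper's actual ranking is more elaborate, and every name in the snippet is a hypothetical placeholder.

```python
# Simplified sketch of test-based candidate selection, as summarized in the
# CodeT entry above: run each candidate solution against the generated tests
# and keep the one that passes the most. Not CodeT's actual scoring.
from typing import Callable, Sequence


def choose_best_solution(
    candidates: Sequence[Callable],                           # candidate implementations of one function
    generated_tests: Sequence[Callable[[Callable], None]],    # each test calls a candidate and asserts
) -> Callable:
    def passed(candidate: Callable) -> int:
        count = 0
        for test in generated_tests:
            try:
                test(candidate)   # a test signals failure by raising (e.g. AssertionError)
                count += 1
            except Exception:
                pass
        return count

    # Keep the candidate with the highest pass count.
    return max(candidates, key=passed)
```

A usage example would pass several independently generated implementations of the same function together with assertion-style tests produced by the model.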
This list is automatically generated from the titles and abstracts of the papers on this site.