System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT
- URL: http://arxiv.org/abs/2412.03693v1
- Date: Wed, 04 Dec 2024 20:12:27 GMT
- Title: System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT
- Authors: Shreya Bhatia, Tarushi Gandhi, Dhruv Kumar, Pankaj Jalote
- Abstract summary: This paper explores the effectiveness of using Large Language Models (LLMs) to generate test case designs from Software Requirements Specification (SRS) documents.
About 87 percent of the generated test cases were valid, with the remaining 13 percent either not applicable or redundant.
- Score: 1.9282110216621835
- License:
- Abstract: System testing is essential in any software development project to ensure that the final products meet the requirements. Creating comprehensive test cases for system testing from requirements is often challenging and time-consuming. This paper explores the effectiveness of using Large Language Models (LLMs) to generate test case designs from Software Requirements Specification (SRS) documents. In this study, we collected the SRS documents of five software engineering projects containing functional and non-functional requirements, which were implemented, tested, and delivered by their respective developer teams. For generating test case designs, we used the ChatGPT-4o Turbo model. We employed prompt-chaining, starting with an initial context-setting prompt, followed by prompts to generate test cases for each use case. We assessed the quality of the generated test case designs through feedback from the same developer teams mentioned above. Our experiments show that about 87 percent of the generated test cases were valid, with the remaining 13 percent either not applicable or redundant. Notably, 15 percent of the valid test cases had not previously been considered by the developers in their testing. We also tasked ChatGPT with identifying redundant test cases, which were subsequently validated by the respective developers to identify false positives and to uncover any redundant test cases that may have been missed by the developers themselves. This study highlights the potential of leveraging LLMs for test generation from requirements specification documents and for assisting developers in quickly identifying and addressing redundancies, ultimately improving test suite quality and the efficiency of the testing process.
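The prompt-chaining workflow described in the abstract can be pictured as a short script: one context-setting prompt that carries the SRS, one follow-up prompt per use case, and a final prompt asking the model to flag redundant test cases. The sketch below is illustrative only; the client library, model string, prompt wording, and every function and parameter name are assumptions rather than the authors' actual prompts or code.

```python
# Minimal sketch of the prompt-chaining workflow described in the abstract.
# Assumptions (not taken from the paper): the OpenAI Python client, the model
# string, the prompt wording, and all helper/parameter names are placeholders.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4-turbo"  # placeholder; the paper reports using a ChatGPT-4o Turbo model


def ask(history: list[dict]) -> str:
    """Send the accumulated chat history, append the assistant's reply, and return it."""
    response = client.chat.completions.create(model=MODEL, messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply


def design_test_cases(srs_overview: str, use_cases: dict[str, str]) -> dict[str, str]:
    # Step 1: context-setting prompt carrying the SRS overview.
    history = [
        {"role": "system", "content": "You are a software tester designing system test cases."},
        {"role": "user", "content": "Here is the Software Requirements Specification overview:\n"
                                    f"{srs_overview}\nTest cases will be requested one use case at a time."},
    ]
    ask(history)

    # Step 2: one chained prompt per use case, reusing the shared history.
    designs = {}
    for name, text in use_cases.items():
        history.append({"role": "user",
                        "content": f"Design system test cases for use case '{name}':\n{text}"})
        designs[name] = ask(history)

    # Step 3: ask the model to flag redundant test cases across the whole
    # generated suite, mirroring the redundancy check described in the abstract.
    history.append({"role": "user",
                    "content": "Review all the test cases you generated and list any that are redundant."})
    designs["_redundancy_report"] = ask(history)
    return designs
```

In the study itself, both the generated designs and the redundancy report were validated by the developer teams rather than taken at face value.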
Related papers
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
- Historical Test-time Prompt Tuning for Vision Foundation Models [99.96912440427192]
HisTPT is a Historical Test-time Prompt Tuning technique that memorizes useful knowledge from previously seen test samples.
HisTPT achieves superior prompt tuning performance consistently while handling different visual recognition tasks.
arXiv Detail & Related papers (2024-10-27T06:03:15Z)
- Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests [4.574205608859157]
We introduce UTGen, which combines search-based software testing and large language models to enhance the understandability of automatically generated test cases.
We observe that participants working on assignments with UTGen test cases fix up to 33% more bugs and use up to 20% less time when compared to baseline test cases.
arXiv Detail & Related papers (2024-08-21T15:35:34Z)
- An approach for performance requirements verification and test environments generation [1.359087929215203]
We conducted a systematic mapping study on model-based performance testing.
We studied natural language software requirements specifications in order to understand which and how performance requirements are typically specified.
Since none of the identified model-based testing (MBT) techniques supported a major benefit of modeling, we developed the Performance Requirements verificatiOn and Test Environments generaTion approach.
arXiv Detail & Related papers (2024-02-29T19:59:26Z)
- Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects captured during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z)
- Automated Test Case Repair Using Language Models [0.5708902722746041]
Unrepaired broken test cases can degrade test suite quality and disrupt the software development process.
We present TaRGET, a novel approach leveraging pre-trained code language models for automated test case repair.
TaRGET treats test repair as a language translation task, employing a two-step process to fine-tune a language model.
arXiv Detail & Related papers (2024-01-12T18:56:57Z)
- Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency [45.6224547703717]
This study focuses on tests of silent sentence reading efficiency, used to assess students' reading ability over time.
We propose to fine-tune large language models (LLMs) to simulate how previous students would have responded to unseen items.
We show the generated tests closely correspond to the original test's difficulty and reliability based on crowdworker responses.
arXiv Detail & Related papers (2023-10-10T17:59:51Z)
- Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models [4.318319522015101]
Existing approaches produce test cases that are either simple (e.g., unit tests) or require precise specifications.
Most testing procedures still rely on test cases written by humans to form test suites.
We investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs.
arXiv Detail & Related papers (2023-10-10T05:30:12Z)
- A multi-case study of agile requirements engineering and the use of test cases as requirements [5.71126361766062]
Test cases are commonly viewed as requirements and detailed requirements are documented as test cases.
The use of test cases as requirements poses both benefits and challenges when eliciting, validating, verifying, and managing requirements.
The identified variants of the practice of using test cases as requirements can be used to perform in-depth investigations into agile requirements engineering.
arXiv Detail & Related papers (2023-08-22T19:13:45Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions against the generated test cases and then chooses the best solution (a simplified sketch of this selection step appears after this list).
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
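For the CodeT entry above, the selection step can be sketched as follows. This is a deliberately simplified illustration that keeps the candidate passing the most generated tests; the paper's actual ranking is more elaborate, and every name in the snippet is a hypothetical placeholder.

```python
# Simplified sketch of test-based candidate selection, as summarized in the
# CodeT entry above: run each candidate solution against the generated tests
# and keep the one that passes the most. Not CodeT's actual scoring.
from typing import Callable, Sequence


def choose_best_solution(
    candidates: Sequence[Callable],                           # candidate implementations of one function
    generated_tests: Sequence[Callable[[Callable], None]],    # each test calls a candidate and asserts
) -> Callable:
    def passed(candidate: Callable) -> int:
        count = 0
        for test in generated_tests:
            try:
                test(candidate)   # a test signals failure by raising (e.g. AssertionError)
                count += 1
            except Exception:
                pass
        return count

    # Keep the candidate with the highest pass count.
    return max(candidates, key=passed)
```

A usage example would pass several independently generated implementations of the same function together with assertion-style tests produced by the model.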
This list is automatically generated from the titles and abstracts of the papers on this site.