Automatic High-Level Test Case Generation using Large Language Models
- URL: http://arxiv.org/abs/2503.17998v1
- Date: Sun, 23 Mar 2025 09:14:41 GMT
- Title: Automatic High-Level Test Case Generation using Large Language Models
- Authors: Navid Bin Hasan, Md. Ashraful Islam, Junaed Younus Khan, Sanjida Senjik, Anindya Iqbal,
- Abstract summary: Primary challenge is not writing test scripts but aligning testing efforts with business requirements.<n>We constructed a use-case dataset to train/fine-tune models for generating high-level test cases.<n>Our proactive approach strengthens requirement-testing alignment and facilitates early test case generation.
- Score: 1.8136446064778242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explored the challenges practitioners face in software testing and proposed automated solutions to address these obstacles. We began with a survey of local software companies and 26 practitioners, revealing that the primary challenge is not writing test scripts but aligning testing efforts with business requirements. Based on these insights, we constructed a use-case $\rightarrow$ (high-level) test-cases dataset to train/fine-tune models for generating high-level test cases. High-level test cases specify what aspects of the software's functionality need to be tested, along with the expected outcomes. We evaluated large language models, such as GPT-4o, Gemini, LLaMA 3.1 8B, and Mistral 7B, where fine-tuning (the latter two) yields improved performance. A final (human evaluation) survey confirmed the effectiveness of these generated test cases. Our proactive approach strengthens requirement-testing alignment and facilitates early test case generation to streamline development.
Related papers
- Acceptance Test Generation with Large Language Models: An Industrial Case Study [0.7874708385247353]
Large language model (LLM)-powered assistants are increasingly used for generating program code and unit tests.
This paper explores the use of LLMs for generating executable acceptance tests for web applications through a two-step process.
This two-step approach supports acceptance test-driven development, enhances tester control, and improves test quality.
arXiv Detail & Related papers (2025-04-09T19:33:38Z) - System Test Case Design from Requirements Specifications: Insights and Challenges of Using ChatGPT [1.9282110216621835]
This paper explores the effectiveness of using Large Language Models (LLMs) to generate test case designs from Software Requirements Specification (SRS) documents.<n>About 87 percent of the generated test cases were valid, with the remaining 13 percent either not applicable or redundant.
arXiv Detail & Related papers (2024-12-04T20:12:27Z) - Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models [49.06068319380296]
We introduce context-aware testing (CAT) which uses context as an inductive bias to guide the search for meaningful model failures.
We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
arXiv Detail & Related papers (2024-10-31T15:06:16Z) - SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists [59.08999823652293]
We propose SYNTHEVAL to generate a wide range of test types for a comprehensive evaluation of NLP models.
In the last stage, human experts investigate the challenging examples, manually design templates, and identify the types of failures the taskspecific models consistently exhibit.
We apply SYNTHEVAL to two classification tasks, sentiment analysis and toxic language detection, and show that our framework is effective in identifying weaknesses of strong models on these tasks.
arXiv Detail & Related papers (2024-08-30T17:41:30Z) - Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests [4.574205608859157]
We introduce UTGen, which combines search-based software testing and large language models to enhance the understandability of automatically generated test cases.
We observe that participants working on assignments with UTGen test cases fix up to 33% more bugs and use up to 20% less time when compared to baseline test cases.
arXiv Detail & Related papers (2024-08-21T15:35:34Z) - Automated Test Case Repair Using Language Models [0.5708902722746041]
Unrepaired broken test cases can degrade test suite quality and disrupt the software development process.<n>We present TaRGET, a novel approach leveraging pre-trained code language models for automated test case repair.<n>TaRGET treats test repair as a language translation task, employing a two-step process to fine-tune a language model.
arXiv Detail & Related papers (2024-01-12T18:56:57Z) - Automatic Generation of Test Cases based on Bug Reports: a Feasibility
Study with Large Language Models [4.318319522015101]
Existing approaches produce test cases that either can be qualified as simple (e.g. unit tests) or that require precise specifications.
Most testing procedures still rely on test cases written by humans to form test suites.
We investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs.
arXiv Detail & Related papers (2023-10-10T05:30:12Z) - Towards Automatic Generation of Amplified Regression Test Oracles [44.45138073080198]
We propose a test oracle derivation approach to amplify regression test oracles.
The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour.
arXiv Detail & Related papers (2023-07-28T12:38:44Z) - From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z) - Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
arXiv Detail & Related papers (2022-04-06T06:39:40Z) - Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.