Generating High-Level Test Cases from Requirements using LLM: An Industry Study
- URL: http://arxiv.org/abs/2510.03641v1
- Date: Sat, 04 Oct 2025 03:05:45 GMT
- Title: Generating High-Level Test Cases from Requirements using LLM: An Industry Study
- Authors: Satoshi Masuda, Satoshi Kouzawa, Kyousuke Sezai, Hidetoshi Suhara, Yasuaki Hiruta, Kunihiro Kudou
- Abstract summary: Currently, generating high-level test cases described in natural language from requirement documents is performed manually. In some cases, retrieval-augmented generation (RAG) is employed for generating high-level test cases using Large Language Models (LLMs). We propose a method for generating high-level (GHL) test cases from requirement documents using only prompts, without creating RAGs.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generating high-level test cases described in natural language from requirement documents is currently performed manually. In industry, including companies that specialize in software testing, there is significant demand for automating this task with Large Language Models (LLMs), and efforts to apply LLMs to requirement analysis are underway. In some cases, retrieval-augmented generation (RAG) is employed to generate high-level test cases with LLMs. In practical applications, however, a RAG must be tailored to the knowledge system of each specific application, which is labor-intensive. Moreover, when high-level test case generation is driven by a prompt alone, there is no established method for instructing the LLM to generate test cases at a level that transfers to other specifications without RAG. A method for automatically generating high-level test cases that generalizes across a wider range of requirement documents is therefore needed. In this paper, we propose a method for generating high-level (GHL) test cases from requirement documents using only prompts, without creating RAGs. In the proposed method, the requirement document is first input to the LLM to generate test design techniques corresponding to that document; high-level test cases are then generated for each of the generated test design techniques. Furthermore, we verify an evaluation method based on the semantic similarity of the generated high-level test cases. In experiments on datasets from Bluetooth and Mozilla, where both requirement documents and high-level test cases are available, the method achieved macro-recall of 0.81 and 0.37, respectively. We believe the method is feasible for practical application in generating high-level test cases without using RAG.
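Read operationally, the abstract describes a two-stage, prompt-only pipeline: one LLM call derives the applicable test design techniques from a requirement document, and a follow-up call per technique produces the high-level test cases. The sketch below illustrates that flow; the model choice (`gpt-4o`), the prompt wording, and the line-based output parsing are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the two-stage, prompt-only pipeline (no RAG).
# Model name, prompt wording, and output parsing are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> list[str]:
    """One LLM call, returning the non-empty lines of its reply."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp.choices[0].message.content or ""
    return [line.strip() for line in text.splitlines() if line.strip()]

def generate_high_level_test_cases(requirement: str) -> list[str]:
    # Stage 1: derive the test design techniques that fit the requirement.
    techniques = ask(
        "List the test design techniques (one per line, names only) that "
        f"apply to this requirement document:\n\n{requirement}"
    )
    # Stage 2: generate high-level test cases for each technique.
    cases: list[str] = []
    for technique in techniques:
        cases += ask(
            f"Using the '{technique}' test design technique, write "
            "high-level test cases in natural language (one per line) "
            f"for this requirement document:\n\n{requirement}"
        )
    return cases
```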
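For the evaluation step, a plausible reading of "macro-recall based on semantic similarity" is that a reference test case counts as recalled when some generated case is sufficiently close in embedding space, with recall averaged per document. The sketch below follows that reading; the sentence-transformers model and the 0.8 cosine threshold are assumptions, since the paper only states that the evaluation is based on semantic similarity.

```python
# Hedged sketch of a semantic-similarity recall metric; the embedding
# model and the threshold are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model

def recall_at_threshold(generated: list[str], reference: list[str],
                        threshold: float = 0.8) -> float:
    """Fraction of reference cases matched by some generated case."""
    sims = util.cos_sim(embedder.encode(reference),
                        embedder.encode(generated))
    return float((sims.max(dim=1).values >= threshold).float().mean())

def macro_recall(pairs: list[tuple[list[str], list[str]]]) -> float:
    """Average recall over documents; each pair is (generated, reference)."""
    scores = [recall_at_threshold(g, r) for g, r in pairs]
    return sum(scores) / len(scores)
```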
Related papers
- Enhancing LLM-Based Test Generation by Eliminating Covered Code
Large Language Models (LLMs) have shown promise in improving test generation. We propose a scalable LLM-based unit test generation method. Our approach outperforms state-of-the-art LLM-based and search-based methods.
arXiv Detail & Related papers (2026-02-25T15:16:43Z) - LLMCFG-TGen: Using LLM-Generated Control Flow Graphs to Automatically Create Test Cases from Use Cases
Appropriate test case generation is critical in software testing. Use-case descriptions are a popular method for capturing functional behaviors and interaction flows in a structured form. We propose a new approach that automatically generates test cases from NL use-case descriptions.
arXiv Detail & Related papers (2025-12-06T11:19:37Z) - CodeChemist: Functional Knowledge Transfer for Low-Resource Code Generation via Test-Time Scaling
We present CodeChemist, a framework for test-time scaling that enables functional knowledge transfer from high-resource to low-resource programming languages (PLs). Our experiments show that CodeChemist outperforms existing test-time scaling approaches.
arXiv Detail & Related papers (2025-10-01T04:33:53Z) - Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions. Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance. This paper decomposes LLM applications into a three-layer architecture: a **System Shell Layer**, a **Prompt Orchestration Layer**, and an **LLM Inference Core**.
arXiv Detail & Related papers (2025-08-28T13:00:28Z) - Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph
Uncertainty quantification (UQ) is a key element of machine learning applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z) - Automatic benchmarking of large multimodal models via iterative experiment programming
We present APEx, the first framework for automatic benchmarking of large multimodal models (LMMs).
Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand, progressively compiling its findings into a report.
The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions.
arXiv Detail & Related papers (2024-06-18T06:43:46Z) - A Tool for Test Case Scenarios Generation Using Large Language Models
This article centers on generating user requirements as epics and high-level user stories.
It introduces a web-based software tool that employs an LLM-based agent and prompt engineering to automate the generation of test case scenarios.
arXiv Detail & Related papers (2024-06-11T07:26:13Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of the exemplars included in the prompt for in-context learning greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - Automated Control Logic Test Case Generation using Large Language Models
We propose a novel approach for the automatic generation of programmable logic controller (PLC) test cases that queries a Large Language Model (LLM).
Experiments with ten open-source function blocks from the OSCAT automation library showed that the approach is fast, easy to use, and can yield test cases with high statement coverage for low-to-medium complex programs.
arXiv Detail & Related papers (2024-05-03T06:09:21Z) - Automating REST API Postman Test Cases Using LLM
This research paper explores and implements an automated approach to generating test cases using Large Language Models.
The methodology integrates the use of OpenAI to enhance the efficiency and effectiveness of test case generation.
The model developed during the research is trained on manually collected Postman test cases for various REST APIs.
arXiv Detail & Related papers (2024-04-16T15:53:41Z) - Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM
We present SymPrompt, a code-aware prompting strategy for large language models in test generation.
SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2.
Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
arXiv Detail & Related papers (2024-01-31T18:21:49Z) - SeqXGPT: Sentence-Level AI-Generated Text Detection
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs).
We then propose SeqXGPT (Sequence X (Check) GPT), a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AI-generated text (AIGT) detection.
arXiv Detail & Related papers (2023-10-13T07:18:53Z)