CiRA: An Open-Source Python Package for Automated Generation of Test
Case Descriptions from Natural Language Requirements
- URL: http://arxiv.org/abs/2310.08234v1
- Date: Thu, 12 Oct 2023 11:30:59 GMT
- Title: CiRA: An Open-Source Python Package for Automated Generation of Test
Case Descriptions from Natural Language Requirements
- Authors: Julian Frattini, Jannik Fischbach, Andreas Bauer
- Abstract summary: This paper presents a tool from the CiRA (Causality In Requirements Artifacts) initiative, which automatically processes conditional natural language requirements.
We evaluate the tool on a publicly available data set of 61 requirements from the requirements specification of the German Corona-Warn-App.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deriving acceptance tests from high-level, natural language requirements that
achieve full coverage is a major manual challenge at the interface between
requirements engineering and testing. Conditional requirements (e.g., "If A or
B then C.") imply causal relationships which - when extracted - allow to
generate these acceptance tests automatically. This paper presents a tool from
the CiRA (Causality In Requirements Artifacts) initiative, which automatically
processes conditional natural language requirements and generates a minimal set
of test case descriptions achieving full coverage. We evaluate the tool on a
publicly available data set of 61 requirements from the requirements
specification of the German Corona-Warn-App. The tool infers the correct test
variables in 84.5% and correct variable configurations in 92.3% of all cases,
which corroborates the feasibility of our approach.
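To make the idea concrete, the following is a minimal, self-contained sketch of the core step the abstract describes: enumerating the configurations of a conditional requirement such as "If A or B then C." and selecting a small set in which each cause independently flips the effect. It illustrates the underlying coverage idea only and is not the CiRA package's actual API; the function name, variable names, and the MC/DC-style selection criterion are assumptions made for this example.
```python
from itertools import product

def minimal_test_configurations(variables, effect):
    """Pick a small set of variable configurations in which each cause
    independently determines the effect (an MC/DC-style criterion).

    `variables` is a list of cause names; `effect` maps an assignment
    (dict of name -> bool) to the expected outcome. Simplified sketch,
    not the CiRA implementation.
    """
    # Full truth table of the causal condition.
    rows = [dict(zip(variables, values))
            for values in product([False, True], repeat=len(variables))]

    selected = []
    for var in variables:
        # Find an assignment where flipping only `var` flips the outcome,
        # i.e. `var` independently affects the effect.
        for assignment in rows:
            flipped = {**assignment, var: not assignment[var]}
            if effect(flipped) != effect(assignment):
                for candidate in (assignment, flipped):
                    if candidate not in selected:
                        selected.append(candidate)
                break
    return selected

# "If the user confirms (A) or the timeout expires (B), then the dialog closes (C)."
for config in minimal_test_configurations(["A", "B"], lambda v: v["A"] or v["B"]):
    expected = "C" if (config["A"] or config["B"]) else "not C"
    print(config, "->", expected)
```
For the example requirement, this yields three test case descriptions (A and B false, expect not C; only A true, expect C; only B true, expect C), which is the minimal covering suite for "If A or B then C."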
Related papers
- From Requirements to Test Cases: An NLP-Based Approach for High-Performance ECU Test Case Automation [0.5249805590164901]
This study investigates the use of Natural Language Processing techniques to transform natural language requirements into structured test case specifications.
A dataset of 400 feature element documents was used to evaluate two approaches, a rule-based method and Named Entity Recognition (NER), for extracting key elements such as signal names and values.
The rule-based method outperforms the NER method, achieving 95% accuracy for more straightforward requirements with single signals.
arXiv Detail & Related papers (2025-05-01T14:23:55Z) - LLM-based Unit Test Generation for Dynamically-Typed Programs [16.38145000434927]
TypeTest is a novel framework that enhances type correctness in test generation through a vector-based Retrieval-Augmented Generation system.
In an evaluation on 125 real-world Python modules, TypeTest achieved an average statement coverage of 86.6% and branch coverage of 76.8%, outperforming state-of-the-art tools by 5.4% and 9.3%, respectively.
arXiv Detail & Related papers (2025-03-18T08:07:17Z) - EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking [54.354203142828084]
We present the task of equivalence checking as a new way to evaluate the code reasoning abilities of large language models.
We introduce EquiBench, a dataset of 2400 program pairs spanning four programming languages and six equivalence categories.
Our evaluation of 17 state-of-the-art LLMs shows that OpenAI o3-mini achieves the highest overall accuracy of 78.0%.
arXiv Detail & Related papers (2025-02-18T02:54:25Z) - Generating Test Scenarios from NL Requirements using Retrieval-Augmented LLMs: An Industrial Study [5.179738379203527]
This paper presents an automated approach (RAGTAG) for test scenario generation using Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs).
We evaluate RAGTAG on two industrial projects from Austrian Post with bilingual requirements in German and English.
arXiv Detail & Related papers (2024-04-19T10:27:40Z) - Requirements Satisfiability with In-Context Learning [1.747623282473278]
Language models that can learn a task at inference time, a capability called in-context learning (ICL), show increasing promise in natural language tasks.
In this paper, we apply ICL to a design evaluation of satisfaction arguments, which describe how a requirement is satisfied by a system specification and associated knowledge.
The approach builds on three prompt design patterns: augmented generation, prompt tuning, and chain-of-thought prompting.
arXiv Detail & Related papers (2024-04-19T01:58:24Z) - Natural Language Requirements Testability Measurement Based on Requirement Smells [1.1663475941322277]
Testable requirements help prevent failures, reduce maintenance costs, and make it easier to perform acceptance tests.
No automatic approach has yet been proposed for measuring requirements testability based on requirement smells.
This paper presents a mathematical model to evaluate and rank the testability of natural language requirements based on an extensive set of nine requirement smells.
arXiv Detail & Related papers (2024-03-26T08:19:29Z) - Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
arXiv Detail & Related papers (2023-12-01T19:53:23Z) - Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z) - Fine-Tuning Language Models Using Formal Methods Feedback [53.24085794087253]
We present a fully automated approach to fine-tune pre-trained language models for applications in autonomous systems.
The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions.
The results indicate an improvement in the percentage of specifications satisfied by the controller from 60% to 90%.
arXiv Detail & Related papers (2023-10-27T16:24:24Z) - Eliciting Human Preferences with Language Models [56.68637202313052]
Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts.
We propose to use *LMs themselves* to guide the task specification process.
We study GATE in three domains: email validation, content recommendation, and moral reasoning.
arXiv Detail & Related papers (2023-10-17T21:11:21Z) - Automated Smell Detection and Recommendation in Natural Language
Requirements [8.672583050502496]
Paska is a tool that takes natural language (NL) requirements as input.
It automatically detects quality problems as smells in the requirements, and offers recommendations to improve their quality.
arXiv Detail & Related papers (2023-05-11T19:01:25Z) - Intergenerational Test Generation for Natural Language Processing
Applications [16.63835131985415]
We propose an automated test generation method for detecting erroneous behaviors of various NLP applications.
We implement this method in NLPLego, which is designed to fully exploit the potential of seed sentences.
NLPLego successfully detects 1,732, 5,301, and 261,879 incorrect behaviors with around 95.7% precision across three tasks.
arXiv Detail & Related papers (2023-02-21T07:57:59Z) - Using Sampling to Estimate and Improve Performance of Automated Scoring
Systems with Guarantees [63.62448343531963]
We propose a combination of the existing paradigms: intelligently sampling responses to be scored by humans.
We observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget.
arXiv Detail & Related papers (2021-11-17T05:00:51Z) - PRover: Proof Generation for Interpretable Reasoning over Rules [81.40404921232192]
We propose a transformer-based model that answers binary questions over rule-bases and generates the corresponding proofs.
Our model learns to predict nodes and edges corresponding to proof graphs in an efficient constrained training paradigm.
We conduct experiments on synthetic, hand-authored, and human-paraphrased rule-bases to show promising results for QA and proof generation.
arXiv Detail & Related papers (2020-10-06T15:47:53Z)