Manual Tests Do Smell! Cataloging and Identifying Natural Language Test Smells
- URL: http://arxiv.org/abs/2308.01386v1
- Date: Wed, 2 Aug 2023 19:05:36 GMT
- Title: Manual Tests Do Smell! Cataloging and Identifying Natural Language Test Smells
- Authors: Elvys Soares, Manoel Aranda, Naelson Oliveira, Márcio Ribeiro, Rohit
Gheyi, Emerson Souza, Ivan Machado, André Santos, Baldoino Fonseca, Rodrigo
Bonifácio
- Abstract summary: Test smells indicate potential problems in the design and implementation of automated software tests.
This study aims to contribute to a catalog of test smells for manual tests.
- Score: 1.43994708364763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: Test smells indicate potential problems in the design and
implementation of automated software tests that may negatively impact test code
maintainability, coverage, and reliability. When poorly described, manual tests
written in natural language may suffer from related problems, which enables
their analysis from the point of view of test smells. Despite the potential
harm to manually tested software products, little is known about test smells in
manual tests, which leaves many open questions regarding their types,
frequency, and harm to tests written in natural language. Aims: Therefore, this
study aims to contribute a catalog of test smells for manual tests. Method: We
follow a two-fold empirical strategy. First, an exploratory study of manual
tests from three systems: the Ubuntu Operating System, the Brazilian Electronic
Voting Machine, and the User Interface of a large smartphone manufacturer. We
use our findings to propose a catalog of eight test smells and identification
rules based on syntactical and morphological text analysis, validating our
catalog with 24 in-company test engineers. Second, using our proposals, we
create a tool based on Natural Language Processing (NLP) to analyze the subject
systems' tests, validating the results. Results: We observed the occurrence of
eight test smells. A survey of 24 in-company test professionals showed that
80.7% agreed with our catalog definitions and examples. Our NLP-based tool
achieved a precision of 92%, recall of 95%, and f-measure of 93.5%, and its
execution revealed 13,169 occurrences of our cataloged test smells in the
analyzed systems. Conclusion: We contribute a catalog of natural language test
smells and novel detection strategies that better exploit the capabilities of
current NLP mechanisms, with promising results and reduced effort to analyze
tests written in different languages.
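
The abstract does not reproduce the identification rules themselves, so the following is only a minimal, hypothetical sketch of what a rule-based check for one natural-language test smell could look like in Python. The smell name, word list, and example steps are assumptions for illustration, not the authors' syntactical and morphological rules; a short sanity check of the reported precision/recall/f-measure arithmetic is included.

```python
import re

# Illustrative rule for a "vague verification" style smell: verification steps
# worded with imprecise terms leave the expected result open to interpretation.
# The smell name and word list are assumptions, not the paper's rules.
VAGUE_TERMS = re.compile(r"\b(properly|correctly|appropriately|as expected|normally)\b",
                         re.IGNORECASE)

def vague_steps(steps):
    """Return (index, step) pairs whose wording matches a vague term."""
    return [(i, s) for i, s in enumerate(steps, start=1) if VAGUE_TERMS.search(s)]

steps = [
    "Open the network settings panel.",
    "Toggle Wi-Fi off and on.",
    "Check that the connection works properly.",
]
for index, step in vague_steps(steps):
    print(f"step {index}: possible vague-verification smell -> {step}")

# Sanity check of the reported effectiveness numbers: with precision 0.92 and
# recall 0.95, f-measure = 2PR/(P+R) ~= 0.935, matching the abstract's 93.5%.
p, r = 0.92, 0.95
print(f"f-measure = {2 * p * r / (p + r):.3f}")
```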
Related papers
- xNose: A Test Smell Detector for C# [0.0]
Test smells, similar to code smells, can negatively impact both the test code and the production code being tested.
Despite extensive research on test smells in languages like Java, Scala, and Python, automated tools for detecting test smells in C# are lacking.
arXiv Detail & Related papers (2024-05-07T07:10:42Z)
- A Catalog of Transformations to Remove Smells From Natural Language Tests [1.260984934917191]
Test smells can pose difficulties during testing activities, such as poor maintainability, non-deterministic behavior, and incomplete verification.
This paper introduces a catalog of transformations designed to remove seven natural language test smells and a companion tool implemented using Natural Language Processing (NLP) techniques.
arXiv Detail & Related papers (2024-04-25T19:23:24Z)
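
As a companion to detection, the transformation catalog above rewrites smelly steps rather than only flagging them. Below is a minimal sketch of that general idea in Python: splitting a conditional step into an explicit precondition plus an unconditional action. The rule and wording are illustrative assumptions, not transformations taken from that catalog.

```python
import re

# Illustrative rewrite: turn "If <condition>, <action>" into an explicit
# precondition plus an unconditional action step. Assumed rule for
# demonstration only, not one of the paper's cataloged transformations.
CONDITIONAL = re.compile(r"^\s*if\s+(?P<condition>[^,]+),\s*(?P<action>.+)$",
                         re.IGNORECASE)

def split_conditional_step(step):
    match = CONDITIONAL.match(step)
    if not match:
        return [step]
    condition = match.group("condition").strip()
    action = match.group("action").strip().rstrip(".")
    return [f"Precondition: {condition}.", f"{action[0].upper()}{action[1:]}."]

print(split_conditional_step(
    "If the device is offline, enable airplane mode and disable it again."))
# ['Precondition: the device is offline.',
#  'Enable airplane mode and disable it again.']
```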
- GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? [50.53312866647302]
HateCheck is a suite for testing fine-grained model functionalities on synthesized data.
We propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch.
Crowd-sourced annotation demonstrates that the generated test cases are of high quality.
arXiv Detail & Related papers (2024-02-23T10:02:01Z)
- Towards General Error Diagnosis via Behavioral Testing in Machine Translation [48.108393938462974]
This paper proposes a new framework for conducting behavioral testing of machine translation (MT) systems.
The core idea of BTPGBT is to employ a novel bilingual translation pair generation approach.
Experimental results on various MT systems demonstrate that BTPGBT could provide comprehensive and accurate behavioral testing results.
arXiv Detail & Related papers (2023-10-20T09:06:41Z)
- Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency [45.6224547703717]
This study focuses on tests of silent sentence reading efficiency, used to assess students' reading ability over time.
We propose to fine-tune large language models (LLMs) to simulate how previous students would have responded to unseen items.
We show the generated tests closely correspond to the original test's difficulty and reliability based on crowdworker responses.
arXiv Detail & Related papers (2023-10-10T17:59:51Z)
- Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing [13.743062498008555]
We introduce MuTAP for improving the effectiveness of test cases generated by Large Language Models (LLMs) in terms of revealing bugs.
MuTAP is capable of generating effective test cases even in the absence of natural language descriptions of the programs under test (PUTs).
Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets.
arXiv Detail & Related papers (2023-08-31T08:48:31Z)
- Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective [63.92197404447808]
Large language models (LLMs) have shown some human-like cognitive abilities.
We propose an adaptive testing framework for LLM evaluation.
This approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Machine Learning-Based Test Smell Detection [17.957877801382413]
Test smells are symptoms of sub-optimal design choices adopted when developing test cases.
We propose the design and experimentation of a novel test smell detection approach based on machine learning to detect four test smells.
arXiv Detail & Related papers (2022-08-16T07:33:15Z)
- On the use of test smells for prediction of flaky tests [0.0]
Flaky tests hamper the evaluation of test results and can increase costs.
Existing approaches based on the use of the test case vocabulary may be context-sensitive and prone to overfitting.
We investigate the use of test smells as predictors of flaky tests.
arXiv Detail & Related papers (2021-08-26T13:21:55Z)
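
The flaky-test study above treats test smells as predictive features. Below is a minimal sketch of that general idea, assuming per-test smell counts as features and a binary flakiness label, using scikit-learn; the feature columns and toy data are invented for illustration and are not taken from the paper.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Toy feature matrix: one row per test case, one column per smell count
# (columns and values are invented, e.g. assertion roulette, sleepy test,
# resource optimism).
X = [
    [2, 0, 1], [0, 3, 0], [1, 1, 2], [0, 0, 0],
    [3, 2, 1], [0, 1, 0], [2, 2, 2], [1, 0, 0],
]
y = [1, 1, 1, 0, 1, 0, 1, 0]  # 1 = flaky, 0 = stable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```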
- Empowering Language Understanding with Counterfactual Reasoning [141.48592718583245]
We propose a Counterfactual Reasoning Model, which mimics counterfactual thinking by learning from a few counterfactual samples.
In particular, we devise a generation module to generate representative counterfactual samples for each factual sample, and a retrospective module to retrospect the model prediction by comparing the counterfactual and factual samples.
arXiv Detail & Related papers (2021-06-06T06:36:52Z)
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
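
CheckList pairs linguistic capabilities (such as negation) with test types like minimum functionality tests (MFTs). The sketch below illustrates the MFT idea in plain Python, without the checklist library itself; the templates, expected labels, and the dummy predictor are assumptions for illustration.

```python
# CheckList-style minimum functionality test (MFT): templated inputs with
# expected labels, run against a model's predict function.
def make_negation_mft():
    templates = [
        ("I {verb} this movie.", {"love": "positive", "hate": "negative"}),
        ("I do not {verb} this movie.", {"love": "negative", "hate": "positive"}),
    ]
    cases = []
    for template, expectations in templates:
        for verb, expected in expectations.items():
            cases.append((template.format(verb=verb), expected))
    return cases

def dummy_predict(text):
    # Stand-in sentiment model that ignores negation, so the MFT exposes the bug.
    return "negative" if "hate" in text else "positive"

failures = [(text, expected, dummy_predict(text))
            for text, expected in make_negation_mft()
            if dummy_predict(text) != expected]
print(f"failed {len(failures)} of {len(make_negation_mft())} negation MFT cases")
for text, expected, got in failures:
    print(f"  {text!r}: expected {expected}, got {got}")
```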