A manual categorization of new quality issues on automatically-generated
tests
- URL: http://arxiv.org/abs/2312.08826v1
- Date: Thu, 14 Dec 2023 11:19:14 GMT
- Title: A manual categorization of new quality issues on automatically-generated
tests
- Authors: Geraldine Galindo-Gutierrez, Maximiliano Narea, Alison Fernandez Blanco, Nicolas Anquetil, Juan Pablo Sandoval Alcocer
- Abstract summary: We report on a manual analysis of an external dataset consisting of 2,340 automatically generated tests.
We propose a taxonomy of 13 new quality issues grouped into four categories.
We present eight recommendations that test generators may consider to improve the quality and usefulness of the automatically generated tests.
- Score: 0.8225289576465757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diverse studies have analyzed the quality of automatically generated test
cases by using test smells as the main quality attribute. However, recent work reported that generated tests may suffer from a number of quality issues not necessarily considered in previous studies. Little is known about these issues
and their frequency within generated tests. In this paper, we report on a
manual analysis of an external dataset consisting of 2,340 automatically
generated tests. This analysis aimed at detecting new quality issues not covered by previously recognized test smells. We use thematic analysis to group and categorize the new quality issues found. As a result, we propose a taxonomy of 13 new quality issues grouped into four categories. We also report on the
frequency of these new quality issues within the dataset and present eight
recommendations that test generators may consider to improve the quality and
usefulness of the automatically generated tests.
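For illustration only (not an example taken from the paper's taxonomy or its dataset), a tool-generated unit test can exhibit problems such as uninformative names, magic values, and assertions that cannot fail; the hypothetical sketch below shows what such a test might look like:

```python
# Hypothetical sketch of a low-quality, tool-generated unit test.
# The issues flagged in the comments are generic observations about generated
# tests, not items taken from the paper's 13-issue taxonomy.
import unittest

class Account:
    """Minimal stand-in class so the example runs; not from the paper's dataset."""
    def __init__(self, owner, balance):
        self.owner, self.balance = owner, balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

class GeneratedAccountTest(unittest.TestCase):
    def test_0(self):                      # uninformative, numbered test name
        account = Account("u1", 37)        # magic values with no stated intent
        result = account.deposit(0)        # boundary input chosen blindly
        self.assertEqual(result, result)   # tautological assertion, can never fail
        self.assertIsNotNone(account)      # assertion with no diagnostic value

if __name__ == "__main__":
    unittest.main()
```

Whether any of these problems corresponds to one of the paper's 13 categories would require the taxonomy itself; the snippet only illustrates the kind of issue that goes beyond classic test smells.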
Related papers
- An Automatic Question Usability Evaluation Toolkit [1.2499537119440245]
Evaluating multiple-choice questions (MCQs) involves either labor-intensive human assessments or automated methods that prioritize readability.
We introduce SAQUET, an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive and automated quality evaluation of MCQs.
With an accuracy rate of over 94%, our findings emphasize the limitations of existing evaluation methods and showcase potential in improving the quality of educational assessments.
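The IWF rubric is rule-based, so a small sketch of a few classic item-writing-flaw checks may help; the specific rules and thresholds below are assumptions for illustration and are not SAQUET's implementation:

```python
# Illustrative, rule-based item-writing-flaw checks (not SAQUET's actual rules).
def find_flaws(stem: str, options: list[str]) -> list[str]:
    """Return the names of simple item-writing flaws detected in an MCQ."""
    flaws = []
    # Flaw: "all of the above" style options are widely discouraged.
    if any("all of the above" in option.lower() for option in options):
        flaws.append("all-of-the-above option")
    # Flaw: one option much longer than the others often cues the answer key.
    lengths = sorted(len(option) for option in options)
    if len(lengths) > 1 and lengths[-1] > 2 * sum(lengths[:-1]) / len(lengths[:-1]):
        flaws.append("longest-option cue")
    # Flaw: negatively worded stem without emphasis.
    if " not " in f" {stem.lower()} ":
        flaws.append("unemphasized negative stem")
    return flaws

print(find_flaws("Which is not a sorting algorithm?",
                 ["Quicksort", "Mergesort", "Heapsort",
                  "Dijkstra's algorithm, which finds shortest paths"]))
```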
arXiv Detail & Related papers (2024-05-30T23:04:53Z)
- Assessing test artifact quality -- A tertiary study [1.7827643249624088]
We have carried out a systematic literature review to identify and analyze existing secondary studies on quality aspects of software testing artifacts.
We present an aggregation of the context dimensions and factors that can be used to characterize the environment in which the test case/suite quality is investigated.
arXiv Detail & Related papers (2024-02-14T19:31:57Z)
- Enriching Automatic Test Case Generation by Extracting Relevant Test Inputs from Bug Reports [8.85274953789614]
We present a technique for exploring bug reports to identify input values that can be fed to automatic test generation tools.
For Defects4J projects, our study has shown that the technique successfully extracted 68.68% of relevant inputs when using regular expressions.
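As a minimal sketch of the general idea, assuming straightforward regex-based harvesting of literal values from bug-report text (the patterns and the example report below are illustrative, not the paper's actual expressions):

```python
# Hedged sketch: harvest candidate test inputs from bug-report text with regexes.
# The pattern set is an assumption for illustration; the technique in the paper
# may use different expressions and additional processing.
import re

PATTERNS = {
    "quoted_string": re.compile(r'"([^"]{1,80})"'),
    "number": re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?![\w.])"),
    "url": re.compile(r"https?://\S+"),
}

def extract_inputs(report: str) -> dict[str, list[str]]:
    """Return candidate literal values found in the report, grouped by pattern."""
    return {name: pattern.findall(report) for name, pattern in PATTERNS.items()}

report = 'Calling parse("2023-13-40") with timeout -1 crashes; see https://issues.example.org/42'
print(extract_inputs(report))
```

Values harvested this way could then be handed to a test generator as seed inputs instead of random literals.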
arXiv Detail & Related papers (2023-12-22T18:19:33Z)
- BAND-2k: Banding Artifact Noticeable Database for Banding Detection and Quality Assessment [52.1640725073183]
Banding, also known as staircase-like contours, frequently occurs in flat areas of images/videos processed by the compression or quantization algorithms.
We build the largest banding IQA database so far, named Banding Artifact Noticeable Database (BAND-2k), which consists of 2,000 banding images.
A dual convolutional neural network is employed to concurrently learn the feature representation from the high-frequency and low-frequency maps.
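One common way to obtain such maps, assumed here only for illustration and not necessarily the paper's pipeline, is to treat a blurred copy of the image as the low-frequency map and the residual as the high-frequency map:

```python
# Hedged sketch: split a grayscale image into low- and high-frequency maps.
# Gaussian blurring as the low-pass step is an assumption; BAND-2k may use a
# different decomposition before feeding the two CNN branches.
import numpy as np
from scipy.ndimage import gaussian_filter

def frequency_maps(image: np.ndarray, sigma: float = 2.0):
    low = gaussian_filter(image.astype(np.float32), sigma=sigma)   # low-frequency map
    high = image.astype(np.float32) - low                          # high-frequency residual
    return low, high

low, high = frequency_maps(np.random.rand(64, 64).astype(np.float32))
print(low.shape, high.shape)
```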
arXiv Detail & Related papers (2023-11-29T15:56:31Z)
- Test-Case Quality -- Understanding Practitioners' Perspectives [1.7827643249624088]
We present a quality model which consists of 11 test-case quality attributes.
We identify a misalignment in defining test-case quality among practitioners and between academia and industry.
arXiv Detail & Related papers (2023-09-28T19:10:01Z)
- Manual Tests Do Smell! Cataloging and Identifying Natural Language Test Smells [1.43994708364763]
Test smells indicate potential problems in the design and implementation of automated software tests.
This study aims to contribute to a catalog of test smells for manual tests.
arXiv Detail & Related papers (2023-08-02T19:05:36Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
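In item response theory, one standard way to adjust items dynamically is to administer the item with maximum Fisher information at the current ability estimate; the 2PL sketch below is a generic illustration of that idea, not necessarily what the surveyed approaches implement:

```python
# Hedged sketch: adaptive item selection under a 2PL IRT model.
# Standard psychometrics used only to illustrate "dynamically adjusting items".
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response for ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta: float, a: float, b: float) -> float:
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 2.0)]   # (discrimination, difficulty)
theta_hat = 0.3                                             # current ability estimate
best = max(range(len(items)), key=lambda i: fisher_information(theta_hat, *items[i]))
print("next item to administer:", best)
```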
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation [56.25869366777579]
In recent years, machine learning models have rapidly become better at generating clinical consultation notes.
We present an extensive human evaluation study where 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors.
We find that a simple, character-based Levenshtein distance metric performs on par if not better than common model-based metrics like BertScore.
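A character-level Levenshtein metric of this kind is standard dynamic programming; the similarity normalization below is an assumption about how a bounded score might be derived and is not necessarily the paper's exact formula:

```python
# Character-level Levenshtein distance plus an assumed [0, 1] similarity score.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

print(similarity("patient reports mild headache",
                 "patient reports a mild headache"))
```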
arXiv Detail & Related papers (2022-04-01T14:04:16Z)
- Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis [65.74825840440504]
We propose Zero Shot Learning for Attributes (ZSLA), which is the first of its kind to the best of our knowledge.
Our proposed method is able to synthesize the detectors of novel attributes in a zero-shot learning manner.
Using only 32 seen attributes on the Caltech-UCSD Birds-200-2011 dataset, our proposed method is able to synthesize the other 207 novel attributes.
arXiv Detail & Related papers (2021-11-28T15:45:54Z)
- A New Score for Adaptive Tests in Bayesian and Credal Networks [64.80185026979883]
A test is adaptive when its sequence and number of questions are dynamically tuned based on the estimated skills of the test taker.
We present an alternative family of scores, based on the mode of the posterior probabilities, and hence easier to explain.
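For a discrete posterior over skill levels, a mode-based score is simply the most probable level, in contrast to an expectation-based score; the sketch below is only a minimal illustration of that distinction, not the score family proposed in the paper:

```python
# Minimal illustration: mode of a discrete posterior over skill levels versus
# its expected value; the numbers are made up.
skill_levels = [0, 1, 2, 3]                     # hypothetical ordinal skill levels
posterior = [0.10, 0.15, 0.50, 0.25]            # P(skill | answers so far)

mode_score = skill_levels[max(range(len(posterior)), key=posterior.__getitem__)]
mean_score = sum(s * p for s, p in zip(skill_levels, posterior))

print(f"mode-based score: {mode_score}, expectation-based score: {mean_score:.2f}")
```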
arXiv Detail & Related papers (2021-05-25T20:35:42Z)
- Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study [86.62171568318716]
Large generative language models such as GPT-2 are well-known for their ability to generate text.
We show that unsupervised predictors of "page quality" emerge, able to detect low quality content without any training.
We conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.
arXiv Detail & Related papers (2020-08-17T07:13:24Z)