Assessing test artifact quality -- A tertiary study
- URL: http://arxiv.org/abs/2402.09541v1
- Date: Wed, 14 Feb 2024 19:31:57 GMT
- Title: Assessing test artifact quality -- A tertiary study
- Authors: Huynh Khanh Vi Tran, Michael Unterkalmsteiner, Jürgen Börstler, Nauman bin Ali
- Abstract summary: We have carried out a systematic literature review to identify and analyze existing secondary studies on quality aspects of software testing artifacts.
We present an aggregation of the context dimensions and factors that can be used to characterize the environment in which the test case/suite quality is investigated.
- Score: 1.7827643249624088
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Context: Modern software development increasingly relies on software testing
for an ever more frequent delivery of high quality software. This puts high
demands on the quality of the central artifacts in software testing: test
suites and test cases. Objective: We aim to develop a comprehensive model for
capturing the dimensions of test case/suite quality, which are relevant for a
variety of perspectives. Method: We have carried out a systematic literature
review to identify and analyze existing secondary studies on quality aspects of
software testing artifacts. Results: We identified 49 relevant secondary
studies. Of these 49 studies, less than half did some form of quality appraisal
of the included primary studies and only 3 took into account the quality of the
primary study when synthesizing the results. We present an aggregation of the
context dimensions and factors that can be used to characterize the environment
in which the test case/suite quality is investigated. We also provide a
comprehensive model of test case/suite quality with definitions for the quality
attributes and measurements based on findings in the literature and ISO/IEC
25010:2011. Conclusion: The test artifact quality model presented in the paper
can be used to support test artifact quality assessment and improvement
initiatives in practice. Furthermore, the model can also be used as a framework
for documenting context characteristics to make research results more accessible
for research and practice.
- Journal reference: Information and Software Technology 139 (2021): 106620
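To make the shape of such a quality model concrete, the following minimal Python sketch represents quality attributes (each with a definition and measurements) together with context factors. The specific attribute and context entries are hypothetical placeholders, not taken from the paper's model.

    # A rough shape for a test artifact quality model: attributes with
    # definitions and measurements, plus context factors. All entries below
    # are hypothetical placeholders, not the paper's actual model.
    from dataclasses import dataclass, field

    @dataclass
    class QualityAttribute:
        name: str
        definition: str
        measurements: list[str] = field(default_factory=list)

    @dataclass
    class ContextFactor:
        dimension: str  # e.g. "application domain", "development process"
        value: str

    @dataclass
    class TestArtifactQualityModel:
        attributes: list[QualityAttribute]
        context: list[ContextFactor]

    model = TestArtifactQualityModel(
        attributes=[QualityAttribute(
            name="maintainability",
            definition="how easily a test case can be understood and modified",
            measurements=["lines of code per test", "number of assertions"])],
        context=[ContextFactor(dimension="application domain", value="embedded systems")],
    )
    print(model.attributes[0].measurements)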
Related papers
- Multi-Facet Counterfactual Learning for Content Quality Evaluation [48.73583736357489]
We propose a framework for efficiently constructing evaluators that perceive multiple facets of content quality evaluation.
We leverage a joint training strategy based on contrastive learning and supervised learning to enable the evaluator to distinguish between different quality facets.
arXiv Detail & Related papers (2024-10-10T08:04:10Z)
- Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR [3.0561992956541606]
We first present a comprehensive literature review of previous work, covering key facets of AI software testing processes.
We then introduce a 3D classification model to systematically evaluate the image-based text extraction AI function.
To evaluate the performance of our proposed AI software quality test, we propose four evaluation metrics to cover different aspects.
arXiv Detail & Related papers (2024-09-14T23:33:28Z)
- Mashee at SemEval-2024 Task 8: The Impact of Samples Quality on the Performance of In-Context Learning for Machine Text Classification [0.0]
We employ the chi-square test to identify high-quality samples and compare the results with those obtained using low-quality samples.
Our findings demonstrate that utilizing high-quality samples leads to improved performance with respect to all evaluated metrics.
arXiv Detail & Related papers (2024-05-28T12:47:43Z)
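One plausible realization of the chi-square selection idea above is sketched below in Python; the scoring scheme (aggregating per-term chi-square relevance over each candidate sample) and the toy data are assumptions for illustration, not the paper's exact procedure.

    # Sketch: rank candidate few-shot samples by a chi-square quality score.
    # Summing per-term chi-square scores over each sample's terms is an
    # illustrative assumption, not the paper's exact procedure.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import chi2

    texts = ["great clear answer", "noisy off topic text",
             "concise correct reply", "random filler words"]
    labels = [1, 0, 1, 0]  # toy quality labels

    X = CountVectorizer().fit_transform(texts)
    term_scores, _ = chi2(X, labels)         # chi-square score per term
    sample_scores = X @ term_scores          # aggregate term scores per sample
    ranked = np.argsort(sample_scores)[::-1]
    print([texts[i] for i in ranked[:2]])    # top candidates for in-context examples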
- QuRating: Selecting High-Quality Data for Training Language Models [64.83332850645074]
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B training corpus with quality ratings for each of the four criteria.
arXiv Detail & Related papers (2024-02-15T06:36:07Z)
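Learning scalar ratings from pairwise judgments is commonly framed as a Bradley-Terry-style fit, which the toy Python sketch below illustrates; it is a generic illustration with made-up preference pairs, not QuRating's actual model, data, or training setup.

    # Toy Bradley-Terry-style fit: learn one scalar quality score per document
    # from pairwise "a is better than b" judgments. A generic illustration,
    # not QuRating's actual model or training setup.
    import numpy as np

    n_docs = 5
    pairs = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 4)]  # judged: first beats second

    scores = np.zeros(n_docs)
    lr = 0.1
    for _ in range(500):
        for a, b in pairs:
            p = 1.0 / (1.0 + np.exp(scores[b] - scores[a]))  # P(a preferred over b)
            scores[a] += lr * (1.0 - p)   # gradient ascent on the log-likelihood
            scores[b] -= lr * (1.0 - p)
    print(np.round(scores, 2))  # higher score = higher inferred quality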
- A manual categorization of new quality issues on automatically-generated tests [0.8225289576465757]
We report on a manual analysis of an external dataset consisting of 2,340 automatically generated tests.
We propose a taxonomy of 13 new quality issues grouped into four categories.
We present eight recommendations that test generators may consider to improve the quality and usefulness of the automatically generated tests.
arXiv Detail & Related papers (2023-12-14T11:19:14Z)
- A Novel Metric for Measuring Data Quality in Classification Applications (extended version) [0.0]
We introduce and explain a novel metric to measure data quality.
This metric is based on the correlated evolution between the classification performance and the deterioration of data.
We provide an interpretation of each criterion and examples of assessment levels.
arXiv Detail & Related papers (2023-12-13T11:20:09Z)
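The core idea of relating classification performance to data deterioration can be sketched as follows; the label-flipping corruption, the classifier, and the correlation summary are illustrative assumptions rather than the paper's exact metric.

    # Minimal sketch: relate classification accuracy to injected data
    # deterioration. The label-flipping corruption, classifier, and
    # correlation summary are illustrative assumptions, not the paper's metric.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=600, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(0)
    noise_levels = np.linspace(0.0, 0.4, 5)
    accs = []
    for p in noise_levels:
        y_noisy = y_tr.copy()
        flip = rng.random(len(y_noisy)) < p      # corrupt a fraction of labels
        y_noisy[flip] = 1 - y_noisy[flip]
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy)
        accs.append(clf.score(X_te, y_te))       # performance under deterioration

    # How tightly performance tracks the deterioration level
    print(np.round(accs, 3), round(float(np.corrcoef(noise_levels, accs)[0, 1]), 3))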
- Test-Case Quality -- Understanding Practitioners' Perspectives [1.7827643249624088]
We present a quality model which consists of 11 test-case quality attributes.
We identify a misalignment in defining test-case quality among practitioners and between academia and industry.
arXiv Detail & Related papers (2023-09-28T19:10:01Z)
- Analyzing Dataset Annotation Quality Management in the Wild [63.07224587146207]
Even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts.
While practices and guidelines regarding dataset creation projects exist, large-scale analysis has yet to be performed on how quality management is conducted.
arXiv Detail & Related papers (2023-07-16T21:22:40Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
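Adaptive testing of this kind is often implemented with item response theory: estimate each item's parameters, then repeatedly administer the item that is most informative at the current ability estimate. The sketch below is a generic two-parameter-logistic (2PL) illustration with invented item parameters, not the paper's specific proposal.

    # Minimal 2PL adaptive-testing sketch: administer the unasked item with
    # the highest Fisher information at the current ability estimate.
    # Item parameters and the response are invented for illustration.
    import numpy as np

    a = np.array([1.2, 0.8, 1.5, 1.0, 2.0])   # assumed discriminations
    b = np.array([-1.0, 0.0, 0.5, 1.0, 1.5])  # assumed difficulties

    theta = 0.0          # current ability estimate
    asked = set()
    for step in range(3):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # P(correct) per item, 2PL
        info = a**2 * p * (1 - p)                   # Fisher information per item
        info[list(asked)] = -np.inf                 # never repeat an item
        item = int(np.argmax(info))
        asked.add(item)
        response = 1     # pretend the examinee answered correctly
        theta += 0.5 * a[item] * (response - p[item])  # one likelihood-gradient step
        print(f"step {step}: item {item}, theta {theta:.2f}")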
- Image Quality Assessment in the Modern Age [53.19271326110551]
This tutorial provides the audience with the basic theories, methodologies, and current progress of image quality assessment (IQA).
We will first revisit several subjective quality assessment methodologies, with emphasis on how to properly select visual stimuli.
Both hand-engineered and (deep) learning-based methods will be covered.
arXiv Detail & Related papers (2021-10-19T02:38:46Z)
- Quality meets Diversity: A Model-Agnostic Framework for Computerized Adaptive Testing [60.38182654847399]
Computerized Adaptive Testing (CAT) is emerging as a promising testing application in many scenarios.
We propose a novel framework, Model-Agnostic Adaptive Testing (MAAT), as a CAT solution.
arXiv Detail & Related papers (2021-01-15T06:48:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.