Test-Case Quality -- Understanding Practitioners' Perspectives
- URL: http://arxiv.org/abs/2309.16801v1
- Date: Thu, 28 Sep 2023 19:10:01 GMT
- Title: Test-Case Quality -- Understanding Practitioners' Perspectives
- Authors: Huynh Khanh Vi Tran, Nauman Bin Ali, Jürgen Börstler, Michael Unterkalmsteiner
- Abstract summary: We present a quality model which consists of 11 test-case quality attributes.
We identify a misalignment in defining test-case quality among practitioners and between academia and industry.
- Score: 1.7827643249624088
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Background: Test-case quality has always been one of the major concerns in
software testing. To improve test-case quality, it is important to better
understand how practitioners perceive the quality of test-cases. Objective:
Motivated by that need, we investigated how practitioners define test-case
quality and which aspects of test-cases are important for quality assessment.
Method: We conducted semi-structured interviews with professional developers,
testers and test architects from a multinational software company in Sweden.
Before the interviews, we asked participants for actual test cases (written in
natural language) that they perceived as good, normal, and bad, respectively,
together with rationales for their assessment. We also compared their opinions
on shared test cases and contrasted their views with the relevant literature.
Results: We present a quality model which consists of 11 test-case quality
attributes. We also identify a misalignment in defining test-case quality among
practitioners and between academia and industry, along with suggestions for
improving test-case quality in industry. Conclusion: The results show that
practitioners' backgrounds, including their roles and working experience, are
critical dimensions of how test-case quality is defined and assessed.
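To make the idea of a quality model concrete, the following is a minimal sketch of how such a model could be operationalized as a review checklist. The attribute names and the 1-5 scale are illustrative assumptions; the abstract does not enumerate the paper's 11 attributes.

```python
# A minimal sketch (not from the paper) of operationalizing a test-case quality
# model as a review checklist. Attribute names and the 1-5 scale are
# illustrative placeholders, not the paper's actual list of 11 attributes.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class QualityAttribute:
    name: str
    description: str
    score: Optional[int] = None  # e.g., 1 (poor) to 5 (good), filled in during review

@dataclass
class TestCaseAssessment:
    test_case_id: str
    reviewer_role: str  # developer, tester, test architect, ...
    attributes: list[QualityAttribute] = field(default_factory=list)

    def overall(self) -> float:
        """Average of the attribute scores that have been filled in."""
        scored = [a.score for a in self.attributes if a.score is not None]
        return sum(scored) / len(scored) if scored else 0.0

# Example: a tester reviewing one natural-language test case.
assessment = TestCaseAssessment(
    test_case_id="TC-042",
    reviewer_role="tester",
    attributes=[
        QualityAttribute("understandability", "steps and expected results are clear", 4),
        QualityAttribute("maintainability", "easy to update when requirements change", 3),
    ],
)
print(assessment.overall())  # 3.5
```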
Related papers
- Multi-Facet Counterfactual Learning for Content Quality Evaluation [48.73583736357489]
We propose a framework for efficiently constructing evaluators that perceive multiple facets of content quality evaluation.
We leverage a joint training strategy based on contrastive learning and supervised learning to enable the evaluator to distinguish between different quality facets.
arXiv Detail & Related papers (2024-10-10T08:04:10Z)
- Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models [67.89204055004028]
Large Vision-Language Models (LVLMs) have been plagued by the issue of hallucination.
Previous works have proposed a series of benchmarks featuring different types of tasks and evaluation metrics.
We propose a Hallucination benchmark Quality Measurement framework (HQM) to assess the reliability and validity of existing hallucination benchmarks.
arXiv Detail & Related papers (2024-06-24T20:08:07Z)
- Elevating Software Quality in Agile Environments: The Role of Testing Professionals in Unit Testing [0.0]
Testing is an essential quality activity in the software development process.
This paper explores the participation of test engineers in unit testing within an industrial context.
arXiv Detail & Related papers (2024-03-20T00:41:49Z)
- QuRating: Selecting High-Quality Data for Training Language Models [64.83332850645074]
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B training corpus with quality ratings for each of the four criteria.
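The pairwise-to-scalar idea can be illustrated with a generic Bradley-Terry style objective. The toy linear rater and random features below are assumptions for the sketch, not the QuRater implementation, which fine-tunes a language model.

```python
# A minimal sketch of learning scalar quality ratings from pairwise judgments
# with a Bradley-Terry style loss: P(a beats b) = sigmoid(score_a - score_b).
# The linear rater over precomputed features is a stand-in for the real model.
import torch
import torch.nn as nn

class ToyRater(nn.Module):
    def __init__(self, feature_dim: int):
        super().__init__()
        self.score = nn.Linear(feature_dim, 1)  # maps features to a scalar rating

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def pairwise_loss(score_a: torch.Tensor, score_b: torch.Tensor,
                  a_preferred: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy on the score difference between the two texts."""
    logits = score_a - score_b
    return nn.functional.binary_cross_entropy_with_logits(logits, a_preferred)

# Toy training step on random data (feature_dim=16, batch of 8 judged pairs).
rater = ToyRater(16)
opt = torch.optim.Adam(rater.parameters(), lr=1e-3)
feats_a, feats_b = torch.randn(8, 16), torch.randn(8, 16)
labels = torch.randint(0, 2, (8,)).float()  # 1 if text A was judged higher quality
opt.zero_grad()
loss = pairwise_loss(rater(feats_a), rater(feats_b), labels)
loss.backward()
opt.step()
```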
arXiv Detail & Related papers (2024-02-15T06:36:07Z)
- Assessing test artifact quality -- A tertiary study [1.7827643249624088]
We have carried out a systematic literature review to identify and analyze existing secondary studies on quality aspects of software testing artifacts.
We present an aggregation of the context dimensions and factors that can be used to characterize the environment in which the test case/suite quality is investigated.
arXiv Detail & Related papers (2024-02-14T19:31:57Z)
- Automated Test Case Repair Using Language Models [0.5708902722746041]
Unrepaired broken test cases can degrade test suite quality and disrupt the software development process.
We present TaRGet, a novel approach leveraging pre-trained code language models for automated test case repair.
TaRGet treats test repair as a language translation task, employing a two-step process to fine-tune a language model.
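As a rough illustration of the translation framing (a sketch, not TaRGet itself), one could feed a broken test plus the code change that broke it to a pre-trained code seq2seq model. The model name, marker tokens, and input format below are assumptions, and the fine-tuning step described in the paper is omitted.

```python
# A hedged illustration of "test repair as translation": concatenate the broken
# test with the source-code change and ask a pre-trained code seq2seq model for
# a candidate repair. Without task-specific fine-tuning the output is only a
# rough candidate; the <TEST>/<CHANGE> markers are invented for this sketch.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "Salesforce/codet5-small"  # any seq2seq code model would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

broken_test = "def test_total(): assert cart.total() == 10"
code_change = "- def total(self):\n+ def total(self, tax=0.0):"

# "Source language": the broken test plus the change that broke it.
source = f"<TEST> {broken_test} <CHANGE> {code_change}"
inputs = tokenizer(source, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # candidate repaired test
```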
arXiv Detail & Related papers (2024-01-12T18:56:57Z)
- A Survey on What Developers Think About Testing [13.086283144520513]
We conducted a comprehensive survey with 21 questions aimed at assessing developers' current engagement with testing.
We uncover reasons that positively and negatively impact developers' motivation to test.
One approach emerging from the responses to mitigate these negative factors is by providing better recognition for developers' testing efforts.
arXiv Detail & Related papers (2023-09-03T12:18:41Z)
- Test case quality: an empirical study on belief and evidence [8.475270520855332]
We investigate eight hypotheses regarding what constitutes a good test case.
Despite our best efforts, we were unable to find evidence that supports these beliefs.
arXiv Detail & Related papers (2023-07-12T19:02:48Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
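A small sketch of the adaptive-testing idea, under the assumption of a standard two-parameter item response theory (IRT) model rather than the paper's exact formulation: score each candidate item by its Fisher information at the current ability estimate and administer the most informative one.

```python
# A hedged sketch of adaptive item selection with a 2PL IRT model: pick the
# benchmark item that is most informative at the current ability estimate.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta: float, a: float, b: float) -> float:
    """Information an item contributes about ability theta."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1 - p)

# Item bank as (discrimination, difficulty) pairs; theta is the current estimate.
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7), (1.0, 1.5)]
theta = 0.6
next_item = max(range(len(items)), key=lambda i: fisher_information(theta, *items[i]))
print(next_item)  # index of the most informative item to administer next
```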
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study [63.27346930921658]
ChatGPT is capable of evaluating text quality effectively from various perspectives without reference.
The Explicit Score, which utilizes ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable method among the three exploited approaches.
arXiv Detail & Related papers (2023-04-03T05:29:58Z)
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work estimates confidence intervals (Brown et al., 2001) for translation quality evaluation depending on the sample size of the translated text.
The methodology applied in this work draws on Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
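To make the interval-estimation idea concrete, here is a hedged sketch, not the paper's exact BSDM/MCSA procedure: treat each sampled translation segment as a Bernoulli trial, compute a Wilson score interval (one of the intervals recommended by Brown et al., 2001) for the error rate, and check its coverage with a small Monte Carlo simulation.

```python
# A sketch of confidence-interval estimation for TQE-style sampling: if k of n
# sampled translation segments contain errors, how tight is the estimate of the
# error rate? Illustrative only; not the paper's exact procedure.
import math
import random

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a binomial proportion k/n."""
    p_hat = k / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

def monte_carlo_coverage(p_true: float, n: int, trials: int = 2000) -> float:
    """Fraction of simulated samples whose Wilson interval covers p_true."""
    hits = 0
    for _ in range(trials):
        k = sum(random.random() < p_true for _ in range(n))
        lo, hi = wilson_interval(k, n)
        hits += lo <= p_true <= hi
    return hits / trials

print(wilson_interval(k=12, n=100))              # interval around an observed 12% error rate
print(monte_carlo_coverage(p_true=0.12, n=100))  # empirical coverage, close to 0.95
```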
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.