Test-Case Quality -- Understanding Practitioners' Perspectives
- URL: http://arxiv.org/abs/2309.16801v1
- Date: Thu, 28 Sep 2023 19:10:01 GMT
- Title: Test-Case Quality -- Understanding Practitioners' Perspectives
- Authors: Huynh Khanh Vi Tran, Nauman Bin Ali, Jürgen Börstler, Michael Unterkalmsteiner
- Abstract summary: We present a quality model which consists of 11 test-case quality attributes.
We identify a misalignment in defining test-case quality among practitioners and between academia and industry.
- Score: 1.7827643249624088
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Background: Test-case quality has always been one of the major concerns in
software testing. To improve test-case quality, it is important to better
understand how practitioners perceive the quality of test-cases. Objective:
Motivated by that need, we investigated how practitioners define test-case
quality and which aspects of test-cases are important for quality assessment.
Method: We conducted semi-structured interviews with professional developers,
testers and test architects from a multinational software company in Sweden.
Before the interviews, we asked participants for actual test cases (written in
natural language) that they perceived as good, normal, and bad, respectively,
together with rationales for their assessment. We also compared their opinions
on shared test cases and contrasted their views with the relevant literature.
Results: We present a quality model which consists of 11 test-case quality
attributes. We also identify a misalignment in defining test-case quality among
practitioners and between academia and industry, along with suggestions for
improving test-case quality in industry. Conclusion: The results show that
practitioners' backgrounds, including their roles and working experience, are
critical dimensions of how test-case quality is defined and assessed.
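To make the idea of a quality model concrete, the following is a minimal sketch of how such a model could be operationalized as a review checklist. The attribute names and the 1-5 scale are illustrative assumptions; the abstract does not enumerate the paper's 11 attributes.

```python
# A minimal sketch (not from the paper) of operationalizing a test-case quality
# model as a review checklist. Attribute names and the 1-5 scale are
# illustrative placeholders, not the paper's actual list of 11 attributes.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class QualityAttribute:
    name: str
    description: str
    score: Optional[int] = None  # e.g., 1 (poor) to 5 (good), filled in during review

@dataclass
class TestCaseAssessment:
    test_case_id: str
    reviewer_role: str  # developer, tester, test architect, ...
    attributes: list[QualityAttribute] = field(default_factory=list)

    def overall(self) -> float:
        """Average of the attribute scores that have been filled in."""
        scored = [a.score for a in self.attributes if a.score is not None]
        return sum(scored) / len(scored) if scored else 0.0

# Example: a tester reviewing one natural-language test case.
assessment = TestCaseAssessment(
    test_case_id="TC-042",
    reviewer_role="tester",
    attributes=[
        QualityAttribute("understandability", "steps and expected results are clear", 4),
        QualityAttribute("maintainability", "easy to update when requirements change", 3),
    ],
)
print(assessment.overall())  # 3.5
```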
Related papers
- Multi-Facet Counterfactual Learning for Content Quality Evaluation [48.73583736357489]
We propose a framework for efficiently constructing evaluators that perceive multiple facets of content quality evaluation.
We leverage a joint training strategy based on contrastive learning and supervised learning to enable the evaluator to distinguish between different quality facets.
arXiv Detail & Related papers (2024-10-10T08:04:10Z)
- Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models [67.89204055004028]
Large Vision-Language Models (LVLMs) have been plagued by the issue of hallucination.
Previous works have proposed a series of benchmarks featuring different types of tasks and evaluation metrics.
We propose a Hallucination benchmark Quality Measurement framework (HQM) to assess the reliability and validity of existing hallucination benchmarks.
arXiv Detail & Related papers (2024-06-24T20:08:07Z)
- Elevating Software Quality in Agile Environments: The Role of Testing Professionals in Unit Testing [0.0]
Testing is an essential quality activity in the software development process.
This paper explores the participation of test engineers in unit testing within an industrial context.
arXiv Detail & Related papers (2024-03-20T00:41:49Z)
- QuRating: Selecting High-Quality Data for Training Language Models [64.83332850645074]
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B training corpus with quality ratings for each of the four criteria.
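The pairwise-to-scalar idea can be illustrated with a generic Bradley-Terry style objective. The toy linear rater and random features below are assumptions for the sketch, not the QuRater implementation, which fine-tunes a language model.

```python
# A minimal sketch of learning scalar quality ratings from pairwise judgments
# with a Bradley-Terry style loss: P(a beats b) = sigmoid(score_a - score_b).
# The linear rater over precomputed features is a stand-in for the real model.
import torch
import torch.nn as nn

class ToyRater(nn.Module):
    def __init__(self, feature_dim: int):
        super().__init__()
        self.score = nn.Linear(feature_dim, 1)  # maps features to a scalar rating

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def pairwise_loss(score_a: torch.Tensor, score_b: torch.Tensor,
                  a_preferred: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy on the score difference between the two texts."""
    logits = score_a - score_b
    return nn.functional.binary_cross_entropy_with_logits(logits, a_preferred)

# Toy training step on random data (feature_dim=16, batch of 8 judged pairs).
rater = ToyRater(16)
opt = torch.optim.Adam(rater.parameters(), lr=1e-3)
feats_a, feats_b = torch.randn(8, 16), torch.randn(8, 16)
labels = torch.randint(0, 2, (8,)).float()  # 1 if text A was judged higher quality
opt.zero_grad()
loss = pairwise_loss(rater(feats_a), rater(feats_b), labels)
loss.backward()
opt.step()
```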
arXiv Detail & Related papers (2024-02-15T06:36:07Z)
- Assessing test artifact quality -- A tertiary study [1.7827643249624088]
We have carried out a systematic literature review to identify and analyze existing secondary studies on quality aspects of software testing artifacts.
We present an aggregation of the context dimensions and factors that can be used to characterize the environment in which the test case/suite quality is investigated.
arXiv Detail & Related papers (2024-02-14T19:31:57Z)
- Automated Test Case Repair Using Language Models [0.5708902722746041]
Unrepaired broken test cases can degrade test suite quality and disrupt the software development process.
We present TaRGet, a novel approach leveraging pre-trained code language models for automated test case repair.
TaRGet treats test repair as a language translation task, employing a two-step process to fine-tune a language model.
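As a rough illustration of the translation framing (a sketch, not TaRGet itself), one could feed a broken test plus the code change that broke it to a pre-trained code seq2seq model. The model name, marker tokens, and input format below are assumptions, and the fine-tuning step described in the paper is omitted.

```python
# A hedged illustration of "test repair as translation": concatenate the broken
# test with the source-code change and ask a pre-trained code seq2seq model for
# a candidate repair. Without task-specific fine-tuning the output is only a
# rough candidate; the <TEST>/<CHANGE> markers are invented for this sketch.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "Salesforce/codet5-small"  # any seq2seq code model would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

broken_test = "def test_total(): assert cart.total() == 10"
code_change = "- def total(self):\n+ def total(self, tax=0.0):"

# "Source language": the broken test plus the change that broke it.
source = f"<TEST> {broken_test} <CHANGE> {code_change}"
inputs = tokenizer(source, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # candidate repaired test
```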
arXiv Detail & Related papers (2024-01-12T18:56:57Z)
- A Survey on What Developers Think About Testing [13.086283144520513]
We conducted a comprehensive survey with 21 questions aimed at assessing developers' current engagement with testing.
We uncover reasons that positively and negatively impact developers' motivation to test.
One approach emerging from the responses to mitigate these negative factors is by providing better recognition for developers' testing efforts.
arXiv Detail & Related papers (2023-09-03T12:18:41Z)
- Test case quality: an empirical study on belief and evidence [8.475270520855332]
We investigate eight hypotheses regarding what constitutes a good test case.
Despite our best efforts, we were unable to find evidence that supports these beliefs.
arXiv Detail & Related papers (2023-07-12T19:02:48Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
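A small sketch of the adaptive-testing idea, under the assumption of a standard two-parameter item response theory (IRT) model rather than the paper's exact formulation: score each candidate item by its Fisher information at the current ability estimate and administer the most informative one.

```python
# A hedged sketch of adaptive item selection with a 2PL IRT model: pick the
# benchmark item that is most informative at the current ability estimate.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta: float, a: float, b: float) -> float:
    """Information an item contributes about ability theta."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1 - p)

# Item bank as (discrimination, difficulty) pairs; theta is the current estimate.
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7), (1.0, 1.5)]
theta = 0.6
next_item = max(range(len(items)), key=lambda i: fisher_information(theta, *items[i]))
print(next_item)  # index of the most informative item to administer next
```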
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study [63.27346930921658]
ChatGPT is capable of evaluating text quality effectively from various perspectives without reference.
The Explicit Score, which utilizes ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable method among the three exploited approaches.
arXiv Detail & Related papers (2023-04-03T05:29:58Z)
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work estimates confidence intervals (Brown et al., 2001) for translation quality evaluation depending on the sample size of the translated text.
The methodology applied in this work draws on Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
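To make the interval-estimation idea concrete, here is a hedged sketch, not the paper's exact BSDM/MCSA procedure: treat each sampled translation segment as a Bernoulli trial, compute a Wilson score interval (one of the intervals recommended by Brown et al., 2001) for the error rate, and check its coverage with a small Monte Carlo simulation.

```python
# A sketch of confidence-interval estimation for TQE-style sampling: if k of n
# sampled translation segments contain errors, how tight is the estimate of the
# error rate? Illustrative only; not the paper's exact procedure.
import math
import random

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a binomial proportion k/n."""
    p_hat = k / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

def monte_carlo_coverage(p_true: float, n: int, trials: int = 2000) -> float:
    """Fraction of simulated samples whose Wilson interval covers p_true."""
    hits = 0
    for _ in range(trials):
        k = sum(random.random() < p_true for _ in range(n))
        lo, hi = wilson_interval(k, n)
        hits += lo <= p_true <= hi
    return hits / trials

print(wilson_interval(k=12, n=100))              # interval around an observed 12% error rate
print(monte_carlo_coverage(p_true=0.12, n=100))  # empirical coverage, close to 0.95
```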
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.