Quality attributes of test cases and test suites -- importance & challenges from practitioners' perspectives
- URL: http://arxiv.org/abs/2507.06343v1
- Date: Tue, 08 Jul 2025 19:09:27 GMT
- Title: Quality attributes of test cases and test suites -- importance & challenges from practitioners' perspectives
- Authors: Huynh Khanh Vi Tran, Nauman bin Ali, Michael Unterkalmsteiner, Jürgen Börstler, Panagiota Chatzipetrou
- Abstract summary: We investigate practitioners' perceptions regarding the relative importance of quality attributes of test cases and test suites. We identify common challenges that apply to the important attributes, namely inadequate definition, lack of useful metrics, lack of an established review process, and lack of external support.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Context: The quality of test suites and their constituent test cases significantly impacts confidence in software testing. While research has identified several quality attributes of test cases and test suites, a better understanding of their relative importance in practice is needed. Objective: We investigate practitioners' perceptions of the relative importance of quality attributes of test cases and test suites, and the challenges they face in ensuring the attributes they perceive as important. Method: We conducted an industrial survey using a questionnaire based on the quality attributes identified in an extensive literature review. We used a sampling strategy that leveraged LinkedIn to draw a large and heterogeneous sample of professionals with experience in software testing. Results: We collected 354 responses from practitioners with a wide range of experience. The majority of practitioners rated Fault Detection, Usability, Maintainability, Reliability, and Coverage as the most important quality attributes. Resource Efficiency, Reusability, and Simplicity received the most divergent opinions, which, according to our analysis, depend on the software-testing context. We identified common challenges that apply to the important attributes, namely inadequate definition, lack of useful metrics, lack of an established review process, and lack of external support. Conclusion: The findings point out where practitioners need further support in achieving high-quality test cases and test suites under different software-testing contexts. They can serve as a guideline for academic researchers looking for research directions on the topic and can encourage companies to provide practitioners with more support for achieving high-quality test cases and test suites.
Related papers
- Establishing Best Practices for Building Rigorous Agentic Benchmarks (arXiv, 2025-07-03)
  We show that many agentic benchmarks have issues in task setup or reward design. Such issues can lead to under- or overestimation of agents' performance by up to 100% in relative terms. We introduce the Agentic Benchmark Checklist (ABC), a set of guidelines that we synthesized from our benchmark-building experience.
- AI-Driven Tools in Modern Software Quality Assurance: An Assessment of Benefits, Challenges, and Future Directions (arXiv, 2025-06-19)
  The research aims to assess the benefits, challenges, and prospects of integrating modern AI-oriented tools into quality assurance processes. It demonstrates AI's transformative potential for QA but highlights the importance of a strategic approach to implementing these technologies.
- CodeContests+: High-Quality Test Case Generation for Competitive Programming (arXiv, 2025-06-06)
  We introduce an agent system that creates high-quality test cases for competitive programming problems. We apply this system to the CodeContests dataset and propose a new version with improved test cases, named CodeContests+. The results indicate that CodeContests+ achieves significantly higher accuracy than CodeContests, particularly a notably higher True Positive Rate (TPR).
- TestAgent: An Adaptive and Intelligent Expert for Human Assessment (arXiv, 2025-06-03)
  We propose TestAgent, a large language model (LLM)-powered agent designed to enhance adaptive testing through interactive engagement. TestAgent supports personalized question selection, captures test-takers' responses and anomalies, and provides precise outcomes through dynamic, conversational interactions.
- QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation (arXiv, 2025-03-07)
  We introduce test item analysis, a method frequently used to assess test question quality, into QG evaluation. We construct pairs of candidate questions that differ in quality across dimensions such as topic coverage, item difficulty, item discrimination, and distractor efficiency. We propose a novel QG evaluation framework, QG-SMS, which leverages Large Language Models for Student Modeling and Simulation.
- CritiQ: Mining Data Quality Criteria from Human Preferences (arXiv, 2025-02-26)
  We introduce CritiQ, a novel data selection method that automatically mines criteria from human preferences for data quality. CritiQ Flow employs a manager agent to evolve quality criteria and worker agents to make pairwise judgments. We demonstrate the effectiveness of our method in the code, math, and logic domains.
- Assessing test artifact quality -- A tertiary study (arXiv, 2024-02-14)
  We have carried out a systematic literature review to identify and analyze existing secondary studies on quality aspects of software testing artifacts. We present an aggregation of the context dimensions and factors that can be used to characterize the environment in which test case/suite quality is investigated.
- Test-Case Quality -- Understanding Practitioners' Perspectives (arXiv, 2023-09-28)
  We present a quality model which consists of 11 test-case quality attributes. We identify a misalignment in defining test-case quality among practitioners and between academia and industry.
- ExpertQA: Expert-Curated Questions and Attributed Answers (arXiv, 2023-09-14)
  We conduct human evaluation of responses from a few representative systems along various axes of attribution and factuality. We collect expert-curated questions from 484 participants across 32 fields of study, and then ask the same experts to evaluate generated responses to their own questions. The output of our analysis is ExpertQA, a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers.
- Position: AI Evaluation Should Learn from How We Test Humans (arXiv, 2023-06-18)
  We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
- A New Score for Adaptive Tests in Bayesian and Credal Networks (arXiv, 2021-05-25)
  A test is adaptive when its sequence and number of questions are dynamically tuned on the basis of the estimated skills of the taker. We present an alternative family of scores, based on the mode of the posterior probabilities, and hence easier to explain.