Test case quality: an empirical study on belief and evidence
- URL: http://arxiv.org/abs/2307.06410v1
- Date: Wed, 12 Jul 2023 19:02:48 GMT
- Title: Test case quality: an empirical study on belief and evidence
- Authors: Daniel Lucrédio, Auri Marcelo Rizzo Vincenzi, Eduardo Santana de Almeida, Iftekhar Ahmed
- Abstract summary: We investigate eight hypotheses regarding what constitutes a good test case.
Despite our best efforts, we were unable to find evidence that supports these beliefs.
- Score: 8.475270520855332
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Software testing is a mandatory activity in any serious software development
process, as bugs are a reality in software development. This raises the
question of quality: good tests are effective in finding bugs, but until a test
case actually finds a bug, its effectiveness remains unknown. Therefore,
determining what constitutes a good or bad test is necessary. This is not a
simple task, and there are a number of studies that identify different
characteristics of a good test case. A previous study evaluated 29 hypotheses
regarding what constitutes a good test case, but the findings are based on
developers' beliefs, which are subjective and biased. In this paper we
investigate eight of these hypotheses, through an extensive empirical study
based on open software repositories. Despite our best efforts, we were unable
to find evidence that supports these beliefs. This indicates that, although
these hypotheses represent sound software engineering advice, following them is
not necessarily enough to produce the desired outcome of good testing code.
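To make the kind of advice under study concrete, below is a minimal sketch of a unit test written to follow commonly cited test-case guidelines (a descriptive name, a single focused behavior, an explanatory assertion message). It is an illustration only: the abstract does not enumerate the eight hypotheses, and the `parse_price` function under test is hypothetical, not code from the paper or the studied repositories.

```python
# Illustrative only: `parse_price` is a hypothetical function under test,
# not code from the paper or its studied open-source repositories.

def parse_price(text: str) -> float:
    """Parse a price string such as "$19.99" into a float."""
    return float(text.strip().lstrip("$"))


def test_parse_price_strips_currency_symbol_and_whitespace():
    # One focused behavior per test, a descriptive name, and an assertion
    # message explaining the expectation -- the style of guideline whose
    # effectiveness the study tries to measure empirically.
    assert parse_price(" $19.99 ") == 19.99, (
        "parse_price should ignore surrounding whitespace and a leading '$'"
    )
```

Whether following such guidelines actually yields more effective tests is exactly what the study could not confirm empirically.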
Related papers
- Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z) - GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection? [50.53312866647302]
HateCheck is a suite for testing fine-grained model functionalities on synthesized data.
We propose GPT-HateCheck, a framework to generate more diverse and realistic functional tests from scratch.
Crowd-sourced annotation demonstrates that the generated test cases are of high quality.
arXiv Detail & Related papers (2024-02-23T10:02:01Z) - Automatic Generation of Test Cases based on Bug Reports: a Feasibility
Study with Large Language Models [4.318319522015101]
Existing approaches produce test cases that can either be qualified as simple (e.g. unit tests) or require precise specifications.
Most testing procedures still rely on test cases written by humans to form test suites.
We investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs.
arXiv Detail & Related papers (2023-10-10T05:30:12Z) - Test-Case Quality -- Understanding Practitioners' Perspectives [1.7827643249624088]
We present a quality model which consists of 11 test-case quality attributes.
We identify a misalignment in defining test-case quality among practitioners and between academia and industry.
arXiv Detail & Related papers (2023-09-28T19:10:01Z) - A Survey on What Developers Think About Testing [13.086283144520513]
We conducted a comprehensive survey with 21 questions aimed at assessing developers' current engagement with testing.
We uncover reasons that positively and negatively impact developers' motivation to test.
One approach emerging from the responses to mitigate these negative factors is by providing better recognition for developers' testing efforts.
arXiv Detail & Related papers (2023-09-03T12:18:41Z) - When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP [23.30735117217225]
We present a case study in which we identify and fix three bugs in widely used implementations of the state-of-the-art Conformer architecture.
We propose a Code-quality Checklist and release pangoliNN, a library dedicated to testing neural models.
arXiv Detail & Related papers (2023-03-28T17:28:52Z) - SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video
Games Using Risk Based Testing and Machine Learning [62.997667081978825]
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems.
We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub.
The direct impact has been observed to be a reduction of 55% or more in testing hours for an undisclosed sports game title.
arXiv Detail & Related papers (2022-03-10T00:47:46Z) - The Unpopularity of the Software Tester Role among Software
Practitioners: A Case Study [10.028628621669293]
This work attempts to understand the motivation/de-motivation of software practitioners to take up and sustain testing careers.
One hundred and forty-four software practitioners from several Cuban software institutes were surveyed.
Individuals were asked the PROs (advantages or motivators) and CONs (disadvantages or de-motivators) of taking up a career in software testing and their chances of doing so.
arXiv Detail & Related papers (2020-07-16T14:52:36Z) - Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z) - Noisy Adaptive Group Testing using Bayesian Sequential Experimental
Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing pooled groups of people can be more efficient than testing each person individually (a worked sketch of this arithmetic appears after this list).
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement
Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
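As a worked sketch of the Dorfman pooling arithmetic referenced in the group-testing entry above (standard, noise-free textbook material; it is not the noisy Bayesian sequential design that paper proposes): with prevalence p and pool size g, each pool costs one test plus g follow-up tests whenever it comes back positive, so the expected number of tests per person is 1/g + 1 - (1 - p)^g. The snippet below simply searches for the pool size that minimizes this quantity.

```python
# Minimal sketch of classic (noise-free) Dorfman group testing arithmetic.
# Assumes independent infections with prevalence p; this is not the noisy
# Bayesian sequential design proposed in the paper above.

def expected_tests_per_person(p: float, g: int) -> float:
    """Expected tests per person when pooling g samples at prevalence p."""
    if g == 1:
        return 1.0  # no pooling: one test per person
    prob_pool_positive = 1.0 - (1.0 - p) ** g
    return 1.0 / g + prob_pool_positive  # one pool test, plus retests if positive


def best_pool_size(p: float, max_g: int = 50) -> tuple[int, float]:
    """Return the pool size (and its cost) minimizing expected tests per person."""
    return min(
        ((g, expected_tests_per_person(p, g)) for g in range(1, max_g + 1)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.15):
        g, cost = best_pool_size(p)
        print(f"prevalence {p:.2f}: pool size {g}, ~{cost:.2f} tests per person")
```

At 1% prevalence this settles on pools of about eleven samples and roughly 0.2 expected tests per person, close to a five-fold saving over individual testing.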
This list is automatically generated from the titles and abstracts of the papers in this site.