Studying the Impact of Early Test Termination Due to Assertion Failure on Code Coverage and Spectrum-based Fault Localization
- URL: http://arxiv.org/abs/2504.04557v1
- Date: Sun, 06 Apr 2025 17:14:09 GMT
- Title: Studying the Impact of Early Test Termination Due to Assertion Failure on Code Coverage and Spectrum-based Fault Localization
- Authors: Md. Ashraf Uddin, Shaowei Wang, An Ran Chen, Tse-Hsun Chen
- Abstract summary: This study is the first empirical study on early test termination due to assertion failure. We investigated 207 versions of 6 open-source projects. Our findings indicate that early test termination harms both code coverage and the effectiveness of spectrum-based fault localization.
- Score: 48.22524837906857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An assertion is commonly used in software testing to validate expected program behavior (e.g., that the value returned by a method equals an expected value). Although using only one assertion per test is recommended practice to avoid code smells (e.g., Assertion Roulette), it is common to have multiple assertions in a single test. One issue with such tests is that when a test fails at an early assertion (not the last one), it terminates at that point and the remaining testing code is never executed. This, in turn, can reduce code coverage and degrade the performance of techniques that rely on coverage information (e.g., spectrum-based fault localization, SBFL). We refer to this scenario as early test termination. Understanding its impact on test coverage is important for software testing and debugging, particularly for techniques that rely on coverage information obtained from testing. We conducted the first empirical study on early test termination due to assertion failure by investigating 207 versions of 6 open-source projects. We found that a non-negligible portion of the failed tests (19.1%) terminates early due to assertion failure. Our findings indicate that early test termination harms both code coverage and the effectiveness of SBFL. For instance, after eliminating early test termination, line/branch coverage improves in 55% of the studied versions, and the performance of two popular SBFL techniques, Ochiai and Tarantula, improves by 15.1% and 10.7% in terms of MFR, respectively, compared to the original setting (without eliminating early test termination).
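To make the mechanism concrete, the sketch below is a minimal Python illustration, not the paper's instrumentation: the statement names and test counts are invented. It shows how a failing early assertion truncates the coverage recorded for a test, and it implements the standard Ochiai and Tarantula suspiciousness formulas that the study's SBFL evaluation relies on.

```python
import math

# (1) Early test termination: a failing assertion aborts the test body,
# so statements after it are never executed or recorded as covered.
def multi_assertion_test(covered):
    covered.add("stmt_1")
    assert 1 + 1 == 3        # fails here -> the test terminates early...
    covered.add("stmt_2")    # ...so this statement is never marked covered

covered = set()
try:
    multi_assertion_test(covered)
except AssertionError:
    pass                     # a test runner records the failure here
print(covered)               # {'stmt_1'} -- stmt_2's coverage is lost

# (2) SBFL suspiciousness of a statement s from the coverage spectrum, where
#   ef/ep = number of failing/passing tests that execute s
#   nf/np = number of failing/passing tests that do not execute s
def ochiai(ef, nf, ep, np):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def tarantula(ef, nf, ep, np):
    fail_ratio = ef / (ef + nf) if ef + nf else 0.0
    pass_ratio = ep / (ep + np) if ep + np else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0

# Example: a statement executed by 3 of 4 failing and 1 of 10 passing tests.
print(ochiai(3, 1, 1, 9))     # 0.75
print(tarantula(3, 1, 1, 9))  # ~0.88
```

This also shows why early termination hurts SBFL: statements after the failing assertion lose coverage entries from failing tests, so their ef counts drop and their Ochiai/Tarantula ranks worsen, which the paper measures via MFR (mean first rank of the faulty element).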
Related papers
- TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories. It covers initial test authoring, test suite completion, and code coverage improvements. We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z)
- Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests fail seemingly at random without changes to the code.
We study characteristics of tests and the test environment that potentially impact test flakiness.
arXiv Detail & Related papers (2024-09-16T07:52:09Z)
- TestART: Improving LLM-based Unit Testing via Co-evolution of Automated Generation and Repair Iteration [7.509927117191286]
Large language models (LLMs) have demonstrated remarkable capabilities in generating unit test cases. We propose TestART, a novel unit test generation method. TestART improves LLM-based unit testing via co-evolution of automated generation and repair iteration.
arXiv Detail & Related papers (2024-08-06T10:52:41Z)
- Taming Timeout Flakiness: An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests negatively affect regression testing because they result in test failures that are not necessarily caused by code changes.
Test timeouts are one contributing factor to such flaky test failures.
The test flakiness rate ranges from 49% to 70%, depending on the number of repeated test executions.
arXiv Detail & Related papers (2024-02-07T20:01:41Z)
- Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization [64.62570402941387]
We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift to bridge the gap in the test domain.
Our method improves zero-shot top-1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe.
arXiv Detail & Related papers (2023-11-02T17:59:32Z)
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing [13.743062498008555]
We introduce MuTAP for improving the effectiveness of test cases generated by Large Language Models (LLMs) in terms of revealing bugs.
MuTAP is capable of generating effective test cases in the absence of natural language descriptions of the programs under test (PUTs).
Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets.
arXiv Detail & Related papers (2023-08-31T08:48:31Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Test2Vec: An Execution Trace Embedding for Test Case Prioritization [12.624724734296342]
Execution traces of test cases can be a good alternative to abstract their behavior for automated testing tasks.
We propose a novel embedding approach, Test2Vec, that maps test execution traces to a latent space.
Results show that our proposed test prioritization approach improves the best alternatives by 41.80% in terms of the median normalized rank of the first failing test case.
arXiv Detail & Related papers (2022-06-28T20:38:36Z)
- DeepOrder: Deep Learning for Test Case Prioritization in Continuous Integration Testing [6.767885381740952]
This work introduces DeepOrder, a deep learning-based model that treats test case prioritization as a regression problem.
DeepOrder ranks test cases based on the historical record of test executions from any number of previous test cycles.
We experimentally show that deep neural networks, as a simple regression model, can be efficiently used for test case prioritization in continuous integration testing.
arXiv Detail & Related papers (2021-10-14T15:10:38Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
Dorfman showed 80 years ago that, when the infection prevalence of a disease is low, testing groups of people can prove more efficient than testing them individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)