Related papers: Taming Timeout Flakiness: An Empirical Study of SAP HANA

URL: http://arxiv.org/abs/2402.05223v3
Date: Sat, 28 Sep 2024 11:12:40 GMT
Title: Taming Timeout Flakiness: An Empirical Study of SAP HANA
Authors: Alexander Berndt, Sebastian Baltes, Thomas Bach
Abstract summary: Flaky tests negatively affect regression testing because they result in test failures that are not necessarily caused by code changes. Test timeouts are one contributing factor to such flaky test failures. The test flakiness rate ranges from 49% to 70%, depending on the number of repeated test executions.
Score: 47.29324864511411
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Regression testing aims to prevent code changes from breaking existing features. Flaky tests negatively affect regression testing because they result in test failures that are not necessarily caused by code changes, thus providing an ambiguous signal. Test timeouts are one contributing factor to such flaky test failures. With the goal of reducing test flakiness in SAP HANA, we empirically study the impact of test timeouts on flakiness in system tests. We evaluate different approaches to automatically adjust timeout values, assessing their suitability for reducing execution time costs and improving build turnaround times. We collect metadata on SAP HANA's test executions by repeatedly executing tests on the same code revision over a period of six months. We analyze the test flakiness rate, investigate the evolution of test timeout values, and evaluate different approaches for optimizing timeout values. The test flakiness rate ranges from 49% to 70%, depending on the number of repeated test executions. Test timeouts account for 70% of flaky test failures. Developers typically react to flaky timeouts by manually increasing timeout values or splitting long-running tests. However, manually adjusting timeout values is a tedious task. Our approach for timeout optimization reduces timeout-related flaky failures by 80% and reduces the overall median timeout value by 25%, i.e., blocked tests are identified faster. Test timeouts are a major contributing factor to flakiness in system tests. It is challenging for developers to effectively mitigate this problem manually. Our technique for optimizing timeout values reduces flaky failures while minimizing test costs. Practitioners working on large-scale industrial software systems can use our findings to increase the effectiveness of their system tests while reducing the burden on developers to manually maintain appropriate timeout values.

Related papers

Studying the Impact of Early Test Termination Due to Assertion Failure on Code Coverage and Spectrum-based Fault Localization [48.22524837906857]
This study is the first empirical study on early test termination due to assertion failure. We investigated 207 versions of 6 open-source projects. Our findings indicate that early test termination harms both code coverage and the effectiveness of spectrum-based fault localization.
arXiv Detail & Related papers (2025-04-06T17:14:09Z)
Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests fail seemingly at random without changes to the code. We study characteristics of tests and the test environment that potentially impact test flakiness.
arXiv Detail & Related papers (2024-09-16T07:52:09Z)
STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay [76.06127233986663]
Test-time adaptation (TTA) aims to address the distribution shift between the training and test data with only unlabeled data at test time. This paper pays attention to the problem that conducts both sample recognition and outlier rejection during inference while outliers exist. We propose a new approach called STAble Memory rePlay (STAMP), which performs optimization over a stable memory bank instead of the risky mini-batch.
arXiv Detail & Related papers (2024-07-22T16:25:41Z)
WEFix: Intelligent Automatic Generation of Explicit Waits for Efficient Web End-to-End Flaky Tests [13.280540531582945]
We propose WEFix, a technique that can automatically generate fix code for UI-based flakiness in web e2e testing. We evaluate the effectiveness and efficiency of WEFix against 122 web e2e flaky tests from seven popular real-world projects.
arXiv Detail & Related papers (2024-02-15T06:51:53Z)
The Effects of Computational Resources on Flaky Tests [9.694460778355925]
Flaky tests are tests that nondeterministically pass and fail in unchanged code. Resource-Affected Flaky Tests indicate that a substantial proportion of flaky-test failures can be avoided by adjusting the resources available when running tests.
arXiv Detail & Related papers (2023-10-18T17:42:58Z)
Accelerating Continuous Integration with Parallel Batch Testing [0.0]
Continuous integration at scale is essential to software development. Various techniques including test selection and prioritization aim to reduce the cost. This study evaluates parallelization's effect by adjusting the number of test machines. We propose Dynamic TestCase, enabling new builds to join a batch before full test execution.
arXiv Detail & Related papers (2023-08-25T01:09:31Z)
Time-based Repair for Asynchronous Wait Flaky Tests in Web Testing [0.0]
Asynchronous waits are one of the most prevalent root causes of flaky tests in web applications. We propose TRaf, an automated time-based repair method for asynchronous wait flaky tests. Our analysis shows that TRaf can suggest a shorter wait time to resolve the test flakiness compared to developer-written fixes.
arXiv Detail & Related papers (2023-05-15T12:17:30Z)
Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures. We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption [69.76837484008033]
An unresolved problem in Deep Learning is the ability of neural networks to cope with domain shifts during test-time. We combine meta-learning, self-supervision and test-time training to learn to adapt to unseen test distributions. Our approach significantly improves the state-of-the-art results on the CIFAR-10-Corrupted image classification benchmark.
arXiv Detail & Related papers (2021-03-30T09:33:38Z)
Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually. Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
