Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript
- URL: http://arxiv.org/abs/2501.12680v1
- Date: Wed, 22 Jan 2025 06:52:11 GMT
- Title: Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript
- Authors: Negar Hashemi, Amjed Tahir, Shawn Rasheed, August Shi, Rachel Blagojevic
- Abstract summary: Flaky tests pose a significant issue for software testing.
Previous research has identified test order dependency as one of the most prevalent causes of flakiness.
This paper aims to investigate test order dependency in JavaScript tests.
- Score: 3.6513675781808357
- Abstract: Flaky tests pose a significant issue for software testing. A test with a non-deterministic outcome may undermine the reliability of the testing process, making tests untrustworthy. Previous research has identified test order dependency as one of the most prevalent causes of flakiness, particularly in Java and Python. However, little is known about test order dependency in JavaScript tests. This paper investigates test order dependency in JavaScript projects that use Jest, a widely used JavaScript testing framework. We implemented a systematic approach to randomise tests, test suites and describe blocks, producing 10 unique test reorders for each level. We reran each order 10 times (100 reruns for each test suite/project) and recorded any changes in test outcomes. We then manually analysed each case that showed flaky outcomes to determine the cause of flakiness. We evaluated our detection approach on a dataset of 81 projects obtained from GitHub. Our results revealed 55 order-dependent tests across 10 projects. Most order-dependent tests (52) occurred between tests, while the remaining three occurred between describe blocks. These order-dependent tests are caused by either shared files (13) or shared mocking state (42) between tests. While sharing files is a known cause of order-dependent tests in other languages, our results highlight a new cause (shared mocking state) that was not previously reported.
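To make the methodology and the key finding concrete, two short sketches follow. First, the suite-level reordering the paper describes can be approximated through Jest's documented custom test sequencer hook; the file name, class name, and shuffle logic below are our own illustration under that assumption, not the authors' actual tooling.

```javascript
// randomSequencer.js -- a minimal sketch of suite-level test reordering,
// built on Jest's documented `testSequencer` extension point.
// Illustrative only; not the paper's implementation.
const Sequencer = require('@jest/test-sequencer').default;

class RandomSequencer extends Sequencer {
  sort(tests) {
    // Fisher-Yates shuffle: each run executes the test files in a
    // fresh random order, which can expose order-dependent outcomes.
    const shuffled = [...tests];
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
    }
    return shuffled;
  }
}

module.exports = RandomSequencer;
```

Enabling it is a one-line configuration change (`testSequencer: './randomSequencer.js'` in `jest.config.js`); recent Jest releases also ship a `--randomize` CLI flag that shuffles tests within a file. Second, the newly reported cause, shared mocking state, is easy to reproduce. In the hypothetical test file below (all names are ours), the second test passes only when it runs before the first, because the stubbed return value leaks through the shared `jest.fn()`:

```javascript
// sharedMockState.test.js -- hypothetical example of an order-dependent
// test pair caused by shared mocking state.
const fetchStatus = jest.fn();

test('polluter: stubs the shared mock and never resets it', () => {
  fetchStatus.mockReturnValue('stubbed');
  expect(fetchStatus()).toBe('stubbed');
});

test('victim: assumes the shared mock is still unconfigured', () => {
  // Passes only when run before the polluter; afterwards the stubbed
  // return value leaks in because the mock was never reset.
  expect(fetchStatus()).toBeUndefined();
});
```

Resetting mocks between tests (for example, `jest.resetAllMocks()` in an `afterEach` hook, or `resetMocks: true` in the Jest configuration) removes this order dependency.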
Related papers
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
- Model Equality Testing: Which Model Is This API Serving? [59.005869726179455]
We formalize detecting such distortions as Model Equality Testing, a two-sample testing problem.
A test built on a simple string kernel achieves a median of 77.4% power against a range of distortions.
We then apply this test to commercial inference APIs for four Llama models, finding that 11 out of 31 endpoints serve different distributions than reference weights released by Meta.
arXiv Detail & Related papers (2024-10-26T18:34:53Z)
- TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories.
It covers initial tests authoring, test suite completion, and code coverage improvements.
We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z)
- Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests fail seemingly at random without changes to the code.
We study characteristics of tests and the test environment that potentially impact test flakiness.
arXiv Detail & Related papers (2024-09-16T07:52:09Z)
- Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects captured during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z)
- Taming Timeout Flakiness: An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests negatively affect regression testing because they result in test failures that are not necessarily caused by code changes.
Test timeouts are one contributing factor to such flaky test failures.
Test flakiness rate ranges from 49% to 70%, depending on the number of repeated test executions.
arXiv Detail & Related papers (2024-02-07T20:01:41Z)
- Do Automatic Test Generation Tools Generate Flaky Tests? [12.813573907094074]
The prevalence and nature of flaky tests produced by test generation tools remain largely unknown.
We generate tests using EvoSuite (Java) and Pynguin (Python) and execute each test 200 times.
Our results show that flakiness is at least as common in generated tests as in developer-written tests.
arXiv Detail & Related papers (2023-10-08T16:44:27Z)
- FlaPy: Mining Flaky Python Tests at Scale [14.609208863749831]
FlaPy is a framework for researchers to mine flaky tests in a given or automatically sampled set of Python projects by rerunning their test suites.
FlaPy isolates the test executions using containerization and fresh execution environments to simulate real-world CI conditions.
FlaPy supports parallelizing the test executions using SLURM, making it feasible to scan thousands of projects for test flakiness.
arXiv Detail & Related papers (2023-05-08T15:48:57Z)
- Validation of massively-parallel adaptive testing using dynamic control matching [0.0]
Modern businesses often run many A/B/n tests in parallel and package many content variations into the same messages.
This paper presents a method for disentangling the causal effects of the various tests under conditions of continuous test adaptation.
arXiv Detail & Related papers (2023-05-02T11:28:12Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Automated Support for Unit Test Generation: A Tutorial Book Chapter [21.716667622896193]
Unit testing is the stage of testing in which the smallest segments of code that can be exercised in isolation from the rest of the system are tested.
Unit tests are typically written as executable code, often in a format provided by a unit testing framework such as pytest for Python.
This chapter introduces the concept of search-based unit test generation.
arXiv Detail & Related papers (2021-10-26T11:13:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.