Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript
- URL: http://arxiv.org/abs/2501.12680v1
- Date: Wed, 22 Jan 2025 06:52:11 GMT
- Title: Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript
- Authors: Negar Hashemi, Amjed Tahir, Shawn Rasheed, August Shi, Rachel Blagojevic
- Abstract summary: Flaky tests pose a significant issue for software testing.
Previous research has identified test order dependency as one of the most prevalent causes of flakiness.
This paper aims to investigate test order dependency in JavaScript tests.
- Score: 3.6513675781808357
- Abstract: Flaky tests pose a significant issue for software testing. A test with a non-deterministic outcome may undermine the reliability of the testing process, making tests untrustworthy. Previous research has identified test order dependency as one of the most prevalent causes of flakiness, particularly in Java and Python. However, little is known about test order dependency in JavaScript tests. This paper investigates test order dependency in JavaScript projects that use Jest, a widely used JavaScript testing framework. We implemented a systematic approach to randomise tests, test suites and describe blocks, producing 10 unique test reorders for each level. We reran each order 10 times (100 reruns for each test suite/project) and recorded any changes in test outcomes. We then manually analysed each case that showed flaky outcomes to determine the cause of flakiness. We evaluated our detection approach on a dataset of 81 projects obtained from GitHub. Our results revealed 55 order-dependent tests across 10 projects. Most order-dependent tests (52) occurred between tests, while the remaining three occurred between describe blocks. These order-dependent tests are caused by either shared files (13) or shared mocking state (42) between tests. While sharing files is a known cause of order-dependent tests in other languages, our results highlight a new cause (shared mocking state) that was not previously reported.
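To make the methodology and the key finding concrete, two short sketches follow. First, the suite-level reordering the paper describes can be approximated through Jest's documented custom test sequencer hook; the file name, class name, and shuffle logic below are our own illustration under that assumption, not the authors' actual tooling.

```javascript
// randomSequencer.js -- a minimal sketch of suite-level test reordering,
// built on Jest's documented `testSequencer` extension point.
// Illustrative only; not the paper's implementation.
const Sequencer = require('@jest/test-sequencer').default;

class RandomSequencer extends Sequencer {
  sort(tests) {
    // Fisher-Yates shuffle: each run executes the test files in a
    // fresh random order, which can expose order-dependent outcomes.
    const shuffled = [...tests];
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
    }
    return shuffled;
  }
}

module.exports = RandomSequencer;
```

Enabling it is a one-line configuration change (`testSequencer: './randomSequencer.js'` in `jest.config.js`); recent Jest releases also ship a `--randomize` CLI flag that shuffles tests within a file. Second, the newly reported cause, shared mocking state, is easy to reproduce. In the hypothetical test file below (all names are ours), the second test passes only when it runs before the first, because the stubbed return value leaks through the shared `jest.fn()`:

```javascript
// sharedMockState.test.js -- hypothetical example of an order-dependent
// test pair caused by shared mocking state.
const fetchStatus = jest.fn();

test('polluter: stubs the shared mock and never resets it', () => {
  fetchStatus.mockReturnValue('stubbed');
  expect(fetchStatus()).toBe('stubbed');
});

test('victim: assumes the shared mock is still unconfigured', () => {
  // Passes only when run before the polluter; afterwards the stubbed
  // return value leaks in because the mock was never reset.
  expect(fetchStatus()).toBeUndefined();
});
```

Resetting mocks between tests (for example, `jest.resetAllMocks()` in an `afterEach` hook, or `resetMocks: true` in the Jest configuration) removes this order dependency.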
Related papers
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
- Model Equality Testing: Which Model Is This API Serving? [59.005869726179455]
We formalize detecting such distortions as Model Equality Testing, a two-sample testing problem.
A test built on a simple string kernel achieves a median of 77.4% power against a range of distortions.
We then apply this test to commercial inference APIs for four Llama models, finding that 11 out of 31 endpoints serve different distributions than reference weights released by Meta.
arXiv Detail & Related papers (2024-10-26T18:34:53Z)
- TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories.
It covers initial tests authoring, test suite completion, and code coverage improvements.
We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z)
- Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests fail seemingly at random without changes to the code.
We study characteristics of tests and the test environment that potentially impact test flakiness.
arXiv Detail & Related papers (2024-09-16T07:52:09Z)
- Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects captured during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z)
- Taming Timeout Flakiness: An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests negatively affect regression testing because they result in test failures that are not necessarily caused by code changes.
Test timeouts are one contributing factor to such flaky test failures.
Test flakiness rate ranges from 49% to 70%, depending on the number of repeated test executions.
arXiv Detail & Related papers (2024-02-07T20:01:41Z)
- Do Automatic Test Generation Tools Generate Flaky Tests? [12.813573907094074]
The prevalence and nature of flaky tests produced by test generation tools remain largely unknown.
We generate tests using EvoSuite (Java) and Pynguin (Python) and execute each test 200 times.
Our results show that flakiness is at least as common in generated tests as in developer-written tests.
arXiv Detail & Related papers (2023-10-08T16:44:27Z)
- FlaPy: Mining Flaky Python Tests at Scale [14.609208863749831]
FlaPy is a framework for researchers to mine flaky tests in a given or automatically sampled set of Python projects by rerunning their test suites.
FlaPy isolates the test executions using containerization and fresh execution environments to simulate real-world CI conditions.
FlaPy supports parallelizing the test executions using SLURM, making it feasible to scan thousands of projects for test flakiness.
arXiv Detail & Related papers (2023-05-08T15:48:57Z)
- Validation of massively-parallel adaptive testing using dynamic control matching [0.0]
Modern businesses often run many A/B/n tests in parallel and package many content variations into the same messages.
This paper presents a method for disentangling the causal effects of the various tests under conditions of continuous test adaptation.
arXiv Detail & Related papers (2023-05-02T11:28:12Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- Automated Support for Unit Test Generation: A Tutorial Book Chapter [21.716667622896193]
Unit testing is the stage of testing in which the smallest segments of code that can be exercised in isolation from the rest of the system are tested.
Unit tests are typically written as executable code, often in a format provided by a unit testing framework such as pytest for Python.
This chapter introduces the concept of search-based unit test generation.
arXiv Detail & Related papers (2021-10-26T11:13:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.