Related papers: JS-TOD: Detecting Order-Dependent Flaky Tests in Jest

URL: http://arxiv.org/abs/2509.00466v1
Date: Sat, 30 Aug 2025 11:44:14 GMT
Title: JS-TOD: Detecting Order-Dependent Flaky Tests in Jest
Authors: Negar Hashemi, Amjed Tahir, Shawn Rasheed, August Shi, Rachel Blagojevic
Abstract summary: JS-TOD is a tool that can extract, reorder, and rerun Jest tests to reveal possible order-dependent test flakiness. Test order dependency is one of the leading causes of test flakiness.
Score: 5.178246622041266
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present JS-TOD (JavaScript Test Order-dependency Detector), a tool that can extract, reorder, and rerun Jest tests to reveal possible order-dependent test flakiness. Test order dependency is one of the leading causes of test flakiness. Ideally, each test should operate in isolation and yield consistent results no matter the sequence in which tests are run. However, in practice, test outcomes can vary depending on their execution order. JS-TOD employs a systematic approach to randomising tests, test suites, and describe blocks. The tool is highly customisable, as one can set the number of orders and reruns required (the default setting is 10 reorders and 10 reruns for each test and test suite). Our evaluation using JS-TOD reveals two main causes of test order dependency flakiness: shared files and shared mocking state between tests.
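Illustration: the reorder-and-rerun idea in the abstract can be sketched with a toy harness. This is a minimal sketch under stated assumptions, not JS-TOD's implementation and not Jest's API: `TestCase`, `findOrderDependent`, `makeDemoSuite`, and the seeded generator are hypothetical names, and running the original order and its reverse before the random orders is an added assumption so that the outcome is deterministic. The defaults of 10 orders and 10 reruns mirror the settings quoted in the abstract.

```typescript
// Hypothetical miniature harness for illustration; not JS-TOD's code and not Jest's API.
type TestCase = { name: string; run: () => void };

// Linear congruential generator, so the shuffled orders are reproducible from a seed.
function makeRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Fisher-Yates shuffle that leaves its input untouched.
function shuffle<T>(items: T[], rand: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

// Runs the suite in several orders, several times each, and returns the names
// of tests that were seen both passing and failing.
function findOrderDependent(
  makeSuite: () => TestCase[],
  orders = 10,
  reruns = 10,
  seed = 1
): string[] {
  const names = makeSuite().map((t) => t.name);
  const rand = makeRandom(seed);
  // Assumption: always include the original and the reversed order.
  const plans: string[][] = [names, names.slice().reverse()];
  for (let o = 0; o < orders; o++) plans.push(shuffle(names, rand));

  const passed: Record<string, boolean> = {};
  const failed: Record<string, boolean> = {};
  for (const plan of plans) {
    for (let r = 0; r < reruns; r++) {
      const suite = makeSuite(); // fresh shared state for every run
      for (const name of plan) {
        const test = suite.filter((t) => t.name === name)[0];
        try {
          test.run();
          passed[name] = true;
        } catch {
          failed[name] = true;
        }
      }
    }
  }
  return names.filter((n) => passed[n] && failed[n]).sort();
}

// Demo suite: two tests share `store`, mimicking shared state between tests.
function makeDemoSuite(): TestCase[] {
  const store: string[] = [];
  const check = (ok: boolean, msg: string) => {
    if (!ok) throw new Error(msg);
  };
  return [
    { name: "reads empty store", run: () => check(store.length === 0, "store not empty") },
    { name: "writes to store", run: () => { store.push("x"); check(store.length > 0, "write failed"); } },
    { name: "adds numbers", run: () => check(1 + 1 === 2, "arithmetic") },
  ];
}

const orderDependent = findOrderDependent(makeDemoSuite);
console.log(orderDependent); // flags only "reads empty store"
```

In the demo, "reads empty store" passes in the original order but fails whenever "writes to store" runs first, so it is the only test reported. A real tool would also have to extract tests from Jest files and handle describe blocks and test suites, which this sketch leaves out.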

Related papers

Test Behaviors, Not Methods! Detecting Tests Obsessed by Methods [3.6417668958891785]
Tests that verify multiple behaviors are harder to understand, lack focus, and are more coupled to the production code. We propose a novel test smell named Test Obsessed by Method, a test method that covers multiple paths of a single production method.
arXiv Detail & Related papers (2026-01-31T14:58:39Z)
Reduction of Test Re-runs by Prioritizing Potential Order Dependent Flaky Tests [0.5798758080057375]
Flaky tests can make automated software testing unreliable due to their unpredictable behavior. A common type of flaky test is the order-dependent (OD) test. We propose a method to prioritize potential OD tests.
arXiv Detail & Related papers (2025-10-30T06:17:30Z)
E-Test: E'er-Improving Test Suites [8.585182075116336]
E-Test identifies executions that have not yet been tested from large sets of scenarios. It generates new test cases that enhance the test suite. E-Test retrieves not-yet-tested execution scenarios significantly better than state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-21T21:23:33Z)
Studying the Impact of Early Test Termination Due to Assertion Failure on Code Coverage and Spectrum-based Fault Localization [48.22524837906857]
This study is the first empirical study on early test termination due to assertion failure. We investigated 207 versions of 6 open-source projects. Our findings indicate that early test termination harms both code coverage and the effectiveness of spectrum-based fault localization.
arXiv Detail & Related papers (2025-04-06T17:14:09Z)
Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript [3.6513675781808357]
Flaky tests pose a significant issue for software testing. Previous research has identified test order dependency as one of the most prevalent causes of flakiness. This paper aims to investigate test order dependency in JavaScript tests.
arXiv Detail & Related papers (2025-01-22T06:52:11Z)
LlamaRestTest: Effective REST API Testing with Small Language Models [50.058600784556816]
We present LlamaRestTest, a novel approach that employs two custom Large Language Models (LLMs) to generate realistic test inputs. We evaluate it against several state-of-the-art REST API testing tools, including RESTGPT, a GPT-powered specification-enhancement tool. Our study shows that small language models can perform as well as, or better than, large language models in REST API testing.
arXiv Detail & Related papers (2025-01-15T05:51:20Z)
Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests fail seemingly at random without changes to the code. We study characteristics of tests and the test environment that potentially impact test flakiness.
arXiv Detail & Related papers (2024-09-16T07:52:09Z)
WEFix: Intelligent Automatic Generation of Explicit Waits for Efficient Web End-to-End Flaky Tests [13.280540531582945]
We propose WEFix, a technique that can automatically generate fix code for UI-based flakiness in web e2e testing. We evaluate the effectiveness and efficiency of WEFix against 122 web e2e flaky tests from seven popular real-world projects.
arXiv Detail & Related papers (2024-02-15T06:51:53Z)
Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects, observed during app execution. TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults. Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z)
Taming Timeout Flakiness: An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests negatively affect regression testing because they result in test failures that are not necessarily caused by code changes. Test timeouts are one contributing factor to such flaky test failures. Test flakiness rate ranges from 49% to 70%, depending on the number of repeated test executions.
arXiv Detail & Related papers (2024-02-07T20:01:41Z)
TestSpark: IntelliJ IDEA's Ultimate Test Generation Companion [15.13443954421825]
This paper introduces TestSpark, a plugin for IntelliJ IDEA that enables users to generate unit tests with only a few clicks. TestSpark also allows users to easily modify and run each generated test and integrate them into the project workflow.
arXiv Detail & Related papers (2024-01-12T13:53:57Z)
Do Automatic Test Generation Tools Generate Flaky Tests? [12.813573907094074]
The prevalence and nature of flaky tests produced by test generation tools remain largely unknown. We generate tests using EvoSuite (Java) and Pynguin (Python) and execute each test 200 times. Our results show that flakiness is at least as common in generated tests as in developer-written tests.
arXiv Detail & Related papers (2023-10-08T16:44:27Z)
An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation [3.9762912548964864]
This paper presents a large-scale empirical evaluation on the effectiveness of Large Language Models for automated unit test generation. We implement our approach in TestPilot, a test generation tool for JavaScript that automatically generates unit tests for all API functions in an npm package. We find that 92.8% of TestPilot's generated tests have no more than 50% similarity with existing tests.
arXiv Detail & Related papers (2023-02-13T17:13:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.