A Systematic Evaluation of Environmental Flakiness in JavaScript Tests
- URL: http://arxiv.org/abs/2602.19098v1
- Date: Sun, 22 Feb 2026 08:59:27 GMT
- Title: A Systematic Evaluation of Environmental Flakiness in JavaScript Tests
- Authors: Negar Hashemi, Amjed Tahir, August Shi, Shawn Rasheed, Rachel Blagojevic
- Abstract summary: Test flakiness is a significant issue in industry, affecting test efficiency and product quality. We conduct a systematic evaluation of the impact of environmental factors on test flakiness in JavaScript. We develop a lightweight mitigation approach, js-env-sanitizer, that can sanitize environmental-related flaky tests.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test flakiness is a significant issue in industry, affecting test efficiency and product quality. While extensive research has examined the impact of flaky tests, many root causes remain unexplored, particularly in the context of dynamic languages such as JavaScript. In this paper, we conduct a systematic evaluation of the impact of environmental factors on test flakiness in JavaScript. We first executed test suites across multiple environmental configurations to determine whether changes in the environment could lead to flaky behavior. We selected three environmental factors to manipulate: the operating system, the Node.js version, and the browser. We identified a total of 65 environmental flaky projects: 28 related to operating system issues, five to Node.js version compatibility, 16 to a combination of operating system and Node.js issues, and 17 to browser compatibility. To address environmental flakiness, we developed a lightweight mitigation approach, js-env-sanitizer, that can sanitize environmental-related flaky tests by skipping and reporting them (rather than failing), allowing CI builds to continue and succeed without rerunning entire test suites. The tool achieves high accuracy with minimal performance or configuration overhead, and currently supports three popular JavaScript testing frameworks (Jest, Mocha, and Vitest).
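The skip-and-report mitigation described in the abstract can be illustrated with a minimal sketch. This is not js-env-sanitizer's real API (which the abstract does not specify); the `envIt` wrapper and its constraint options are hypothetical, showing only the general idea of gating environment-sensitive tests on the current OS and Node.js version and skipping them with a report instead of letting them fail the build.

```javascript
// Hypothetical sketch of environment-based test sanitization.
// `envIt` and its constraint shape are illustrative, not the tool's actual interface.

// Check whether the current environment satisfies a test's declared constraints,
// e.g. { platforms: ["linux"], minNodeMajor: 18 } for "Linux with Node >= 18".
function isEnvironmentSupported({ platforms, minNodeMajor } = {}) {
  const platformOk = !platforms || platforms.includes(process.platform);
  const nodeMajor = Number(process.versions.node.split(".")[0]);
  const nodeOk = !minNodeMajor || nodeMajor >= minNodeMajor;
  return platformOk && nodeOk;
}

// Wrapper around a Jest/Mocha/Vitest-style `it`: register the test normally when
// the environment matches, otherwise skip it and report, rather than failing.
function envIt(name, constraints, fn, it) {
  if (isEnvironmentSupported(constraints)) {
    it(name, fn);
  } else {
    console.warn(
      `[env-sanitizer] skipped "${name}" on ${process.platform}, Node ${process.versions.node}`
    );
    if (it.skip) it.skip(name, fn); // mark as skipped so CI stays green
  }
}
```

In a Jest suite this would be used as `envIt("reads symlinks", { platforms: ["linux", "darwin"] }, () => { ... }, it)`, so a run on Windows records a skip plus a report line instead of a red build.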
Related papers
- SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs). Unlike traditional static benchmarks, SwingArena models the collaborative process of software development by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z) - EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in the software engineering domain. Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets. To address this gap, we introduce a comprehensive environment setup benchmark, EnvBench.
arXiv Detail & Related papers (2025-03-18T17:19:12Z) - CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases. The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z) - A Preliminary Study of Fixed Flaky Tests in Rust Projects on GitHub [5.806051501952938]
We present our work-in-progress on studying flaky tests in Rust projects on GitHub. We focus on flaky tests that are fixed, not just reported, as the fixes can offer valuable information on root causes, manifestation characteristics, and fix strategies.
arXiv Detail & Related papers (2025-02-04T22:55:54Z) - Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript [3.6513675781808357]
Flaky tests pose a significant issue for software testing. Previous research has identified test order dependency as one of the most prevalent causes of flakiness. This paper aims to investigate test order dependency in JavaScript tests.
arXiv Detail & Related papers (2025-01-22T06:52:11Z) - The BrowserGym Ecosystem for Web Agent Research [151.90034093362343]
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents. We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature. We conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across 6 popular web agent benchmarks.
arXiv Detail & Related papers (2024-12-06T23:43:59Z) - Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z) - GHunter: Universal Prototype Pollution Gadgets in JavaScript Runtimes [5.852467142337343]
Prototype pollution is a recent vulnerability that affects JavaScript code.
It is rooted in JavaScript's prototype-based inheritance, enabling attackers to inject arbitrary properties into an object's prototype at runtime.
We study gadgets in V8-based JavaScript runtimes with prime focus on Node.js and Deno.
arXiv Detail & Related papers (2024-07-15T15:30:00Z) - FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques [53.288368877654705]
FV8 is a modified V8 JavaScript engine designed to identify evasion techniques in JavaScript code.
It selectively enforces code execution on APIs that conditionally inject dynamic code.
It identifies 1,443 npm packages and 164 (82%) extensions containing at least one type of evasion.
arXiv Detail & Related papers (2024-05-21T19:54:19Z) - WEFix: Intelligent Automatic Generation of Explicit Waits for Efficient Web End-to-End Flaky Tests [13.280540531582945]
We propose WEFix, a technique that can automatically generate fix code for UI-based flakiness in web e2e testing.
We evaluate the effectiveness and efficiency of WEFix against 122 web e2e flaky tests from seven popular real-world projects.
arXiv Detail & Related papers (2024-02-15T06:51:53Z) - Taming Timeout Flakiness: An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests negatively affect regression testing because they result in test failures that are not necessarily caused by code changes.
Test timeouts are one contributing factor to such flaky test failures.
Test flakiness rate ranges from 49% to 70%, depending on the number of repeated test executions.
arXiv Detail & Related papers (2024-02-07T20:01:41Z) - Do Automatic Test Generation Tools Generate Flaky Tests? [12.813573907094074]
The prevalence and nature of flaky tests produced by test generation tools remain largely unknown.
We generate tests using EvoSuite (Java) and Pynguin (Python) and execute each test 200 times.
Our results show that flakiness is at least as common in generated tests as in developer-written tests.
arXiv Detail & Related papers (2023-10-08T16:44:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.