A Preliminary Study of Fixed Flaky Tests in Rust Projects on GitHub
- URL: http://arxiv.org/abs/2502.02760v1
- Date: Tue, 04 Feb 2025 22:55:54 GMT
- Title: A Preliminary Study of Fixed Flaky Tests in Rust Projects on GitHub
- Authors: Tom Schroeder, Minh Phan, Yang Chen
- Abstract summary: We present our work-in-progress on studying flaky tests in Rust projects on GitHub.
We focus on flaky tests that are fixed, not just reported, as the fixes can offer valuable information on root causes, manifestation characteristics, and strategies of fixes.
- Score: 5.806051501952938
- License:
- Abstract: Prior research has extensively studied flaky tests in various domains, such as web applications, mobile applications, and other open-source projects in a range of programming languages, including Java, JavaScript, Python, Ruby, and more. However, little attention has been given to flaky tests in Rust -- an emerging popular language known for its safety features relative to C/C++. Rust incorporates interesting features that make it easy to detect some flaky tests, e.g., the Rust standard library randomizes the order of elements in hash tables, effectively exposing implementation-dependent flakiness. However, Rust still has several sources of nondeterminism that can lead to flaky tests. We present our work-in-progress on studying flaky tests in Rust projects on GitHub, searching through closed GitHub issues and pull requests. We focus on flaky tests that are fixed, not just reported, as the fixes can offer valuable information on root causes, manifestation characteristics, and fix strategies. So far, we have inspected 53 tests. Our initial findings indicate that the predominant root causes include asynchronous wait (33.9%), concurrency issues (24.5%), logic errors (9.4%), and network-related problems (9.4%).
Related papers
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
- Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript [3.6513675781808357]
Flaky tests pose a significant issue for software testing.
Previous research has identified test order dependency as one of the most prevalent causes of flakiness.
This paper aims to investigate test order dependency in JavaScript tests.
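Test order dependency, the cause this related paper studies in JavaScript, can be sketched minimally in Rust (this document's subject language; the example and its names are illustrative, not from either paper). One "polluter" test mutates shared state, and a "victim" test implicitly assumes it runs on fresh state, so the victim's outcome depends on execution order:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Shared mutable state is the classic root of order-dependent flakiness.
static GLOBAL_COUNT: AtomicUsize = AtomicUsize::new(0);

// A "polluter": leaves state behind for tests that run after it.
fn test_registers_item() {
    GLOBAL_COUNT.fetch_add(1, Ordering::SeqCst);
}

// A "victim": implicitly assumes it runs on fresh state.
fn test_starts_empty() -> bool {
    GLOBAL_COUNT.load(Ordering::SeqCst) == 0
}

fn main() {
    // Passing order: the victim runs first and sees fresh state.
    assert!(test_starts_empty());
    test_registers_item();
    // Failing order: after the polluter has run, the victim's
    // assumption breaks -- the same test now "fails".
    assert!(!test_starts_empty());
}
```

Since `cargo test` runs tests in parallel threads by default, such dependencies can surface intermittently rather than on every run, which is what makes them flaky rather than simply broken.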
arXiv Detail & Related papers (2025-01-22T06:52:11Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- A Study of Undefined Behavior Across Foreign Function Boundaries in Rust Libraries [2.359557447960552]
Rust is frequently used to interoperate with other languages.
Miri is the only dynamic analysis tool that can validate applications against these models.
Miri does not support finding bugs in foreign functions, indicating that there may be a critical correctness gap across the Rust ecosystem.
arXiv Detail & Related papers (2024-04-17T18:12:05Z)
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models [123.66104233291065]
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content.
Evaluating these attacks presents a number of challenges that current benchmarks and evaluation techniques do not adequately address.
JailbreakBench is an open-sourced benchmark with the following components.
arXiv Detail & Related papers (2024-03-28T02:44:02Z)
- Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects, observed during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z)
- Do Automatic Test Generation Tools Generate Flaky Tests? [12.813573907094074]
The prevalence and nature of flaky tests produced by test generation tools remain largely unknown.
We generate tests using EvoSuite (Java) and Pynguin (Python) and execute each test 200 times.
Our results show that flakiness is at least as common in generated tests as in developer-written tests.
arXiv Detail & Related papers (2023-10-08T16:44:27Z)
- Fixing Rust Compilation Errors using LLMs [2.1781086368581932]
The Rust programming language has established itself as a viable choice for low-level systems programming over traditional, unsafe alternatives like C/C++.
This paper presents a tool called RustAssistant that leverages the emergent capabilities of Large Language Models (LLMs) to automatically suggest fixes for Rust compilation errors.
RustAssistant is able to achieve an impressive peak accuracy of roughly 74% on real-world compilation errors in popular open-source Rust repositories.
arXiv Detail & Related papers (2023-08-09T18:30:27Z)
- torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library that aims to address this need.
It provides users with a simple API for environments and useful abstractions for samplers and losses.
arXiv Detail & Related papers (2023-05-24T00:20:59Z)
- AutoCoreset: An Automatic Practical Coreset Construction Framework [65.37876706107764]
A coreset is a tiny weighted subset of an input set, that closely resembles the loss function.
We propose an automatic framework for constructing coresets, which requires only the input data and the desired cost function from the user.
We show that while this set is limited, the coreset is quite general.
arXiv Detail & Related papers (2023-05-19T19:59:52Z)
- DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons [5.564793925574796]
We present an approach to automated debugging using large, pretrained transformers.
We start by training a bug-creation model on reversed commit data for the purpose of generating synthetic bugs.
Next, we focus on 10K repositories for which we can execute tests, and create buggy versions of all functions that are covered by passing tests.
arXiv Detail & Related papers (2021-05-19T18:40:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.