A Preliminary Study of Fixed Flaky Tests in Rust Projects on GitHub
- URL: http://arxiv.org/abs/2502.02760v1
- Date: Tue, 04 Feb 2025 22:55:54 GMT
- Title: A Preliminary Study of Fixed Flaky Tests in Rust Projects on GitHub
- Authors: Tom Schroeder, Minh Phan, Yang Chen
- Abstract summary: We present our work-in-progress on studying flaky tests in Rust projects on GitHub.
We focus on flaky tests that are fixed, not just reported, as the fixes can offer valuable information on root causes, manifestation characteristics, and strategies of fixes.
- Score: 5.806051501952938
- License:
- Abstract: Prior research has extensively studied flaky tests in various domains, such as web applications, mobile applications, and other open-source projects in a range of programming languages, including Java, JavaScript, Python, Ruby, and more. However, little attention has been given to flaky tests in Rust -- an emerging popular language known for its safety features relative to C/C++. Rust incorporates interesting features that make it easy to detect some flaky tests, e.g., the Rust standard library randomizes the order of elements in hash tables, effectively exposing implementation-dependent flakiness. However, Rust still has several sources of nondeterminism that can lead to flaky tests. We present our work-in-progress on studying flaky tests in Rust projects on GitHub, searching through closed GitHub issues and pull requests. We focus on flaky tests that are fixed, not just reported, as the fixes can offer valuable information on root causes, manifestation characteristics, and fix strategies. So far, we have inspected 53 tests. Our initial findings indicate that the predominant root causes include asynchronous wait (33.9%), concurrency issues (24.5%), logic errors (9.4%), and network-related problems (9.4%).
Related papers
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
- Detecting and Evaluating Order-Dependent Flaky Tests in JavaScript [3.6513675781808357]
Flaky tests pose a significant issue for software testing.
Previous research has identified test order dependency as one of the most prevalent causes of flakiness.
This paper aims to investigate test order dependency in JavaScript tests.
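Test order dependency, the cause this related paper studies in JavaScript, can be sketched minimally in Rust (this document's subject language; the example and its names are illustrative, not from either paper). One "polluter" test mutates shared state, and a "victim" test implicitly assumes it runs on fresh state, so the victim's outcome depends on execution order:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Shared mutable state is the classic root of order-dependent flakiness.
static GLOBAL_COUNT: AtomicUsize = AtomicUsize::new(0);

// A "polluter": leaves state behind for tests that run after it.
fn test_registers_item() {
    GLOBAL_COUNT.fetch_add(1, Ordering::SeqCst);
}

// A "victim": implicitly assumes it runs on fresh state.
fn test_starts_empty() -> bool {
    GLOBAL_COUNT.load(Ordering::SeqCst) == 0
}

fn main() {
    // Passing order: the victim runs first and sees fresh state.
    assert!(test_starts_empty());
    test_registers_item();
    // Failing order: after the polluter has run, the victim's
    // assumption breaks -- the same test now "fails".
    assert!(!test_starts_empty());
}
```

Since `cargo test` runs tests in parallel threads by default, such dependencies can surface intermittently rather than on every run, which is what makes them flaky rather than simply broken.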
arXiv Detail & Related papers (2025-01-22T06:52:11Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- A Study of Undefined Behavior Across Foreign Function Boundaries in Rust Libraries [2.359557447960552]
Rust is frequently used to interoperate with other languages.
Miri is the only dynamic analysis tool that can validate applications against these models.
Miri does not support finding bugs in foreign functions, indicating that there may be a critical correctness gap across the Rust ecosystem.
arXiv Detail & Related papers (2024-04-17T18:12:05Z)
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models [123.66104233291065]
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content.
Evaluating these attacks presents a number of challenges that current benchmarks and evaluation techniques do not adequately address.
JailbreakBench is an open-sourced benchmark with the following components.
arXiv Detail & Related papers (2024-03-28T02:44:02Z)
- Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects, observed during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z)
- Do Automatic Test Generation Tools Generate Flaky Tests? [12.813573907094074]
The prevalence and nature of flaky tests produced by test generation tools remain largely unknown.
We generate tests using EvoSuite (Java) and Pynguin (Python) and execute each test 200 times.
Our results show that flakiness is at least as common in generated tests as in developer-written tests.
arXiv Detail & Related papers (2023-10-08T16:44:27Z)
- Fixing Rust Compilation Errors using LLMs [2.1781086368581932]
The Rust programming language has established itself as a viable choice for low-level systems programming over traditional, unsafe alternatives like C/C++.
This paper presents a tool called RustAssistant that leverages the emergent capabilities of Large Language Models (LLMs) to automatically suggest fixes for Rust compilation errors.
RustAssistant is able to achieve an impressive peak accuracy of roughly 74% on real-world compilation errors in popular open-source Rust repositories.
arXiv Detail & Related papers (2023-08-09T18:30:27Z)
- torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library that aims to address this need.
It provides users with a simple API for environments and useful abstractions for samplers and losses.
arXiv Detail & Related papers (2023-05-24T00:20:59Z)
- AutoCoreset: An Automatic Practical Coreset Construction Framework [65.37876706107764]
A coreset is a tiny weighted subset of an input set, that closely resembles the loss function.
We propose an automatic framework for constructing coresets, which requires only the input data and the desired cost function from the user.
We show that while this set is limited, the coreset is quite general.
arXiv Detail & Related papers (2023-05-19T19:59:52Z)
- DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons [5.564793925574796]
We present an approach to automated debugging using large, pretrained transformers.
We start by training a bug-creation model on reversed commit data for the purpose of generating synthetic bugs.
Next, we focus on 10K repositories for which we can execute tests, and create buggy versions of all functions that are covered by passing tests.
arXiv Detail & Related papers (2021-05-19T18:40:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.