Evaluating the Impact of Flaky Simulators on Testing Autonomous Driving Systems
- URL: http://arxiv.org/abs/2311.18768v1
- Date: Thu, 30 Nov 2023 18:08:02 GMT
- Title: Evaluating the Impact of Flaky Simulators on Testing Autonomous Driving Systems
- Authors: Mohammad Hossein Amini, Shervin Naseri, Shiva Nejati
- Abstract summary: We investigate test flakiness in simulation-based testing of Autonomous Driving Systems (ADS).
We show that test flakiness in ADS is a common occurrence and can significantly impact the test results obtained by randomized algorithms.
Our machine learning (ML) classifiers effectively identify flaky ADS tests using only a single test run.
- Score: 2.291478393584594
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Simulators are widely used to test Autonomous Driving Systems (ADS), but
their potential flakiness can lead to inconsistent test results. We investigate
test flakiness in simulation-based testing of ADS by addressing two key
questions: (1) How do flaky ADS simulations impact automated testing that
relies on randomized algorithms? and (2) Can machine learning (ML) effectively
identify flaky ADS tests while decreasing the required number of test reruns?
Our empirical results, obtained from two widely-used open-source ADS simulators
and five diverse ADS test setups, show that test flakiness in ADS is a common
occurrence and can significantly impact the test results obtained by randomized
algorithms. Further, our ML classifiers effectively identify flaky ADS tests
using only a single test run, achieving F1-scores of 85%, 82%, and 96% for
three different ADS test setups. Our classifiers significantly outperform our
non-ML baseline, which requires executing tests at least twice, by 31%, 21%,
and 13% in F1-score performance, respectively. We conclude with a discussion
on the scope, implications, and limitations of our study. We provide our
complete replication package in a GitHub repository.
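The abstract contrasts a rerun-based baseline (execute each test at least twice and flag disagreement) with ML classifiers that predict flakiness from a single run. The sketch below is a minimal illustration of that idea, not the authors' pipeline: the synthetic data, the feature semantics (e.g., per-run simulation measurements), and the random-forest choice are all assumptions for demonstration; the actual method is in the replication package.

```python
# Minimal sketch of flaky-ADS-test identification, contrasting the two
# strategies from the abstract. Data and model choice are illustrative
# assumptions, not the authors' actual pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def rerun_baseline(run_verdicts: list[bool]) -> bool:
    """Non-ML baseline: a test is flagged as flaky when two (or more)
    executions of the same scenario disagree on pass/fail."""
    return len(set(run_verdicts)) > 1

# Hypothetical per-run features extracted from a *single* simulation,
# e.g. minimum distance to obstacles, speed variance, lane deviation.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                      # 500 runs, 3 features
y = X[:, 0] + 0.5 * rng.normal(size=500) > 0       # synthetic flaky labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"F1 on held-out runs: {f1_score(y_te, clf.predict(X_te)):.2f}")
```

The point of the comparison in the paper is cost: the baseline needs at least two simulation runs per test, while the classifier consumes features from only one.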
Related papers
- Fine-grained Testing for Autonomous Driving Software: a Study on Autoware with LLM-driven Unit Testing [12.067489008051208]
We present the first study on testing, specifically unit testing, for autonomous driving systems (ADS) source code.
We analyze both human-written test cases and those generated by large language models (LLMs).
We propose AwTest-LLM, a novel approach to enhance test coverage and improve test case pass rates across Autoware packages.
arXiv Detail & Related papers (2025-01-16T22:36:00Z)
- DriveTester: A Unified Platform for Simulation-Based Autonomous Driving Testing [24.222344794923558]
DriveTester is a unified simulation-based testing platform built on Apollo.
It provides a consistent and reliable environment, integrates a lightweight traffic simulator, and incorporates various state-of-the-art ADS testing techniques.
arXiv Detail & Related papers (2024-12-17T08:24:05Z)
- Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests fail seemingly at random without changes to the code.
We study characteristics of tests and the test environment that potentially impact test flakiness.
arXiv Detail & Related papers (2024-09-16T07:52:09Z)
- LLM-Powered Test Case Generation for Detecting Tricky Bugs [30.82169191775785]
AID generates test inputs and oracles targeting plausibly correct programs.
We evaluate AID on two large-scale datasets with tricky bugs: TrickyBugs and EvalPlus.
The evaluation results show that the recall, precision, and F1 score of AID outperform the state-of-the-art by up to 1.80x, 2.65x, and 1.66x, respectively.
arXiv Detail & Related papers (2024-04-16T06:20:06Z)
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
- AutoML Two-Sample Test [13.468660785510945]
We use a simple test that takes the mean discrepancy of a witness function as the test statistic and prove that minimizing a squared loss leads to a witness with optimal testing power.
We provide an implementation of the AutoML two-sample test in the Python package autotst.
arXiv Detail & Related papers (2022-06-17T15:41:07Z)
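The AutoML two-sample test entry above describes using the mean discrepancy of a learned witness function as a test statistic. Below is a toy, numpy-only sketch of that general idea (a ridge-regression witness calibrated by a permutation test); it is not the autotst package, and the split/regularization details are assumptions.

```python
# Illustrative witness-function two-sample test (not the autotst package):
# fit a witness on a training split, use its mean discrepancy on a held-out
# split as the test statistic, and calibrate with a permutation test.
import numpy as np

def witness_two_sample_test(P, Q, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Split each sample: half to fit the witness, half to test.
    P_tr, P_te = P[: len(P) // 2], P[len(P) // 2 :]
    Q_tr, Q_te = Q[: len(Q) // 2], Q[len(Q) // 2 :]
    # Ridge-regression witness: predict +1 for P points, -1 for Q points.
    X = np.vstack([P_tr, Q_tr])
    y = np.concatenate([np.ones(len(P_tr)), -np.ones(len(Q_tr))])
    w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y)
    # Test statistic: mean witness discrepancy on held-out data.
    scores = np.concatenate([P_te @ w, Q_te @ w])
    labels = np.concatenate([np.ones(len(P_te)), -np.ones(len(Q_te))])
    observed = scores[labels > 0].mean() - scores[labels < 0].mean()
    # Permutation null: reshuffle which held-out points came from P vs Q.
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(labels)
        null[i] = scores[perm > 0].mean() - scores[perm < 0].mean()
    return (1 + np.sum(null >= observed)) / (1 + n_perm)  # one-sided p-value

rng = np.random.default_rng(1)
p_val = witness_two_sample_test(rng.normal(0.0, 1.0, (200, 5)),
                                rng.normal(0.5, 1.0, (200, 5)))
print(f"p-value: {p_val:.3f}")
```

Under the null hypothesis of equal distributions the held-out group labels are exchangeable, which is what justifies the permutation calibration.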
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- Digital Twins Are Not Monozygotic -- Cross-Replicating ADAS Testing in Two Industry-Grade Automotive Simulators [13.386879259549305]
We show that SBST can be used to effectively and efficiently generate critical test scenarios in two simulators.
We find that executing the same test scenarios in the two simulators leads to notable differences in the details of the test outputs.
arXiv Detail & Related papers (2020-12-12T14:00:33Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
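Dorfman's classic observation, referenced in the last entry, can be verified with a few lines of arithmetic: for prevalence p and group size g, two-stage pooled testing costs 1/g + 1 - (1-p)^g expected tests per person, versus 1 for individual testing. The sketch below is the standard textbook calculation, not the paper's Bayesian sequential algorithm, and the 2% prevalence is an illustrative value.

```python
# Expected tests per person under Dorfman two-stage group testing:
# one pooled test per group of g, plus g individual retests when the
# pool is positive (probability 1 - (1 - p)^g).
def tests_per_person(p: float, g: int) -> float:
    return 1.0 / g + 1.0 - (1.0 - p) ** g

p = 0.02  # 2% prevalence (illustrative value, not from the paper)
best_g = min(range(2, 51), key=lambda g: tests_per_person(p, g))
print(f"optimal group size: {best_g}, "
      f"expected tests/person: {tests_per_person(p, best_g):.3f}")
# At p = 0.02 the optimum is g = 8, about 0.27 tests per person vs. 1.0
# for individual testing.
```

This noiseless setting is the baseline; the paper's contribution is group testing algorithms that remain effective when individual test outcomes are noisy.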