Simulator Ensembles for Trustworthy Autonomous Driving Testing
- URL: http://arxiv.org/abs/2503.08936v1
- Date: Tue, 11 Mar 2025 22:34:14 GMT
- Title: Simulator Ensembles for Trustworthy Autonomous Driving Testing
- Authors: Lev Sorokin, Matteo Biagiola, Andrea Stocco
- Abstract summary: MultiSim is a novel approach to multi-simulator ADAS testing based on search-based testing. It identifies 54% more simulator-agnostic failing tests while showing a comparable validity rate.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scenario-based testing with driving simulators is extensively used to identify failing conditions of automated driving assistance systems (ADAS) and reduce the amount of in-field road testing. However, existing studies have shown that repeated test execution in the same simulator, as well as in distinct simulators, can yield different outcomes, which can be attributed to sources of flakiness or different implementations of the physics, among other factors. In this paper, we present MultiSim, a novel approach to multi-simulator ADAS testing based on search-based testing that leverages an ensemble of simulators to identify failure-inducing, simulator-agnostic test scenarios. During the search, each scenario is evaluated jointly on multiple simulators. Scenarios that produce consistent results across simulators are prioritized for further exploration, while those that fail on only a subset of simulators are given less priority, as they may reflect simulator-specific issues rather than generalizable failures. Our case study, which involves testing a deep neural network-based ADAS on different pairs of three widely used simulators, demonstrates that MultiSim outperforms single-simulator testing, achieving on average a 51% higher rate of simulator-agnostic failures. Compared to a state-of-the-art multi-simulator approach that combines the outcomes of independent test generation campaigns obtained in different simulators, MultiSim identifies 54% more simulator-agnostic failing tests while showing a comparable validity rate. An enhancement of MultiSim that leverages surrogate models to predict simulator disagreements and bypass executions not only increases the average number of valid failures but also improves efficiency in finding the first valid failure.
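The joint-evaluation idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the integer scoring rule, and the toy threshold-based simulators are all assumptions made for the example.

```python
# Sketch of joint scenario evaluation across a simulator ensemble.
# Scenarios failing consistently in ALL simulators score highest
# (candidate simulator-agnostic failures); disagreements score lower,
# since they may reflect simulator-specific issues.

def joint_fitness(scenario, simulators):
    """Score a scenario by how consistently the simulators flag it as failing."""
    outcomes = [sim(scenario) for sim in simulators]
    if all(outcomes):
        return 2  # consistent failure: prioritize for further exploration
    if any(outcomes):
        return 1  # simulator-specific failure: deprioritize
    return 0      # passes everywhere

# Toy stand-ins for simulators: each flags a scenario as failing
# when the road curvature exceeds its own (slightly different) threshold.
sim_a = lambda s: s["curvature"] > 0.8
sim_b = lambda s: s["curvature"] > 0.75

scenarios = [{"curvature": c / 10} for c in range(10)]
ranked = sorted(scenarios, key=lambda s: -joint_fitness(s, [sim_a, sim_b]))
```

A search-based test generator would then keep evolving the top-ranked scenarios, so the search budget concentrates on failures that generalize across simulators.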
Related papers
- LLM-Agents Driven Automated Simulation Testing and Analysis of small Uncrewed Aerial Systems [11.183147511573717]
Thorough simulation testing is crucial for validating the correct behavior of small Uncrewed Aerial Systems (sUAS).
Various sUAS simulation tools exist to support developers, but the entire process of creating, executing, and analyzing simulation tests remains a largely manual and cumbersome task.
We propose AutoSimTest, a framework where multiple LLM agents collaborate to support the sUAS simulation testing process.
arXiv Detail & Related papers (2025-01-21T03:42:21Z) - NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking [65.24988062003096]
We present NAVSIM, a framework for benchmarking vision-based driving policies.
Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other.
NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights.
arXiv Detail & Related papers (2024-06-21T17:59:02Z) - Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous
Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z) - Two is Better Than One: Digital Siblings to Improve Autonomous Driving Testing [10.518360486008964]
We introduce the notion of digital siblings, a multi-simulator approach that tests a given autonomous vehicle on multiple general-purpose simulators.
We empirically compare such a multi-simulator approach against a digital twin of a physical scaled autonomous vehicle on a large set of test cases.
Our empirical evaluation shows that the ensemble failure predictor by the digital siblings is superior to each individual simulator at predicting the failures of the digital twin.
arXiv Detail & Related papers (2023-05-14T04:10:56Z) - Construction of a Surrogate Model: Multivariate Time Series Prediction with a Hybrid Model [2.198430261120653]
Automotive groups rely on simulators to perform most tests.
The reliability of these simulators for constantly refined tasks is becoming an issue.
To increase the number of tests, the industry is now developing surrogate models.
arXiv Detail & Related papers (2022-12-15T15:52:18Z) - Synthetic Data-Based Simulators for Recommender Systems: A Survey [55.60116686945561]
This survey aims at providing a comprehensive overview of the recent trends in the field of modeling and simulation.
We start with the motivation behind the development of frameworks implementing the simulations -- simulators.
We provide a new consistent classification of existing simulators based on their functionality, approbation, and industrial effectiveness.
arXiv Detail & Related papers (2022-06-22T19:33:21Z) - Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems [80.77917437785773]
Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation.
We propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates a user's analogical thinking in interactions with systems.
We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities.
arXiv Detail & Related papers (2022-04-02T05:11:03Z) - Finding Failures in High-Fidelity Simulation using Adaptive Stress Testing and the Backward Algorithm [35.076062292062325]
Adaptive stress testing (AST) is a method that uses reinforcement learning to find the most likely failure of a system.
AST with a deep reinforcement learning solver has been shown to be effective in finding failures across a range of different systems.
To improve efficiency, we present a method that first finds failures in a low-fidelity simulator.
It then uses the backward algorithm, which trains a deep neural network policy using a single expert demonstration, to adapt the low-fidelity failures to the high-fidelity simulator.
arXiv Detail & Related papers (2021-07-27T16:54:04Z) - Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles [86.9067793493874]
We propose efficient mechanisms to characterize and generate testing scenarios using a state-of-the-art driving simulator.
We use our method to characterize real driving data from the Next Generation Simulation (NGSIM) project.
We rank the scenarios by defining metrics based on the complexity of avoiding accidents and provide insights into how the AV could have minimized the probability of incurring an accident.
arXiv Detail & Related papers (2021-03-12T17:00:23Z) - Digital Twins Are Not Monozygotic -- Cross-Replicating ADAS Testing in Two Industry-Grade Automotive Simulators [13.386879259549305]
We show that SBST can be used to effectively and efficiently generate critical test scenarios in two simulators.
We find that executing the same test scenarios in the two simulators leads to notable differences in the details of the test outputs.
arXiv Detail & Related papers (2020-12-12T14:00:33Z) - Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction [88.0416857308144]
We propose an alternative to sensor simulation, which is expensive and suffers from large domain gaps.
We directly simulate the outputs of the self-driving vehicle's perception and prediction system, enabling realistic motion planning testing.
arXiv Detail & Related papers (2020-08-13T17:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.