Pass-Fail Criteria for Scenario-Based Testing of Automated Driving Systems
- URL: http://arxiv.org/abs/2005.09417v2
- Date: Tue, 26 May 2020 14:15:12 GMT
- Title: Pass-Fail Criteria for Scenario-Based Testing of Automated Driving Systems
- Authors: Robert Myers, Zeyn Saigol
- Abstract summary: This paper sets out a framework for assessing an automated driving system's behavioural safety in normal operation.
Risk-based rules cannot give a pass/fail decision from a single test case; instead, the decision considers statistical performance across many individual tests.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The MUSICC project has created a proof-of-concept scenario database to be
used as part of a type approval process for the verification of automated
driving systems (ADS). This process must include a highly automated means of
evaluating test results, as manual review at the scale required is impractical.
This paper sets out a framework for assessing an ADS's behavioural safety in
normal operation (i.e. performance of the dynamic driving task without
component failures or malicious actions). Five top-level evaluation criteria
for ADS performance are identified. Implementing these requires two types of
outcome scoring rule: prescriptive (measurable rules which must always be
followed) and risk-based (undesirable outcomes which must not occur too often).
Scoring rules are defined in a programming language and will be stored as part
of the scenario description.
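As a minimal sketch of the two rule types, the snippet below assumes Python as the rule language and an illustrative per-test-case trace schema; the field names, tolerance, and data structure are assumptions for illustration, not the MUSICC rule language.

```python
# Illustrative sketch only: schema, names and tolerance are assumptions,
# not taken from the MUSICC scenario database.

from dataclasses import dataclass
from typing import List


@dataclass
class TimeStep:
    speed_mps: float        # ego speed at this sample
    speed_limit_mps: float  # posted speed limit at this point
    collision: bool         # contact with another road user or object


def prescriptive_speed_rule(trace: List[TimeStep], tolerance_mps: float = 0.5) -> bool:
    """Prescriptive rule: measurable and must always be followed,
    so it yields a pass/fail verdict for a single test case."""
    return all(t.speed_mps <= t.speed_limit_mps + tolerance_mps for t in trace)


def risk_based_collision_outcome(trace: List[TimeStep]) -> bool:
    """Risk-based rule: only records whether an undesirable outcome occurred.
    One occurrence does not fail the ADS; the rate across many test cases
    is judged by the statistical framework described below."""
    return any(t.collision for t in trace)
```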
Risk-based rules cannot give a pass/fail decision from a single test case.
Instead, a framework is defined to reach a decision for each functional
scenario (set of test cases with common features). This considers statistical
performance across many individual tests. Implications of this framework for
hypothesis testing and scenario selection are identified.
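To make the per-functional-scenario decision concrete, the sketch below applies an exact one-sided binomial test to the risk-based outcomes of many concrete test cases. The paper discusses hypothesis testing but does not prescribe this particular test; the test choice, target failure rate, and significance level are assumptions.

```python
# Illustrative only: an exact one-sided binomial test over risk-based outcomes.
# The specific test, target rate and alpha are assumptions, not the paper's values.

from math import comb
from typing import List


def p_value_at_least(failures: int, trials: int, target_rate: float) -> float:
    """P(seeing >= `failures` undesirable outcomes in `trials` tests
    if the true failure rate were exactly `target_rate`)."""
    return sum(
        comb(trials, k) * target_rate ** k * (1 - target_rate) ** (trials - k)
        for k in range(failures, trials + 1)
    )


def functional_scenario_verdict(outcomes: List[bool],
                                target_rate: float = 0.01,
                                alpha: float = 0.05) -> str:
    """Fail the functional scenario if the observed failure count is
    implausibly high under the target rate; otherwise pass."""
    failures = sum(outcomes)
    if p_value_at_least(failures, len(outcomes), target_rate) < alpha:
        return "fail"
    return "pass"


# Example: 2 undesirable outcomes across 400 concrete test cases of one
# functional scenario -> consistent with a 1% target rate, so "pass".
print(functional_scenario_verdict([True] * 2 + [False] * 398))
```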
Related papers
- Automatically Adaptive Conformal Risk Control [49.95190019041905]
We propose a methodology for achieving approximate conditional control of statistical risks by adapting to the difficulty of test samples.
Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning.
arXiv Detail & Related papers (2024-06-25T08:29:32Z)
- GOOSE: Goal-Conditioned Reinforcement Learning for Safety-Critical Scenario Generation [0.14999444543328289]
Goal-conditioned Scenario Generation (GOOSE) is a goal-conditioned reinforcement learning (RL) approach that automatically generates safety-critical scenarios.
We demonstrate the effectiveness of GOOSE in generating scenarios that lead to safety-critical events.
arXiv Detail & Related papers (2024-06-06T08:59:08Z)
- Efficient Weighting Schemes for Auditing Instant-Runoff Voting Elections [57.67176250198289]
AWAIRE involves adaptively weighted averages of test statistics, essentially "learning" an effective set of hypotheses to test.
We explore schemes and settings more extensively, to identify and recommend efficient choices for practice.
A limitation of the current AWAIRE implementation is its restriction to a small number of candidates.
arXiv Detail & Related papers (2024-02-18T10:13:01Z)
- Few-Shot Scenario Testing for Autonomous Vehicles Based on Neighborhood Coverage and Similarity [8.97909097472183]
Testing and evaluating the safety performance of autonomous vehicles (AVs) is essential before large-scale deployment.
The number of testing scenarios permissible for a specific AV is severely limited by tight constraints on testing budgets and time.
We formulate this for the first time as the "few-shot testing" (FST) problem and propose a systematic framework to address this challenge.
arXiv Detail & Related papers (2024-02-02T04:47:14Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Conservative Estimation of Perception Relevance of Dynamic Objects for Safe Trajectories in Automotive Scenarios [0.0]
The concept of relevance currently remains insufficiently defined and specified.
We propose a novel methodology to overcome this challenge by exemplary application to collision safety in the highway domain.
We present a conservative estimation of which dynamic objects are relevant for perception and need to be considered for a complete evaluation.
arXiv Detail & Related papers (2023-07-20T13:43:48Z)
- Tree-Based Scenario Classification: A Formal Framework for Coverage Analysis on Test Drives of Autonomous Vehicles [0.0]
In scenario-based testing, relevant (driving) scenarios are the basis of tests.
We address the open challenges of classifying sets of scenarios and measuring coverage of these scenarios in recorded test drives.
arXiv Detail & Related papers (2023-07-11T08:30:57Z)
- Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN), which corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS), which simulates an i.i.d. data stream from a non-i.i.d. stream in a class-balanced manner.
arXiv Detail & Related papers (2022-08-10T03:05:46Z)
- Uncertainty-Driven Action Quality Assessment [67.20617610820857]
We propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture the diversity among multiple judge scores.
We generate the estimation of uncertainty for each prediction, which is employed to re-weight AQA regression loss.
Our proposed method achieves competitive results on three benchmarks including the Olympic events MTL-AQA and FineDiving, and the surgical skill JIGSAWS datasets.
arXiv Detail & Related papers (2022-07-29T07:21:15Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Efficient statistical validation with edge cases to evaluate Highly Automated Vehicles [6.198523595657983]
The wide-scale deployment of Autonomous Vehicles seems to be imminent despite many safety challenges that are yet to be resolved.
Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements.
This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios.
arXiv Detail & Related papers (2020-03-04T04:35:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.