Pass-Fail Criteria for Scenario-Based Testing of Automated Driving Systems
- URL: http://arxiv.org/abs/2005.09417v2
- Date: Tue, 26 May 2020 14:15:12 GMT
- Title: Pass-Fail Criteria for Scenario-Based Testing of Automated Driving Systems
- Authors: Robert Myers, Zeyn Saigol
- Abstract summary: This paper sets out a framework for assessing an automated driving system's behavioural safety in normal operation.
Risk-based rules cannot give a pass/fail decision from a single test case; instead, the decision considers statistical performance across many individual tests.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The MUSICC project has created a proof-of-concept scenario database to be
used as part of a type approval process for the verification of automated
driving systems (ADS). This process must include a highly automated means of
evaluating test results, as manual review at the scale required is impractical.
This paper sets out a framework for assessing an ADS's behavioural safety in
normal operation (i.e. performance of the dynamic driving task without
component failures or malicious actions). Five top-level evaluation criteria
for ADS performance are identified. Implementing these requires two types of
outcome scoring rule: prescriptive (measurable rules which must always be
followed) and risk-based (undesirable outcomes which must not occur too often).
Scoring rules are defined in a programming language and will be stored as part
of the scenario description.
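As a minimal sketch of the two rule types, the snippet below assumes Python as the rule language and an illustrative per-test-case trace schema; the field names, tolerance, and data structure are assumptions for illustration, not the MUSICC rule language.

```python
# Illustrative sketch only: schema, names and tolerance are assumptions,
# not taken from the MUSICC scenario database.

from dataclasses import dataclass
from typing import List


@dataclass
class TimeStep:
    speed_mps: float        # ego speed at this sample
    speed_limit_mps: float  # posted speed limit at this point
    collision: bool         # contact with another road user or object


def prescriptive_speed_rule(trace: List[TimeStep], tolerance_mps: float = 0.5) -> bool:
    """Prescriptive rule: measurable and must always be followed,
    so it yields a pass/fail verdict for a single test case."""
    return all(t.speed_mps <= t.speed_limit_mps + tolerance_mps for t in trace)


def risk_based_collision_outcome(trace: List[TimeStep]) -> bool:
    """Risk-based rule: only records whether an undesirable outcome occurred.
    One occurrence does not fail the ADS; the rate across many test cases
    is judged by the statistical framework described below."""
    return any(t.collision for t in trace)
```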
Risk-based rules cannot give a pass/fail decision from a single test case.
Instead, a framework is defined to reach a decision for each functional
scenario (set of test cases with common features). This considers statistical
performance across many individual tests. Implications of this framework for
hypothesis testing and scenario selection are identified.
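To make the per-functional-scenario decision concrete, the sketch below applies an exact one-sided binomial test to the risk-based outcomes of many concrete test cases. The paper discusses hypothesis testing but does not prescribe this particular test; the test choice, target failure rate, and significance level are assumptions.

```python
# Illustrative only: an exact one-sided binomial test over risk-based outcomes.
# The specific test, target rate and alpha are assumptions, not the paper's values.

from math import comb
from typing import List


def p_value_at_least(failures: int, trials: int, target_rate: float) -> float:
    """P(seeing >= `failures` undesirable outcomes in `trials` tests
    if the true failure rate were exactly `target_rate`)."""
    return sum(
        comb(trials, k) * target_rate ** k * (1 - target_rate) ** (trials - k)
        for k in range(failures, trials + 1)
    )


def functional_scenario_verdict(outcomes: List[bool],
                                target_rate: float = 0.01,
                                alpha: float = 0.05) -> str:
    """Fail the functional scenario if the observed failure count is
    implausibly high under the target rate; otherwise pass."""
    failures = sum(outcomes)
    if p_value_at_least(failures, len(outcomes), target_rate) < alpha:
        return "fail"
    return "pass"


# Example: 2 undesirable outcomes across 400 concrete test cases of one
# functional scenario -> consistent with a 1% target rate, so "pass".
print(functional_scenario_verdict([True] * 2 + [False] * 398))
```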
Related papers
- Automatically Adaptive Conformal Risk Control [49.95190019041905]
We propose a methodology for achieving approximate conditional control of statistical risks by adapting to the difficulty of test samples.
Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning.
arXiv Detail & Related papers (2024-06-25T08:29:32Z)
- GOOSE: Goal-Conditioned Reinforcement Learning for Safety-Critical Scenario Generation [0.14999444543328289]
Goal-conditioned Scenario Generation (GOOSE) is a goal-conditioned reinforcement learning (RL) approach that automatically generates safety-critical scenarios.
We demonstrate the effectiveness of GOOSE in generating scenarios that lead to safety-critical events.
arXiv Detail & Related papers (2024-06-06T08:59:08Z)
- Efficient Weighting Schemes for Auditing Instant-Runoff Voting Elections [57.67176250198289]
AWAIRE involves adaptively weighted averages of test statistics, essentially "learning" an effective set of hypotheses to test.
We explore schemes and settings more extensively, to identify and recommend efficient choices for practice.
A limitation of the current AWAIRE implementation is its restriction to a small number of candidates.
arXiv Detail & Related papers (2024-02-18T10:13:01Z)
- Few-Shot Scenario Testing for Autonomous Vehicles Based on Neighborhood Coverage and Similarity [8.97909097472183]
Testing and evaluating the safety performance of autonomous vehicles (AVs) is essential before large-scale deployment.
The number of testing scenarios permissible for a specific AV is severely limited by tight constraints on testing budgets and time.
We formulate this for the first time as the "few-shot testing" (FST) problem and propose a systematic framework to address this challenge.
arXiv Detail & Related papers (2024-02-02T04:47:14Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Conservative Estimation of Perception Relevance of Dynamic Objects for Safe Trajectories in Automotive Scenarios [0.0]
The concept of relevance currently remains insufficiently defined and specified.
We propose a novel methodology to overcome this challenge by exemplary application to collision safety in the highway domain.
We present a conservative estimation of which dynamic objects are relevant for perception and need to be considered for a complete evaluation.
arXiv Detail & Related papers (2023-07-20T13:43:48Z)
- Tree-Based Scenario Classification: A Formal Framework for Coverage Analysis on Test Drives of Autonomous Vehicles [0.0]
In scenario-based testing, relevant (driving) scenarios are the basis of tests.
We address the open challenges of classifying sets of scenarios and measuring coverage of these scenarios in recorded test drives.
arXiv Detail & Related papers (2023-07-11T08:30:57Z)
- Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN), which corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS), which simulates an i.i.d. data stream from a non-i.i.d. stream in a class-balanced manner.
arXiv Detail & Related papers (2022-08-10T03:05:46Z)
- Uncertainty-Driven Action Quality Assessment [67.20617610820857]
We propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture the diversity among multiple judge scores.
We generate the estimation of uncertainty for each prediction, which is employed to re-weight AQA regression loss.
Our proposed method achieves competitive results on three benchmarks including the Olympic events MTL-AQA and FineDiving, and the surgical skill JIGSAWS datasets.
arXiv Detail & Related papers (2022-07-29T07:21:15Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Efficient statistical validation with edge cases to evaluate Highly Automated Vehicles [6.198523595657983]
The wide-scale deployment of Autonomous Vehicles seems to be imminent despite many safety challenges that are yet to be resolved.
Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements.
This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios.
arXiv Detail & Related papers (2020-03-04T04:35:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.