How does Simulation-based Testing for Self-driving Cars match Human Perception?
- URL: http://arxiv.org/abs/2401.14736v1
- Date: Fri, 26 Jan 2024 09:58:12 GMT
- Title: How does Simulation-based Testing for Self-driving Cars match Human Perception?
- Authors: Christian Birchler, Tanzil Kombarabettu Mohammed, Pooja Rani, Teodora Nechita, Timo Kehrer, Sebastiano Panichella
- Abstract summary: This study investigates the factors that determine how humans perceive self-driving cars test cases as safe, unsafe, realistic, or unrealistic.
Our findings indicate that the human assessment of the safety and realism of failing and passing test cases can vary based on different factors.
- Score: 5.742965094549775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software metrics such as coverage and mutation scores have been extensively
explored for the automated quality assessment of test suites. While traditional
tools rely on such quantifiable software metrics, the field of self-driving
cars (SDCs) has primarily focused on simulation-based test case generation
using quality metrics such as the out-of-bound (OOB) parameter to determine if
a test case fails or passes. However, it remains unclear to what extent this
quality metric aligns with the human perception of the safety and realism of
SDCs, which are critical aspects in assessing SDC behavior. To address this
gap, we conducted an empirical study involving 50 participants to investigate
the factors that determine how humans perceive SDC test cases as safe, unsafe,
realistic, or unrealistic. To this end, we developed SDC-Alabaster, a framework
leveraging virtual reality (VR) technologies to immerse the study participants
in the virtual environment of SDC simulators. Our findings
indicate that the human assessment of the safety and realism of failing and
passing test cases can vary based on different factors, such as the test's
complexity and the possibility of interacting with the SDC. For the assessment
of realism in particular, the participants' age acts as a confounding factor
that leads to differing perceptions. This study highlights the need for more
research on SDC
simulation testing quality metrics and the importance of human perception in
evaluating SDC behavior.
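To make the OOB criterion concrete, here is a minimal, hypothetical sketch (the names and thresholds are invented for illustration and are not the paper's tooling): a simulated driving trace is marked as failing as soon as the car's lateral deviation from the lane centerline exceeds the lane half-width plus a tolerance.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical illustration of an out-of-bound (OOB) test oracle:
# a test case fails when the simulated car's lateral deviation from
# the lane centerline exceeds the lane half-width plus a tolerance.

@dataclass
class TraceSample:
    position: Tuple[float, float]      # (x, y) of the car in world coordinates
    lane_center: Tuple[float, float]   # closest point on the lane centerline

def lateral_deviation(sample: TraceSample) -> float:
    """Euclidean distance between the car and the lane centerline."""
    (x, y), (cx, cy) = sample.position, sample.lane_center
    return ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5

def oob_verdict(trace: List[TraceSample],
                lane_half_width: float = 2.0,
                tolerance: float = 0.5) -> str:
    """Return 'FAIL' if any sample leaves the lane by more than the
    tolerance, 'PASS' otherwise."""
    for sample in trace:
        if lateral_deviation(sample) > lane_half_width + tolerance:
            return "FAIL"
    return "PASS"

# Example: a car that drifts 3.1 m from the centerline fails the test.
trace = [TraceSample((0.0, 0.0), (0.0, 0.0)),
         TraceSample((3.1, 0.0), (0.0, 0.0))]
print(oob_verdict(trace))  # -> FAIL
```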
Related papers
- ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698] (2024-05-28)
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
- ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation [48.54271457765236] (2024-05-23)
Large Language Models (LLMs) can elicit unintended and even harmful content when misaligned with human values.
Current evaluation benchmarks predominantly employ expert-designed contextual scenarios to assess how well LLMs align with human values.
We propose ALI-Agent, an evaluation framework that leverages the autonomous abilities of LLM-powered agents to conduct in-depth and adaptive alignment assessments.
- Diversity-guided Search Exploration for Self-driving Cars Test Generation through Frenet Space Encoding [4.135985106933988] (2024-01-26)
The rise of self-driving cars (SDCs) presents important safety challenges to address in dynamic environments.
While field testing is essential, current methods lack diversity in assessing critical SDC scenarios.
We show that the likelihood of leading to an out-of-bound condition can be learned by a vanilla transformer-based deep-learning model.
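As background for the Frenet-space encoding mentioned in the title, the sketch below shows one common way to map a Cartesian point to Frenet coordinates (s, d) relative to a polyline reference path: s is the arc length to the point's projection onto the path, and d is the signed lateral offset. This is a generic illustration, not the paper's implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def cartesian_to_frenet(point: Point, path: List[Point]) -> Tuple[float, float]:
    """Map a Cartesian point to Frenet coordinates (s, d) relative to a
    polyline reference path. Minimal sketch; real implementations handle
    curvature and edge cases more carefully."""
    px, py = point
    best_dist, best_s, best_d = float("inf"), 0.0, 0.0
    s_start = 0.0
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip degenerate zero-length segments
        # Parameter of the orthogonal projection, clamped to the segment.
        t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / seg_len ** 2))
        qx, qy = x1 + t * dx, y1 + t * dy
        dist = math.hypot(px - qx, py - qy)
        if dist < best_dist:
            # Sign of d: positive if the point lies left of the path direction.
            cross = dx * (py - y1) - dy * (px - x1)
            best_dist = dist
            best_s = s_start + t * seg_len
            best_d = math.copysign(dist, cross)
        s_start += seg_len
    return best_s, best_d

# A point 1 m to the left of a straight 10 m path, 4 m along it.
print(cartesian_to_frenet((4.0, 1.0), [(0.0, 0.0), (10.0, 0.0)]))  # (4.0, 1.0)
```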
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433] (2023-06-18)
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
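The adaptive-testing idea can be illustrated with a one-parameter (Rasch) item-response model, a deliberately simplified sketch rather than anything from the paper: the next test item is the one whose estimated difficulty is closest to the current ability estimate, and the ability estimate is updated after each observed response.

```python
import math

def rasch_prob(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def pick_item(ability: float, difficulties: dict) -> str:
    """Most informative item under 1PL: difficulty closest to ability."""
    return min(difficulties, key=lambda k: abs(difficulties[k] - ability))

def update_ability(ability: float, difficulty: float,
                   correct: bool, lr: float = 0.5) -> float:
    """One gradient step on the log-likelihood of the observed response."""
    return ability + lr * ((1.0 if correct else 0.0)
                           - rasch_prob(ability, difficulty))

# Toy adaptive session over an invented item bank and responses.
items = {"easy": -1.0, "medium": 0.0, "hard": 1.5}
ability = 0.0
for observed in (True, True, False):   # simulated model responses
    item = pick_item(ability, items)
    ability = update_ability(ability, items.pop(item), observed)
print(round(ability, 3))
```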
- Identifying and Explaining Safety-critical Scenarios for Autonomous Vehicles via Key Features [5.634825161148484] (2022-12-15)
This paper uses Instance Space Analysis (ISA) to identify the significant features of test scenarios that affect their ability to reveal the unsafe behaviour of AVs.
ISA identifies the features that best differentiate safety-critical scenarios from normal driving and visualises the impact of these features on test scenario outcomes (safe/unsafe) in 2D.
To test the predictive ability of the identified features, we train five Machine Learning classifiers to classify test scenarios as safe or unsafe.
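As a rough analogue of that classification step (with invented features and labels, not the paper's dataset or its five classifiers), a single scikit-learn classifier can be trained to predict a scenario's safe/unsafe label from numeric scenario features:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical scenario features: [max curvature, avg speed, num turns].
X = [[0.10, 30.0, 2], [0.45, 55.0, 6], [0.05, 25.0, 1],
     [0.50, 60.0, 7], [0.30, 45.0, 4], [0.08, 28.0, 2],
     [0.42, 52.0, 5], [0.12, 33.0, 3]]
y = [0, 1, 0, 1, 1, 0, 1, 0]  # 0 = safe, 1 = unsafe (invented labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```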
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036] (2022-01-27)
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
- Mind the Gap! A Study on the Transferability of Virtual vs Physical-world Testing of Autonomous Driving Systems [6.649715954440713] (2021-12-21)
We leverage the Donkey Car open-source framework to empirically compare testing of SDCs when deployed on a physical small-scale vehicle vs its virtual simulated counterpart.
While a large number of testing results do transfer between virtual and physical environments, we also identified critical shortcomings that contribute to the reality gap between the virtual and physical world.
- Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles [86.9067793493874] (2021-03-12)
We propose efficient mechanisms to characterize and generate testing scenarios using a state-of-the-art driving simulator.
We use our method to characterize real driving data from the Next Generation Simulation (NGSIM) project.
We rank the scenarios by defining metrics based on the complexity of avoiding accidents and provide insights into how the AV could have minimized the probability of incurring an accident.
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546] (2020-11-17)
This paper proposes a set of benchmarks and a framework for studying algorithms that transfer models and policies learnt in simulation to the real world.
We conduct experiments on a wide range of well known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms.
- Efficient statistical validation with edge cases to evaluate Highly Automated Vehicles [6.198523595657983] (2020-03-04)
The wide-scale deployment of Autonomous Vehicles seems imminent, despite many safety challenges that are yet to be resolved.
Existing standards focus on deterministic processes where the validation requires only a set of test cases that cover the requirements.
This paper presents a new approach to compute the statistical characteristics of a system's behaviour by biasing automatically generated test cases towards the worst case scenarios.
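One standard way to realize such biasing is importance sampling, shown here as a generic sketch under an assumed failure model rather than the paper's actual method: test inputs are drawn from a proposal distribution shifted toward severe conditions, and each outcome is reweighted by the nominal-to-proposal density ratio so the failure-probability estimate stays unbiased.

```python
import random
import statistics
from math import exp, pi, sqrt

random.seed(0)

# Hypothetical failure model: the system fails when a severity variable x
# (nominally distributed as Normal(3, 1)) exceeds a threshold.
FAIL_THRESHOLD = 6.0

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def estimate_failure_prob(n: int = 100_000) -> float:
    """Importance sampling: draw from a proposal centered on the worst case
    (mu = 6) and reweight by the nominal/proposal density ratio."""
    weights = []
    for _ in range(n):
        x = random.gauss(6.0, 1.0)  # biased proposal, concentrated near failures
        w = normal_pdf(x, 3.0, 1.0) / normal_pdf(x, 6.0, 1.0)
        weights.append(w if x > FAIL_THRESHOLD else 0.0)
    return statistics.fmean(weights)

# Ground truth for comparison: P(N(3,1) > 6) = 1 - Phi(3) ~ 1.35e-3.
print(f"estimated failure probability: {estimate_failure_prob():.2e}")
```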
This list is automatically generated from the titles and abstracts of the papers on this site.