Testing Autonomous Systems with Believed Equivalence Refinement
- URL: http://arxiv.org/abs/2103.04578v1
- Date: Mon, 8 Mar 2021 07:25:20 GMT
- Title: Testing Autonomous Systems with Believed Equivalence Refinement
- Authors: Chih-Hong Cheng, Rongjie Yan
- Abstract summary: We propose believed equivalence, where the establishment of an equivalence class is initially based on expert belief.
We focus on modules implemented using deep neural networks where every category partitions an input over the real domain.
- Score: 1.370633147306388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous engineering of autonomous driving functions commonly requires
deploying vehicles in road testing to obtain inputs that cause problematic
decisions. Although the discovery leads to producing an improved system, it
also challenges the foundation of testing using equivalence classes and the
associated relative test coverage criterion. In this paper, we propose believed
equivalence, where the establishment of an equivalence class is initially based
on expert belief and is subject to a set of available test cases having a
consistent valuation. Upon a newly encountered test case that breaks the
consistency, one may need to refine the established categorization in order to
split the originally believed equivalence into two. Finally, we focus on
modules implemented using deep neural networks where every category partitions
an input over the real domain. We establish new equivalence classes by guiding
new test cases along directions suggested by their k-nearest neighbors,
complemented by local robustness testing. The concept is demonstrated on a
lane-keeping assist module, indicating the potential of our proposed approach.
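As a minimal illustration of the refinement idea (hypothetical names and toy data, not the authors' implementation): a new test case whose decision disagrees with the labels of its k nearest neighbors breaks the believed equivalence, signaling that the class should be split.

```python
import numpy as np

def knn_consistency(cases, labels, x_new, y_new, k=3):
    """Check whether a new test case's valuation is consistent with the
    labels of its k nearest neighbors; an inconsistency suggests the
    believed equivalence class containing x_new should be refined."""
    cases = np.asarray(cases, dtype=float)
    dists = np.linalg.norm(cases - np.asarray(x_new, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    neighbor_labels = [labels[i] for i in nearest]
    consistent = all(label == y_new for label in neighbor_labels)
    return consistent, neighbor_labels

# Toy example: 1-D inputs, binary decision of a module under test.
cases = [[0.1], [0.2], [0.3], [0.9], [1.0]]
labels = ["keep", "keep", "keep", "steer", "steer"]
ok, nb = knn_consistency(cases, labels, [0.25], "steer", k=3)
# ok is False: the new case breaks the believed equivalence around 0.25,
# so the original category should be split into two refined classes.
```

The sketch uses Euclidean distance as the neighborhood notion; the paper additionally guides new test cases along neighbor-suggested directions and applies local robustness testing, which is omitted here.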
Related papers
- Methodology for Test Case Allocation based on a Formalized ODD [0.4349640169711269]
This paper presents a method for evaluating the suitability of test case allocation to various test environments by drawing on and extending an existing Operational Design Domain (ODD) formalization.
The resulting construct integrates ODD parameters and additional test attributes to capture a given test environment's relevant capabilities.
arXiv Detail & Related papers (2025-09-02T13:33:24Z) - Kernel conditional tests from learning-theoretic bounds [16.813275168865953]
We propose a framework for hypothesis testing on conditional probability distributions.
We then use it to construct statistical tests of functionals of conditional distributions.
Our results establish a comprehensive foundation for conditional testing on functionals.
arXiv Detail & Related papers (2025-06-04T12:53:13Z) - Network Inversion for Uncertainty-Aware Out-of-Distribution Detection [2.6733991338938026]
Out-of-distribution (OOD) detection and uncertainty estimation are critical components for building safe machine learning systems.
We propose a novel framework that combines network inversion with classifier training to address both OOD detection and uncertainty estimation.
Our approach is scalable, interpretable, and does not require access to external OOD datasets or post-hoc calibration techniques.
arXiv Detail & Related papers (2025-05-29T13:53:52Z) - Make Full Use of Testing Information: An Integrated Accelerated Testing and Evaluation Method for Autonomous Driving Systems [6.065650382599096]
To make full use of testing information, this paper proposes an Integrated accelerated Testing and Evaluation Method (ITEM) for the testing and evaluation of autonomous driving systems (ADSs).
The experimental results show that ITEM could well identify the hazardous domains in both low- and high-dimensional cases, regardless of the shape of the hazardous domains.
arXiv Detail & Related papers (2025-01-21T06:59:25Z) - UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation [66.05528698010697]
Test-Time Adaptation aims to adapt pre-trained models to the target domain during testing.
Researchers have identified various challenging scenarios and developed diverse methods to address these challenges.
We propose a Unified Test-Time Adaptation benchmark, which is comprehensive and widely applicable.
arXiv Detail & Related papers (2024-07-29T15:04:53Z) - Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z) - Test Case Recommendations with Distributed Representation of Code Syntactic Features [2.225268436173329]
We propose an automated approach which exploits both structural and semantic properties of source code methods and test cases.
The proposed approach initially trains a neural network to transform method-level source code, as well as unit tests, into distributed representations.
The model computes cosine similarity between the method's embedding and the previously-embedded training instances.
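The retrieval step described above amounts to a nearest-neighbor lookup by cosine similarity; a minimal sketch with toy vectors (hypothetical names, not the paper's trained network):

```python
import numpy as np

def recommend_tests(method_emb, train_embs, train_tests, top_k=2):
    """Rank previously embedded training instances by cosine similarity
    to a method's embedding and return their associated test cases."""
    m = np.asarray(method_emb, dtype=float)
    T = np.asarray(train_embs, dtype=float)
    sims = T @ m / (np.linalg.norm(T, axis=1) * np.linalg.norm(m))
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return [train_tests[i] for i in order]

# Toy 3-D embeddings standing in for learned code representations.
train_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
train_tests = ["test_parse", "test_render", "test_tokenize"]
print(recommend_tests([1.0, 0.05, 0.0], train_embs, train_tests))
```

In the paper, the embeddings come from a neural network trained on method-level source code and unit tests; here they are placeholders to show only the similarity ranking.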
arXiv Detail & Related papers (2023-10-04T21:42:01Z) - Large Class Separation is not what you need for Relational Reasoning-based OOD Detection [12.578844450586]
Out-Of-Distribution (OOD) detection methods provide a solution by identifying semantic novelty.
Most of these methods leverage a learning stage on the known data, which means training (or fine-tuning) a model to capture the concept of normality.
A viable alternative is that of evaluating similarities in the embedding space produced by large pre-trained models without any further learning effort.
arXiv Detail & Related papers (2023-07-12T14:10:15Z) - A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [143.14128737978342]
Test-time adaptation, an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
Recent progress in this paradigm highlights the significant benefits of utilizing unlabeled data for training self-adapted models prior to inference.
arXiv Detail & Related papers (2023-03-27T16:32:21Z) - Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach to anomaly detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
Uncertainty is then used to detect anomalies.
arXiv Detail & Related papers (2022-12-23T00:50:41Z) - Learning Classifiers of Prototypes and Reciprocal Points for Universal Domain Adaptation [79.62038105814658]
Universal Domain Adaptation aims to transfer knowledge between datasets by handling two shifts: domain-shift and category-shift.
The main challenge is correctly distinguishing unknown target samples while adapting the distribution of known-class knowledge from source to target.
Most existing methods approach this problem by first training on the target-adapted known classes and then relying on a single threshold to distinguish unknown target samples.
arXiv Detail & Related papers (2022-12-16T09:01:57Z) - Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z) - Predicting Out-of-Domain Generalization with Neighborhood Invariance [59.05399533508682]
We propose a measure of a classifier's output invariance in a local transformation neighborhood.
Our measure is simple to calculate, does not depend on the test point's true label, and can be applied even in out-of-domain (OOD) settings.
In experiments on benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our measure and actual OOD generalization.
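A minimal reading of such a measure (hypothetical transformations, not the authors' exact formulation): apply small local transformations to a test point and report the fraction of transformed copies on which the classifier's prediction stays the same.

```python
def neighborhood_invariance(predict, x, transforms):
    """Fraction of locally transformed copies of x on which the
    classifier's prediction agrees with its prediction on x itself.
    No true label is needed, so the score applies in OOD settings."""
    base = predict(x)
    agree = sum(predict(t(x)) == base for t in transforms)
    return agree / len(transforms)

# Toy classifier on scalars: predicts the sign of the input.
predict = lambda x: "pos" if x >= 0 else "neg"
transforms = [lambda x: x + 0.1, lambda x: x - 0.1, lambda x: x * 0.5]
print(neighborhood_invariance(predict, 2.0, transforms))   # all agree -> 1.0
print(neighborhood_invariance(predict, 0.05, transforms))  # x - 0.1 flips -> 2/3
```

Points near a decision boundary score low, which is the intuition behind using invariance to predict out-of-domain generalization.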
arXiv Detail & Related papers (2022-07-05T14:55:16Z) - Aggregating Pairwise Semantic Differences for Few-Shot Claim Veracity Classification [21.842139093124512]
We introduce SEED, a novel vector-based method for claim veracity classification.
We build on the hypothesis that we can simulate class representative vectors that capture average semantic differences for claim-evidence pairs in a class.
Experiments conducted on the FEVER and SCIFACT datasets show consistent improvements over competitive baselines in few-shot settings.
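The hypothesis above can be sketched directly (toy 2-D vectors, not the SEED implementation): a class representative is the mean of the claim-minus-evidence difference vectors in that class, and a new pair is assigned to the closest representative.

```python
import numpy as np

def class_representatives(pairs_by_class):
    """Mean semantic-difference (claim - evidence) vector per class."""
    return {c: np.mean([np.subtract(cl, ev) for cl, ev in pairs], axis=0)
            for c, pairs in pairs_by_class.items()}

def classify(claim, evidence, reps):
    """Assign a claim-evidence pair to the nearest class representative."""
    d = np.subtract(claim, evidence)
    return min(reps, key=lambda c: np.linalg.norm(d - reps[c]))

# Toy embeddings: supported pairs have small claim-evidence differences,
# refuted pairs have large ones.
reps = class_representatives({
    "SUPPORTS": [([1.0, 0.0], [0.9, 0.1]), ([0.8, 0.2], [0.7, 0.3])],
    "REFUTES":  [([1.0, 0.0], [-0.8, 0.1]), ([0.5, 0.5], [-0.4, 0.6])],
})
print(classify([0.6, 0.4], [0.5, 0.5], reps))
```

Because each class needs only an average difference vector, a handful of labelled pairs suffices, which is why the method suits few-shot settings.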
arXiv Detail & Related papers (2022-05-11T17:23:37Z) - Complete Agent-driven Model-based System Testing for Autonomous Systems [0.0]
A novel approach to testing complex autonomous transportation systems is described.
It is intended to mitigate some of the most critical problems regarding verification and validation.
arXiv Detail & Related papers (2021-10-25T01:55:24Z) - A New Score for Adaptive Tests in Bayesian and Credal Networks [64.80185026979883]
A test is adaptive when its sequence and number of questions is dynamically tuned on the basis of the estimated skills of the taker.
We present an alternative family of scores, based on the mode of the posterior probabilities, and hence easier to explain.
arXiv Detail & Related papers (2021-05-25T20:35:42Z) - Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
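As a hedged illustration of the general smoothing view (generic Gaussian noise over an input here; the paper's construction smooths over training-label flips): a smoothed classifier returns the majority vote of a base classifier over random perturbations, and certification arguments reason about the stability of that vote.

```python
import random
from collections import Counter

def smoothed_predict(base_predict, x, noise_scale=0.1, n_samples=200, seed=0):
    """Majority vote of a base classifier over Gaussian perturbations of x.
    Randomized smoothing certifies that the vote is stable under bounded
    changes when one class wins by a sufficiently large margin."""
    rng = random.Random(seed)
    votes = Counter(
        base_predict(x + rng.gauss(0.0, noise_scale)) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]

# Toy base classifier: thresholds a scalar feature at 0.
base = lambda x: "attack" if x > 0 else "benign"
print(smoothed_predict(base, 0.5))  # far from the boundary -> "attack"
```

The margin by which the winning class out-votes the runner-up is what translates into a pointwise robustness certificate in the paper's analysis.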
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.