Related papers: Part-X: A Family of Stochastic Algorithms for Search-Based Test Generation with Probabilistic Guarantees

Part-X: A Family of Stochastic Algorithms for Search-Based Test Generation with Probabilistic Guarantees

URL: http://arxiv.org/abs/2110.10729v1
Date: Wed, 20 Oct 2021 19:05:00 GMT
Title: Part-X: A Family of Stochastic Algorithms for Search-Based Test Generation with Probabilistic Guarantees
Authors: Giulia Pedrielli, Tanmay Kandhait, Surdeep Chotaliya, Quinn Thibeault, Hao Huang, Mauricio Castillo-Effen, Georgios Fainekos
Abstract summary: falsification has proven to be a practical and effective method for discovering erroneous behaviors in Cyber-Physical Systems. Despite the constant improvements on the performance and applicability of falsification methods, they all share a common characteristic. They are best-effort methods which do not provide any guarantees on the absence of erroneous behaviors (falsifiers) when the testing budget is exhausted.
Score: 3.9119084077397863
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Requirements driven search-based testing (also known as falsification) has proven to be a practical and effective method for discovering erroneous behaviors in Cyber-Physical Systems. Despite the constant improvements on the performance and applicability of falsification methods, they all share a common characteristic. Namely, they are best-effort methods which do not provide any guarantees on the absence of erroneous behaviors (falsifiers) when the testing budget is exhausted. The absence of finite time guarantees is a major limitation which prevents falsification methods from being utilized in certification procedures. In this paper, we address the finite-time guarantees problem by developing a new stochastic algorithm. Our proposed algorithm not only estimates (bounds) the probability that falsifying behaviors exist, but also it identifies the regions where these falsifying behaviors may occur. We demonstrate the applicability of our approach on standard benchmark functions from the optimization literature and on the F16 benchmark problem.

Related papers

SILENT: A New Lens on Statistics in Software Timing Side Channels [10.872605368135343]
Recent attacks have challenged our understanding of what it means for code to execute in constant time on modern CPUs. We introduce a new algorithm for the analysis of timing measurements with strong, formal statistical guarantees. We demonstrate the necessity, effectiveness, and benefits of our approach on both synthetic benchmarks and real-world applications.
arXiv Detail & Related papers (2025-04-28T14:22:23Z)
An $(ε,δ)$-accurate level set estimation with a stopping criterion [2.5124069610074895]
This paper introduces an acquisition strategy for level set estimation that incorporates a stopping criterion. We show that our method satisfies $epsilon$-accuracy with a confidence level of $1 - delta$, addressing a key gap in existing approaches.
arXiv Detail & Related papers (2025-03-26T06:41:07Z)
Automatically Adaptive Conformal Risk Control [49.95190019041905]
We propose a methodology for achieving approximate conditional control of statistical risks by adapting to the difficulty of test samples. Our framework goes beyond traditional conditional risk control based on user-provided conditioning events to the algorithmic, data-driven determination of appropriate function classes for conditioning.
arXiv Detail & Related papers (2024-06-25T08:29:32Z)
Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives [16.101435842520473]
This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem. Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. We present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space.
arXiv Detail & Related papers (2024-06-05T02:33:50Z)
Cost-Sensitive Uncertainty-Based Failure Recognition for Object Detection [1.8990839669542954]
We propose a cost-sensitive framework for object detection tailored to user-defined budgets. We derive minimum thresholding requirements to prevent performance degradation. We automate and optimize the thresholding process to maximize the failure recognition rate.
arXiv Detail & Related papers (2024-04-26T14:03:55Z)
Verification-Aided Learning of Neural Network Barrier Functions with Termination Guarantees [6.9060054915724]
Barrier functions are a general framework for establishing a safety guarantee for a system. There is no general method for finding these functions. Recent approaches use self-supervised learning techniques to learn these functions.
arXiv Detail & Related papers (2024-03-12T04:29:43Z)
On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts. We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
Bayesian Quadrature for Probability Threshold Robustness of Partially Undefined Functions [0.4297070083645048]
State of the art algorithms exist to calculate the probability that the performance of a system is satisfactory under uncertainty. These algorithms cannot be applied to problems which often occur in the autonomous vehicle domain where the performance of a system may be undefined. We solve this problem using a hierarchical model for the system performance, where undefined performance is classified before the performance is regressed.
arXiv Detail & Related papers (2022-10-05T11:50:36Z)
CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning. We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
Certifying Neural Network Robustness to Random Input Noise from Samples [14.191310794366075]
Methods to certify the robustness of neural networks in the presence of input uncertainty are vital in safety-critical settings. We propose a novel robustness certification method that upper bounds the probability of misclassification when the input noise follows an arbitrary probability distribution.
arXiv Detail & Related papers (2020-10-15T05:27:21Z)
Global Optimization of Objective Functions Represented by ReLU Networks [77.55969359556032]
Neural networks can learn complex, non- adversarial functions, and it is challenging to guarantee their correct behavior in safety-critical contexts. Many approaches exist to find failures in networks (e.g., adversarial examples), but these cannot guarantee the absence of failures. We propose an approach that integrates the optimization process into the verification procedure, achieving better performance than the naive approach.
arXiv Detail & Related papers (2020-10-07T08:19:48Z)
Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for crossvalidation and consistent estimators of its variance under weak stability conditions on the learning algorithm. Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z)
Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures. Our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.