Demonstrating Software Reliability using Possibly Correlated Tests:
Insights from a Conservative Bayesian Approach
- URL: http://arxiv.org/abs/2208.07935v3
- Date: Wed, 11 Oct 2023 13:18:41 GMT
- Authors: Kizito Salako, Xingyu Zhao
- Abstract summary: We formalise informal notions of "doubting" that the executions are independent.
We develop techniques that reveal the extent to which independence assumptions can undermine conservatism in assessments.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents Bayesian techniques for conservative claims about
software reliability, particularly when evidence suggests the software's
executions are not statistically independent. We formalise informal notions of
"doubting" that the executions are independent, and incorporate such doubts
into reliability assessments. We develop techniques that reveal the extent to
which independence assumptions can undermine conservatism in assessments, and
identify conditions under which this impact is not significant. These
techniques - novel extensions of conservative Bayesian inference (CBI)
approaches - give conservative confidence bounds on the software's failure
probability per execution. With illustrations in two application areas -
nuclear power-plant safety and autonomous vehicle (AV) safety - our analyses
reveal: 1) the confidence an assessor should possess before subjecting a
system to operational testing, without which such testing is futile - favourable
operational testing evidence will eventually decrease one's confidence in the
system being sufficiently reliable; 2) the independence assumption sometimes
supports conservative claims; 3) in some scenarios, observing a system operate
without failure gives less confidence in the system than if some failures had
been observed; 4) building confidence in a system is very sensitive to failures
- each additional failure means significantly more operational testing is
required, in order to support a reliability claim.
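As a point of comparison for the bounds the abstract describes, the textbook Bayesian calculation under the independence assumption - the very assumption the paper's CBI extensions relax - can be sketched as follows. This is a hypothetical illustration, not the paper's method: with a uniform Beta(1, 1) prior on the failure probability per execution, n failure-free independent executions give a Beta(1, n + 1) posterior, whose CDF has a simple closed form.

```python
def confidence_failure_prob_below(p0: float, n: int) -> float:
    """P(failure probability <= p0 | n failure-free executions).

    Assumes i.i.d. executions and a uniform Beta(1, 1) prior, so the
    posterior is Beta(1, n + 1) with CDF 1 - (1 - p0)**(n + 1).
    Illustrative only; the paper's CBI bounds relax these assumptions.
    """
    return 1.0 - (1.0 - p0) ** (n + 1)

# e.g. a 1e-3 failure-probability claim after 4602 failure-free runs
# yields roughly 0.99 confidence.
print(confidence_failure_prob_below(1e-3, 4602))
```

Under correlated executions or more pessimistic priors, as the paper shows, the confidence obtained from the same evidence can be substantially lower.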
Related papers
- Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We find a general, widely existing but actually-neglected phenomenon that most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
- Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
- Did You Mean...? Confidence-based Trade-offs in Semantic Parsing [52.28988386710333]
We show how a calibrated model can help balance common trade-offs in task-oriented parsing.
We then examine how confidence scores can help optimize the trade-off between usability and safety.
arXiv Detail & Related papers (2023-03-29T17:07:26Z)
- Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness [29.320691367586004]
We introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model.
We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner.
arXiv Detail & Related papers (2023-02-06T08:57:20Z)
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
- Confidence Composition for Monitors of Verification Assumptions [3.500426151907193]
We propose a three-step framework for monitoring the confidence in verification assumptions.
In two case studies, we demonstrate that the composed monitors improve over their constituents and successfully predict safety violations.
arXiv Detail & Related papers (2021-11-03T18:14:35Z)
- Bootstrapping confidence in future safety based on past safe operation [0.0]
We present an approach to establishing confidence that a system's probability of causing accidents is low enough, based on its early phases of operation.
This formalises the common approach of operating a system on a limited basis in the hope that mishap-free operation will confirm one's confidence in its safety.
arXiv Detail & Related papers (2021-10-20T18:36:23Z)
- Learning Uncertainty For Safety-Oriented Semantic Segmentation In Autonomous Driving [77.39239190539871]
We show how uncertainty estimation can be leveraged to enable safety critical image segmentation in autonomous driving.
We introduce a new uncertainty measure based on disagreeing predictions as measured by a dissimilarity function.
We show experimentally that our proposed approach is much less computationally intensive at inference time than competing methods.
arXiv Detail & Related papers (2021-05-28T09:23:05Z)
- Reliability Testing for Natural Language Processing Systems [14.393308846231083]
We argue for the need for reliability testing and contextualize it among existing work on improving accountability.
We show how adversarial attacks can be reframed for this goal, via a framework for developing reliability tests.
arXiv Detail & Related papers (2021-05-06T11:24:58Z)
- An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
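The temperature-scaling baseline mentioned in this summary can be sketched as a single learnt scalar dividing the logits before the softmax. This is a generic illustration of the technique, not the paper's exact setup:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits.

    A learnt temperature > 1 softens overconfident distributions, which
    tends to improve the calibration of the resulting confidence scores.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

The temperature is typically fit on a held-out set by minimising negative log-likelihood with the model weights frozen, so the predicted labels are unchanged and only the confidence values shift.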
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
- Assessing Safety-Critical Systems from Operational Testing: A Study on Autonomous Vehicles [3.629865579485447]
Demonstrating high reliability and safety for safety-critical systems (SCSs) remains a hard problem.
We use Autonomous Vehicles (AVs) as a current example to revisit the problem of demonstrating high reliability.
arXiv Detail & Related papers (2020-08-19T19:50:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.