Demonstrating Software Reliability using Possibly Correlated Tests:
Insights from a Conservative Bayesian Approach
- URL: http://arxiv.org/abs/2208.07935v3
- Date: Wed, 11 Oct 2023 13:18:41 GMT
- Authors: Kizito Salako, Xingyu Zhao
- Abstract summary: We formalise informal notions of "doubting" that the executions are independent.
We develop techniques that reveal the extent to which independence assumptions can undermine conservatism in assessments.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents Bayesian techniques for conservative claims about
software reliability, particularly when evidence suggests the software's
executions are not statistically independent. We formalise informal notions of
"doubting" that the executions are independent, and incorporate such doubts
into reliability assessments. We develop techniques that reveal the extent to
which independence assumptions can undermine conservatism in assessments, and
identify conditions under which this impact is not significant. These
techniques - novel extensions of conservative Bayesian inference (CBI)
approaches - give conservative confidence bounds on the software's failure
probability per execution. With illustrations in two application areas -
nuclear power-plant safety and autonomous vehicle (AV) safety - our analyses
reveal: 1) the confidence an assessor should possess before subjecting a
system to operational testing, without which such testing is futile - favourable
operational testing evidence will eventually decrease one's confidence in the
system being sufficiently reliable; 2) the independence assumption sometimes
supports conservative claims; 3) in some scenarios, observing a system operate
without failure gives less confidence in the system than if some failures had
been observed; 4) building confidence in a system is very sensitive to failures
- each additional failure means significantly more operational testing is
required, in order to support a reliability claim.
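As a point of comparison for the bounds the abstract describes, the textbook Bayesian calculation under the independence assumption - the very assumption the paper's CBI extensions relax - can be sketched as follows. This is a hypothetical illustration, not the paper's method: with a uniform Beta(1, 1) prior on the failure probability per execution, n failure-free independent executions give a Beta(1, n + 1) posterior, whose CDF has a simple closed form.

```python
def confidence_failure_prob_below(p0: float, n: int) -> float:
    """P(failure probability <= p0 | n failure-free executions).

    Assumes i.i.d. executions and a uniform Beta(1, 1) prior, so the
    posterior is Beta(1, n + 1) with CDF 1 - (1 - p0)**(n + 1).
    Illustrative only; the paper's CBI bounds relax these assumptions.
    """
    return 1.0 - (1.0 - p0) ** (n + 1)

# e.g. a 1e-3 failure-probability claim after 4602 failure-free runs
# yields roughly 0.99 confidence.
print(confidence_failure_prob_below(1e-3, 4602))
```

Under correlated executions or more pessimistic priors, as the paper shows, the confidence obtained from the same evidence can be substantially lower.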
Related papers
- Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We find a general, widely existing but actually-neglected phenomenon that most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
- Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
- Did You Mean...? Confidence-based Trade-offs in Semantic Parsing [52.28988386710333]
We show how a calibrated model can help balance common trade-offs in task-oriented parsing.
We then examine how confidence scores can help optimize the trade-off between usability and safety.
arXiv Detail & Related papers (2023-03-29T17:07:26Z)
- Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness [29.320691367586004]
We introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model.
We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner.
arXiv Detail & Related papers (2023-02-06T08:57:20Z)
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
- Confidence Composition for Monitors of Verification Assumptions [3.500426151907193]
We propose a three-step framework for monitoring the confidence in verification assumptions.
In two case studies, we demonstrate that the composed monitors improve over their constituents and successfully predict safety violations.
arXiv Detail & Related papers (2021-11-03T18:14:35Z)
- Bootstrapping confidence in future safety based on past safe operation [0.0]
We present an approach to establishing confidence that a system's probability of causing accidents is low enough, based on its early phases of operation.
This formalises the common approach of operating a system on a limited basis in the hope that mishap-free operation will confirm one's confidence in its safety.
arXiv Detail & Related papers (2021-10-20T18:36:23Z)
- Learning Uncertainty For Safety-Oriented Semantic Segmentation In Autonomous Driving [77.39239190539871]
We show how uncertainty estimation can be leveraged to enable safety critical image segmentation in autonomous driving.
We introduce a new uncertainty measure based on disagreeing predictions as measured by a dissimilarity function.
We show experimentally that our proposed approach is much less computationally intensive at inference time than competing methods.
arXiv Detail & Related papers (2021-05-28T09:23:05Z)
- Reliability Testing for Natural Language Processing Systems [14.393308846231083]
We argue for the need for reliability testing and contextualize it among existing work on improving accountability.
We show how adversarial attacks can be reframed for this goal, via a framework for developing reliability tests.
arXiv Detail & Related papers (2021-05-06T11:24:58Z)
- An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
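The temperature-scaling baseline mentioned in this summary can be sketched as a single learnt scalar dividing the logits before the softmax. This is a generic illustration of the technique, not the paper's exact setup:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits.

    A learnt temperature > 1 softens overconfident distributions, which
    tends to improve the calibration of the resulting confidence scores.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

The temperature is typically fit on a held-out set by minimising negative log-likelihood with the model weights frozen, so the predicted labels are unchanged and only the confidence values shift.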
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
- Assessing Safety-Critical Systems from Operational Testing: A Study on Autonomous Vehicles [3.629865579485447]
Demonstrating high reliability and safety for safety-critical systems (SCSs) remains a hard problem.
We use Autonomous Vehicles (AVs) as a current example to revisit the problem of demonstrating high reliability.
arXiv Detail & Related papers (2020-08-19T19:50:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.