Confidence in Assurance 2.0 Cases
- URL: http://arxiv.org/abs/2409.10665v1
- Date: Mon, 16 Sep 2024 19:00:21 GMT
- Title: Confidence in Assurance 2.0 Cases
- Authors: Robin Bloomfield, John Rushby
- Abstract summary: We consider how confidence can be assessed in the rigorous approach we call Assurance 2.0.
Our goal is indefeasible confidence and we approach it from four different perspectives.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: An assurance case should provide justifiable confidence in the truth of a claim about some critical property of a system or procedure, such as safety or security. We consider how confidence can be assessed in the rigorous approach we call Assurance 2.0. Our goal is indefeasible confidence and we approach it from four different perspectives: logical soundness, probabilistic assessment, dialectical examination, and residual risks.
Related papers
- Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences [62.52739672949452]
Language models (LMs) should provide reliable confidence estimates to help users detect mistakes in their outputs and defer to human experts when necessary.
We propose relative confidence estimation, where we match up questions against each other and ask the model to make relative judgments of confidence.
Treating each question as a "player" in a series of matchups against other questions and the model's preferences as match outcomes, we can use rank aggregation methods like Elo rating and Bradley-Terry to translate the model's confidence preferences into confidence scores.
arXiv Detail & Related papers (2025-02-03T07:43:27Z)
- Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework [54.40508478482667]
We present a comprehensive framework to disentangle, quantify, and mitigate uncertainty in perception and plan generation.
We propose methods tailored to the unique properties of perception and decision-making.
We show that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines.
arXiv Detail & Related papers (2024-11-03T17:32:00Z)
- Automating Semantic Analysis of System Assurance Cases using Goal-directed ASP [1.2189422792863451]
We present our approach to enhancing Assurance 2.0 with semantic rule-based analysis capabilities.
We examine the unique semantic aspects of assurance cases, such as logical consistency, adequacy, and indefeasibility.
arXiv Detail & Related papers (2024-08-21T15:22:43Z)
- Trustworthiness for an Ultra-Wideband Localization Service [2.4979362117484714]
This paper proposes a holistic trustworthiness assessment framework for ultra-wideband self-localization.
Our goal is to provide guidance for evaluating a system's trustworthiness based on objective evidence.
Our approach guarantees that the resulting trustworthiness indicators correspond to chosen real-world threats.
arXiv Detail & Related papers (2024-08-10T11:57:10Z)
- When to Trust LLMs: Aligning Confidence with Response Quality [49.371218210305656]
We propose the CONfidence-Quality-ORDer-preserving alignment approach (CONQORD).
It integrates quality reward and order-preserving alignment reward functions.
Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy.
arXiv Detail & Related papers (2024-04-26T09:42:46Z)
- Did You Mean...? Confidence-based Trade-offs in Semantic Parsing [52.28988386710333]
We show how a calibrated model can help balance common trade-offs in task-oriented parsing.
We then examine how confidence scores can help optimize the trade-off between usability and safety.
arXiv Detail & Related papers (2023-03-29T17:07:26Z)
- Demonstrating Software Reliability using Possibly Correlated Tests: Insights from a Conservative Bayesian Approach [2.152298082788376]
We formalise informal notions of "doubting" that the executions are independent.
We develop techniques that reveal the extent to which independence assumptions can undermine conservatism in assessments.
arXiv Detail & Related papers (2022-08-16T20:27:47Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Assessing Confidence with Assurance 2.0 [0.0]
We argue that confidence cannot be reduced to a single attribute or measurement.
Positive Perspectives consider the extent to which the evidence and overall argument of the case combine to make a positive statement.
Negative Perspectives record doubts and challenges to the case, typically expressed as defeaters.
Residual Doubts: the world is uncertain so not all potential defeaters can be resolved.
arXiv Detail & Related papers (2022-05-03T22:10:59Z)
- Bootstrapping confidence in future safety based on past safe operation [0.0]
We show an approach to gaining confidence, in the early phases of operation, that the probability of causing accidents is low enough.
This formalises the common approach of operating a system on a limited basis in the hope that mishap-free operation will confirm one's confidence in its safety.
arXiv Detail & Related papers (2021-10-20T18:36:23Z)
- An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
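The last entry above mentions scaling logits by a learnt temperature as a strong confidence baseline. The following is a minimal sketch of that general idea (temperature scaling), not the paper's implementation: it assumes a generic classifier, uses a simple grid search in place of gradient-based learning, and all function names and the toy held-out data are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 flattens, T < 1 sharpens)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [
        -math.log(softmax(row, temperature)[label])
        for row, label in zip(logit_rows, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the candidate temperature with the lowest held-out NLL.

    Assumption: a grid search stands in for learning T by gradient descent.
    """
    if candidates is None:
        candidates = [0.25 * k for k in range(1, 41)]  # 0.25, 0.50, ..., 10.0
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


# Hypothetical held-out set: the model is very sure of itself but is
# right only three times out of four, i.e. it is overconfident.
val_logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
val_labels = [0, 1, 1, 0]

temperature = fit_temperature(val_logits, val_labels)
raw_confidence = max(softmax([4.0, 0.0]))
calibrated_confidence = max(softmax([4.0, 0.0], temperature))
print(temperature, raw_confidence, calibrated_confidence)
```

Because dividing by a positive temperature preserves the ordering of the logits, the predicted class is unchanged; only the reported confidence moves. On this toy data the fitted temperature is above 1, which lowers the overconfident scores.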