Functional trustworthiness of AI systems by statistically valid testing
- URL: http://arxiv.org/abs/2310.02727v1
- Date: Wed, 4 Oct 2023 11:07:52 GMT
- Title: Functional trustworthiness of AI systems by statistically valid testing
- Authors: Bernhard Nessler, Thomas Doms, Sepp Hochreiter
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The authors are concerned about the safety, health, and rights of the
European citizens due to inadequate measures and procedures required by the
current draft of the EU Artificial Intelligence (AI) Act for the conformity
assessment of AI systems. We observe that not only the current draft of the EU
AI Act, but also the accompanying standardization efforts in CEN/CENELEC, have
resorted to the position that real functional guarantees of AI systems
supposedly would be unrealistic and too complex anyway. Yet enacting a
conformity assessment procedure that creates the false illusion of trust in
insufficiently assessed AI systems is at best naive and at worst grossly
negligent. The EU AI Act thus misses the point of ensuring quality by
functional trustworthiness and correctly attributing responsibilities.
The trustworthiness of an AI decision system lies first and foremost in the
correct statistical testing on randomly selected samples and in the precision
of the definition of the application domain, which enables drawing samples in
the first place. We will subsequently call this testable quality functional
trustworthiness. It includes a design, development, and deployment that enables
correct statistical testing of all relevant functions.
We are firmly convinced and advocate that a reliable assessment of the
statistical functional properties of an AI system has to be the indispensable,
mandatory nucleus of the conformity assessment. In this paper, we describe the
three necessary elements to establish a reliable functional trustworthiness,
i.e., (1) the definition of the technical distribution of the application, (2)
the risk-based minimum performance requirements, and (3) the statistically
valid testing based on independent random samples.
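The third element, statistically valid testing on independent random samples, can be made concrete with a small sketch: compute an exact (Clopper-Pearson) upper confidence bound on the error rate observed on a randomly drawn test set, and accept the system only if that bound stays below the risk-based maximum error rate. This is an illustrative reading of the abstract's proposal, not code from the authors; the function names, the confidence level, and the example numbers are assumptions.

```python
from math import comb

def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def error_rate_upper_bound(errors, n, alpha=0.05):
    """Clopper-Pearson upper confidence bound on the true error rate,
    given `errors` failures observed in `n` independent random samples.
    Found by bisection on p, since binom_cdf is decreasing in p."""
    if errors == n:
        return 1.0
    lo, hi = errors / n, 1.0
    for _ in range(60):  # bisection; far more precision than needed
        mid = (lo + hi) / 2
        if binom_cdf(errors, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

def conforms(errors, n, max_error_rate, alpha=0.05):
    """Pass iff, at confidence 1 - alpha, the true error rate on the
    defined application distribution meets the risk-based requirement."""
    return error_rate_upper_bound(errors, n, alpha) <= max_error_rate

# Hypothetical example: 3 errors in 1000 independent random samples,
# risk-based requirement: error rate <= 1%. Upper bound is ~0.0078 < 0.01.
print(conforms(3, 1000, 0.01))  # True
```

Note that the guarantee is only as good as the sampling: the bound is valid precisely because the test samples are drawn independently from the defined technical distribution of the application, which is why element (1) must come first.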
Related papers
- Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z)
- Navigating the EU AI Act: A Methodological Approach to Compliance for Safety-critical Products [0.0]
This paper presents a methodology for interpreting the EU AI Act requirements for high-risk AI systems.
We first propose an extended product quality model for AI systems, incorporating attributes relevant to the Act not covered by current quality models.
We then propose a contract-based approach to derive technical requirements at the stakeholder level.
arXiv Detail & Related papers (2024-03-25T14:32:18Z)
- Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond [63.969531254692725]
Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems.
We propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to semantic relevance.
We show that WSE exhibits superior performance on accurate uncertainty measurement under two standard criteria for correctness evaluation.
arXiv Detail & Related papers (2024-02-22T03:46:08Z) - RAISE -- Radiology AI Safety, an End-to-end lifecycle approach [5.829180249228172]
The integration of AI into radiology introduces opportunities for improved clinical care provision and efficiency.
The focus should be on ensuring models meet the highest standards of safety, effectiveness and efficacy.
The roadmap presented herein aims to expedite the achievement of deployable, reliable, and safe AI in radiology.
arXiv Detail & Related papers (2023-11-24T15:59:14Z) - ASSERT: Automated Safety Scenario Red Teaming for Evaluating the
Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings.
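A performance gap like the 11% absolute accuracy difference reported here can be checked for statistical significance with a standard two-proportion z-test. The sketch below is illustrative, not the ASSERT evaluation code; the function name and the sample counts are invented for the example.

```python
from math import sqrt, erf

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for a difference in classification accuracy
    between two scenarios, using the pooled-variance standard error."""
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value via the normal CDF, Phi(x) = (1 + erf(x/sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical: 85% vs 74% accuracy (an 11-point absolute gap),
# 500 prompts per scenario.
z, p = two_proportion_z_test(425, 500, 370, 500)
print(f"z = {z:.2f}, p-value = {p:.2g}")
```

With a few hundred prompts per scenario, an 11-point gap is far outside what sampling noise alone would produce, which is what makes such red-teaming differences reportable as statistically significant.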
arXiv Detail & Related papers (2023-10-14T17:10:28Z)
- No Trust without regulation! [0.0]
The explosion in performance of Machine Learning (ML) and the potential of its applications are encouraging us to consider its use in industrial systems.
Yet the issue of safety, and its corollary, regulation and standards, is still too often set aside.
The European Commission has laid the foundations for moving forward and building solid approaches to the integration of AI-based applications that are safe, trustworthy and respect European ethical values.
arXiv Detail & Related papers (2023-09-27T09:08:41Z)
- Designing for Responsible Trust in AI Systems: A Communication Perspective [56.80107647520364]
We draw from communication theories and literature on trust in technologies to develop a conceptual model called MATCH.
We highlight transparency and interaction as AI systems' affordances that present a wide range of trustworthiness cues to users.
We propose a checklist of requirements to help technology creators identify appropriate cues to use.
arXiv Detail & Related papers (2022-04-29T00:14:33Z)
- Statistical Perspectives on Reliability of Artificial Intelligence Systems [6.284088451820049]
We provide statistical perspectives on the reliability of AI systems.
We introduce a so-called SMART statistical framework for AI reliability research.
We discuss recent developments in modeling and analysis of AI reliability.
arXiv Detail & Related papers (2021-11-09T20:00:14Z)
- An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
- Multisource AI Scorecard Table for System Evaluation [3.74397577716445]
The paper describes a Multisource AI Scorecard Table (MAST) that provides the developer and user of an artificial intelligence (AI)/machine learning (ML) system with a standard checklist.
The paper explores how the analytic tradecraft standards outlined in Intelligence Community Directive (ICD) 203 can provide a framework for assessing the performance of an AI system.
arXiv Detail & Related papers (2021-02-08T03:37:40Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the inability to explain decisions, and bias in the training data are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all of its content) and is not responsible for any consequences of its use.