Trustworthy Artificial Intelligence in the Context of Metrology
- URL: http://arxiv.org/abs/2406.10117v1
- Date: Fri, 14 Jun 2024 15:23:27 GMT
- Title: Trustworthy Artificial Intelligence in the Context of Metrology
- Authors: Tameem Adel, Sam Bilson, Mark Levene, Andrew Thompson,
- Abstract summary: We review research at the National Physical Laboratory in the area of trustworthy artificial intelligence (TAI)
We describe three broad themes of TAI: technical, socio-technical and social, which play key roles in ensuring that the developed models are trustworthy and can be relied upon to make responsible decisions.
We discuss three research areas within TAI that we are working on at NPL, and examine the certification of AI systems in terms of adherence to the characteristics of TAI.
- Score: 3.2873782624127834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We review research at the National Physical Laboratory (NPL) in the area of trustworthy artificial intelligence (TAI), and more specifically trustworthy machine learning (TML), in the context of metrology, the science of measurement. We describe three broad themes of TAI: technical, socio-technical and social, which play key roles in ensuring that the developed models are trustworthy and can be relied upon to make responsible decisions. From a metrology perspective we emphasise uncertainty quantification (UQ), and its importance within the framework of TAI to enhance transparency and trust in the outputs of AI systems. We then discuss three research areas within TAI that we are working on at NPL, and examine the certification of AI systems in terms of adherence to the characteristics of TAI.
Related papers
- Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design [63.24275274981911]
Compound AI Systems consisting of many language model inference calls are increasingly employed.
In this work, we construct systems, which we call Networks of Networks (NoNs) organized around the distinction between generating a proposed answer and verifying its correctness.
We introduce a verifier-based judge NoN with K generators, an instantiation of "best-of-K" or "judge-based" compound AI systems.
arXiv Detail & Related papers (2024-07-23T20:40:37Z) - Quantifying AI Vulnerabilities: A Synthesis of Complexity, Dynamical Systems, and Game Theory [0.0]
We propose a novel approach that introduces three metrics: System Complexity Index (SCI), Lyapunov Exponent for AI Stability (LEAIS), and Nash Equilibrium Robustness (NER)
SCI quantifies the inherent complexity of an AI system, LEAIS captures its stability and sensitivity to perturbations, and NER evaluates its strategic robustness against adversarial manipulation.
arXiv Detail & Related papers (2024-04-07T07:05:59Z) - Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness [53.91018508439669]
The study explores the complexities of integrating Artificial Intelligence into Autonomous Vehicles (AVs)
It examines the challenges introduced by AI components and the impact on testing procedures.
The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
arXiv Detail & Related papers (2024-02-21T08:29:42Z) - A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research [23.273934717819795]
This paper presents a systematic literature review of approaches that aim to improve the explainability of AI models within the context of Software Engineering.
We aim to summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches.
arXiv Detail & Related papers (2024-01-26T03:20:40Z) - PADTHAI-MM: Principles-based Approach for Designing Trustworthy, Human-centered AI using MAST Methodology [5.215782336985273]
The Multisource AI Scorecard Table (MAST) was designed to bridge the gap by offering a systematic, tradecraft-centered approach to evaluating AI-enabled decision support systems.
We introduce an iterative design framework called textitPrinciples-based Approach for Designing Trustworthy, Human-centered AI.
We demonstrate this framework in our development of the Reporting Assistant for Defense and Intelligence Tasks (READIT)
arXiv Detail & Related papers (2024-01-24T23:15:44Z) - Evaluating Trustworthiness of AI-Enabled Decision Support Systems:
Validation of the Multisource AI Scorecard Table (MAST) [10.983659980278926]
The Multisource AI Scorecard Table (MAST) is a checklist tool to inform the design and evaluation of trustworthy AI systems.
We evaluate whether MAST is associated with people's trust perceptions in AI-enabled decision support systems.
arXiv Detail & Related papers (2023-11-29T19:34:15Z) - A Study of Situational Reasoning for Traffic Understanding [63.45021731775964]
We devise three novel text-based tasks for situational reasoning in the traffic domain.
We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work.
We provide in-depth analyses of model performance on data partitions and examine model predictions categorically.
arXiv Detail & Related papers (2023-06-05T01:01:12Z) - The Role of Large Language Models in the Recognition of Territorial
Sovereignty: An Analysis of the Construction of Legitimacy [67.44950222243865]
We argue that technology tools like Google Maps and Large Language Models (LLM) are often perceived as impartial and objective.
We highlight the case of three controversial territories: Crimea, West Bank and Transnitria, by comparing the responses of ChatGPT against Wikipedia information and United Nations resolutions.
arXiv Detail & Related papers (2023-03-17T08:46:49Z) - AI Maintenance: A Robustness Perspective [91.28724422822003]
We introduce highlighted robustness challenges in the AI lifecycle and motivate AI maintenance by making analogies to car maintenance.
We propose an AI model inspection framework to detect and mitigate robustness risks.
Our proposal for AI maintenance facilitates robustness assessment, status tracking, risk scanning, model hardening, and regulation throughout the AI lifecycle.
arXiv Detail & Related papers (2023-01-08T15:02:38Z) - An interdisciplinary conceptual study of Artificial Intelligence (AI)
for helping benefit-risk assessment practices: Towards a comprehensive
qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z) - Multisource AI Scorecard Table for System Evaluation [3.74397577716445]
The paper describes a Multisource AI Scorecard Table (MAST) that provides the developer and user of an artificial intelligence (AI)/machine learning (ML) system with a standard checklist.
The paper explores how the analytic tradecraft standards outlined in Intelligence Community Directive (ICD) 203 can provide a framework for assessing the performance of an AI system.
arXiv Detail & Related papers (2021-02-08T03:37:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.