Related papers: Trustworthy Artificial Intelligence in the Context of Metrology

Trustworthy Artificial Intelligence in the Context of Metrology

URL: http://arxiv.org/abs/2406.10117v1
Date: Fri, 14 Jun 2024 15:23:27 GMT
Title: Trustworthy Artificial Intelligence in the Context of Metrology
Authors: Tameem Adel, Sam Bilson, Mark Levene, Andrew Thompson,
Abstract summary: We review research at the National Physical Laboratory in the area of trustworthy artificial intelligence (TAI) We describe three broad themes of TAI: technical, socio-technical and social, which play key roles in ensuring that the developed models are trustworthy and can be relied upon to make responsible decisions. We discuss three research areas within TAI that we are working on at NPL, and examine the certification of AI systems in terms of adherence to the characteristics of TAI.
Score: 3.2873782624127834
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We review research at the National Physical Laboratory (NPL) in the area of trustworthy artificial intelligence (TAI), and more specifically trustworthy machine learning (TML), in the context of metrology, the science of measurement. We describe three broad themes of TAI: technical, socio-technical and social, which play key roles in ensuring that the developed models are trustworthy and can be relied upon to make responsible decisions. From a metrology perspective we emphasise uncertainty quantification (UQ), and its importance within the framework of TAI to enhance transparency and trust in the outputs of AI systems. We then discuss three research areas within TAI that we are working on at NPL, and examine the certification of AI systems in terms of adherence to the characteristics of TAI.

Related papers

The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority.<n>We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
Is Trust Correlated With Explainability in AI? A Meta-Analysis [0.0]
We conduct a comprehensive examination of the existing literature to explore the relationship between AI explainability and trust. Our analysis, incorporating data from 90 studies, reveals a statistically significant but moderate positive correlation between the explainability of AI systems and the trust they engender. This research highlights its broader socio-technical implications, particularly in promoting accountability and fostering user trust in critical domains such as healthcare and justice.
arXiv Detail & Related papers (2025-04-16T23:30:55Z)
An Overview of Large Language Models for Statisticians [109.38601458831545]
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI) This paper explores potential areas where statisticians can make important contributions to the development of LLMs. We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv Detail & Related papers (2025-02-25T03:40:36Z)
Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design [63.24275274981911]
Compound AI Systems consisting of many language model inference calls are increasingly employed. In this work, we construct systems, which we call Networks of Networks (NoNs) organized around the distinction between generating a proposed answer and verifying its correctness. We introduce a verifier-based judge NoN with K generators, an instantiation of "best-of-K" or "judge-based" compound AI systems.
arXiv Detail & Related papers (2024-07-23T20:40:37Z)
Quantifying AI Vulnerabilities: A Synthesis of Complexity, Dynamical Systems, and Game Theory [0.0]
We propose a novel approach that introduces three metrics: System Complexity Index (SCI), Lyapunov Exponent for AI Stability (LEAIS), and Nash Equilibrium Robustness (NER) SCI quantifies the inherent complexity of an AI system, LEAIS captures its stability and sensitivity to perturbations, and NER evaluates its strategic robustness against adversarial manipulation.
arXiv Detail & Related papers (2024-04-07T07:05:59Z)
Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness [53.91018508439669]
The study explores the complexities of integrating Artificial Intelligence into Autonomous Vehicles (AVs) It examines the challenges introduced by AI components and the impact on testing procedures. The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
arXiv Detail & Related papers (2024-02-21T08:29:42Z)
A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research [23.966640472958105]
This paper presents a systematic literature review of approaches that aim to improve the explainability of AI models within the context of Software Engineering. We aim to summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches.
arXiv Detail & Related papers (2024-01-26T03:20:40Z)
PADTHAI-MM: A Principled Approach for Designing Trustable, Human-centered AI systems using the MAST Methodology [5.38932801848643]
The Multisource AI Scorecard Table (MAST), a checklist rating system, addresses this gap in designing and evaluating AI-enabled decision support systems. We propose the Principled Approach for Designing Trustable Human-centered AI systems using MAST methodology. We show that MAST-guided design can improve trust perceptions, and that MAST criteria can be linked to performance, process, and purpose information.
arXiv Detail & Related papers (2024-01-24T23:15:44Z)
Evaluating Trustworthiness of AI-Enabled Decision Support Systems: Validation of the Multisource AI Scorecard Table (MAST) [10.983659980278926]
The Multisource AI Scorecard Table (MAST) is a checklist tool to inform the design and evaluation of trustworthy AI systems. We evaluate whether MAST is associated with people's trust perceptions in AI-enabled decision support systems.
arXiv Detail & Related papers (2023-11-29T19:34:15Z)
A Study of Situational Reasoning for Traffic Understanding [63.45021731775964]
We devise three novel text-based tasks for situational reasoning in the traffic domain. We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work. We provide in-depth analyses of model performance on data partitions and examine model predictions categorically.
arXiv Detail & Related papers (2023-06-05T01:01:12Z)
The Role of Large Language Models in the Recognition of Territorial Sovereignty: An Analysis of the Construction of Legitimacy [67.44950222243865]
We argue that technology tools like Google Maps and Large Language Models (LLM) are often perceived as impartial and objective. We highlight the case of three controversial territories: Crimea, West Bank and Transnitria, by comparing the responses of ChatGPT against Wikipedia information and United Nations resolutions.
arXiv Detail & Related papers (2023-03-17T08:46:49Z)
AI Maintenance: A Robustness Perspective [91.28724422822003]
We introduce highlighted robustness challenges in the AI lifecycle and motivate AI maintenance by making analogies to car maintenance. We propose an AI model inspection framework to detect and mitigate robustness risks. Our proposal for AI maintenance facilitates robustness assessment, status tracking, risk scanning, model hardening, and regulation throughout the AI lifecycle.
arXiv Detail & Related papers (2023-01-08T15:02:38Z)
An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence. The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
Multisource AI Scorecard Table for System Evaluation [3.74397577716445]
The paper describes a Multisource AI Scorecard Table (MAST) that provides the developer and user of an artificial intelligence (AI)/machine learning (ML) system with a standard checklist. The paper explores how the analytic tradecraft standards outlined in Intelligence Community Directive (ICD) 203 can provide a framework for assessing the performance of an AI system.
arXiv Detail & Related papers (2021-02-08T03:37:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.