Multisource AI Scorecard Table for System Evaluation
- URL: http://arxiv.org/abs/2102.03985v1
- Date: Mon, 8 Feb 2021 03:37:40 GMT
- Title: Multisource AI Scorecard Table for System Evaluation
- Authors: Erik Blasch, James Sung, Tao Nguyen
- Abstract summary: The paper describes a Multisource AI Scorecard Table (MAST) that provides the developer and user of an artificial intelligence (AI)/machine learning (ML) system with a standard checklist.
The paper explores how the analytic tradecraft standards outlined in Intelligence Community Directive (ICD) 203 can provide a framework for assessing the performance of an AI system.
- Score: 3.74397577716445
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The paper describes a Multisource AI Scorecard Table (MAST) that provides the
developer and user of an artificial intelligence (AI)/machine learning (ML)
system with a standard checklist focused on the principles of good analysis
adopted by the intelligence community (IC) to help promote the development of
more understandable systems and engender trust in AI outputs. Such a scorecard
enables a transparent, consistent, and meaningful understanding of AI tools
applied for commercial and government use. A standard is built on compliance
and agreement through policy, which requires buy-in from the stakeholders.
While consistency for testing might only exist across a standard data set, the
community requires discussion on verification and validation approaches which
can lead to interpretability, explainability, and proper use. The paper
explores how the analytic tradecraft standards outlined in Intelligence
Community Directive (ICD) 203 can provide a framework for assessing the
performance of an AI system supporting various operational needs. These include
sourcing, uncertainty, consistency, accuracy, and visualization. Three use
cases are presented as notional examples that support security for comparative
analysis.
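To make the scorecard idea concrete, the sketch below shows one minimal way such a checklist could be represented in code. The five criteria are the ones named in the abstract; the 1-3 rating scale, the `Scorecard` class, and the simple average are illustrative assumptions for this sketch, not the scoring instrument published in the paper.

```python
from dataclasses import dataclass, field

# Criteria named in the paper's abstract; the full MAST instrument is
# derived from the ICD 203 analytic tradecraft standards. The 1-3 scale
# and the averaging below are illustrative assumptions, not the paper's
# published scoring method.
CRITERIA = ("sourcing", "uncertainty", "consistency", "accuracy", "visualization")

@dataclass
class Scorecard:
    system_name: str
    ratings: dict = field(default_factory=dict)  # criterion -> score

    def rate(self, criterion: str, score: int) -> None:
        """Record a rating from 1 (poor) to 3 (good) for one criterion."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if score not in (1, 2, 3):
            raise ValueError("score must be 1, 2, or 3")
        self.ratings[criterion] = score

    def summary(self) -> str:
        """One line per criterion, plus a simple average over rated items."""
        lines = [f"{c}: {self.ratings.get(c, '-')}" for c in CRITERIA]
        if self.ratings:
            avg = sum(self.ratings.values()) / len(self.ratings)
            lines.append(f"average: {avg:.2f}")
        return "\n".join(lines)

# Example: score a hypothetical AI/ML system against the checklist.
card = Scorecard("example-ml-system")
card.rate("sourcing", 3)
card.rate("uncertainty", 2)
card.rate("accuracy", 2)
print(card.summary())
```

Keeping a rating per criterion, rather than collapsing everything to one number, mirrors the abstract's emphasis on a transparent and consistent understanding of an AI tool.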
Related papers
- Bridging the Communication Gap: Evaluating AI Labeling Practices for Trustworthy AI Development [41.64451715899638]
High-level AI labels, inspired by frameworks like EU energy labels, have been proposed to make the properties of AI models more transparent.
This study evaluates AI labeling through qualitative interviews along four key research questions.
arXiv Detail & Related papers (2025-01-21T06:00:14Z)
- Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
- AI Cards: Towards an Applied Framework for Machine-Readable AI and Risk Documentation Inspired by the EU AI Act [2.1897070577406734]
Despite its importance, there is a lack of standards and guidelines to assist with drawing up AI and risk documentation aligned with the AI Act.
We propose AI Cards as a novel holistic framework for representing a given intended use of an AI system.
arXiv Detail & Related papers (2024-06-26T09:51:49Z)
- Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness [53.91018508439669]
The study explores the complexities of integrating Artificial Intelligence into Autonomous Vehicles (AVs).
It examines the challenges introduced by AI components and the impact on testing procedures.
The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
arXiv Detail & Related papers (2024-02-21T08:29:42Z)
- PADTHAI-MM: Principles-based Approach for Designing Trustworthy, Human-centered AI using MAST Methodology [5.215782336985273]
The Multisource AI Scorecard Table (MAST) was designed to bridge the gap by offering a systematic, tradecraft-centered approach to evaluating AI-enabled decision support systems.
We introduce an iterative design framework called the Principles-based Approach for Designing Trustworthy, Human-centered AI (PADTHAI-MM).
We demonstrate this framework in our development of the Reporting Assistant for Defense and Intelligence Tasks (READIT).
arXiv Detail & Related papers (2024-01-24T23:15:44Z)
- Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability [28.67753149592534]
This study bridges the accountability gap by introducing our effort towards a comprehensive metrics catalogue.
Our catalogue delineates process metrics that underpin procedural integrity, resource metrics that provide necessary tools and frameworks, and product metrics that reflect the outputs of AI systems.
arXiv Detail & Related papers (2023-11-22T04:43:16Z)
- Explainable Authorship Identification in Cultural Heritage Applications: Analysis of a New Perspective [48.031678295495574]
We explore the applicability of existing general-purpose eXplainable Artificial Intelligence (XAI) techniques to AId.
In particular, we assess the relative merits of three different types of XAI techniques on three different AId tasks.
Our analysis shows that, while these techniques make important first steps towards explainable Authorship Identification, more work remains to be done.
arXiv Detail & Related papers (2023-11-03T20:51:15Z)
- Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution [48.86322922826514]
This paper defines a new task of Knowledge-aware Language Model Attribution (KaLMA).
First, we extend attribution source from unstructured texts to Knowledge Graph (KG), whose rich structures benefit both the attribution performance and working scenarios.
Second, we propose a new "Conscious Incompetence" setting that considers the incomplete knowledge repository.
Third, we propose a comprehensive automatic evaluation metric encompassing text quality, citation quality, and text citation alignment.
arXiv Detail & Related papers (2023-10-09T11:45:59Z)
- Guideline for Trustworthy Artificial Intelligence -- AI Assessment Catalog [0.0]
It is clear that AI and business models based on it can only reach their full potential if AI applications are developed according to high quality standards.
The issue of the trustworthiness of AI applications is crucial and is the subject of numerous major publications.
This AI assessment catalog addresses exactly this point and is intended for two target groups.
arXiv Detail & Related papers (2023-06-20T08:07:18Z)
- A Study of Situational Reasoning for Traffic Understanding [63.45021731775964]
We devise three novel text-based tasks for situational reasoning in the traffic domain.
We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work.
We provide in-depth analyses of model performance on data partitions and examine model predictions categorically.
arXiv Detail & Related papers (2023-06-05T01:01:12Z)
- An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.