Multisource AI Scorecard Table for System Evaluation
- URL: http://arxiv.org/abs/2102.03985v1
- Date: Mon, 8 Feb 2021 03:37:40 GMT
- Title: Multisource AI Scorecard Table for System Evaluation
- Authors: Erik Blasch, James Sung, Tao Nguyen
- Abstract summary: The paper describes a Multisource AI Scorecard Table (MAST) that provides the developer and user of an artificial intelligence (AI)/machine learning (ML) system with a standard checklist.
The paper explores how the analytic tradecraft standards outlined in Intelligence Community Directive (ICD) 203 can provide a framework for assessing the performance of an AI system.
- Score: 3.74397577716445
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The paper describes a Multisource AI Scorecard Table (MAST) that provides the
developer and user of an artificial intelligence (AI)/machine learning (ML)
system with a standard checklist focused on the principles of good analysis
adopted by the intelligence community (IC) to help promote the development of
more understandable systems and engender trust in AI outputs. Such a scorecard
enables a transparent, consistent, and meaningful understanding of AI tools
applied for commercial and government use. A standard is built on compliance
and agreement through policy, which requires buy-in from the stakeholders.
While consistency for testing might only exist across a standard data set, the
community requires discussion on verification and validation approaches which
can lead to interpretability, explainability, and proper use. The paper
explores how the analytic tradecraft standards outlined in Intelligence
Community Directive (ICD) 203 can provide a framework for assessing the
performance of an AI system supporting various operational needs. These include
sourcing, uncertainty, consistency, accuracy, and visualization. Three use
cases are presented as notional examples that support security for comparative
analysis.
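To make the scorecard idea concrete, the sketch below shows one minimal way such a checklist could be represented in code. The five criteria are the ones named in the abstract; the 1-3 rating scale, the `Scorecard` class, and the simple average are illustrative assumptions for this sketch, not the scoring instrument published in the paper.

```python
from dataclasses import dataclass, field

# Criteria named in the paper's abstract; the full MAST instrument is
# derived from the ICD 203 analytic tradecraft standards. The 1-3 scale
# and the averaging below are illustrative assumptions, not the paper's
# published scoring method.
CRITERIA = ("sourcing", "uncertainty", "consistency", "accuracy", "visualization")

@dataclass
class Scorecard:
    system_name: str
    ratings: dict = field(default_factory=dict)  # criterion -> score

    def rate(self, criterion: str, score: int) -> None:
        """Record a rating from 1 (poor) to 3 (good) for one criterion."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if score not in (1, 2, 3):
            raise ValueError("score must be 1, 2, or 3")
        self.ratings[criterion] = score

    def summary(self) -> str:
        """One line per criterion, plus a simple average over rated items."""
        lines = [f"{c}: {self.ratings.get(c, '-')}" for c in CRITERIA]
        if self.ratings:
            avg = sum(self.ratings.values()) / len(self.ratings)
            lines.append(f"average: {avg:.2f}")
        return "\n".join(lines)

# Example: score a hypothetical AI/ML system against the checklist.
card = Scorecard("example-ml-system")
card.rate("sourcing", 3)
card.rate("uncertainty", 2)
card.rate("accuracy", 2)
print(card.summary())
```

Keeping a rating per criterion, rather than collapsing everything to one number, mirrors the abstract's emphasis on a transparent and consistent understanding of an AI tool.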
Related papers
- Bridging the Communication Gap: Evaluating AI Labeling Practices for Trustworthy AI Development [41.64451715899638]
High-level AI labels, inspired by frameworks like EU energy labels, have been proposed to make the properties of AI models more transparent.
This study evaluates AI labeling through qualitative interviews along four key research questions.
arXiv Detail & Related papers (2025-01-21T06:00:14Z)
- Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
- AI Cards: Towards an Applied Framework for Machine-Readable AI and Risk Documentation Inspired by the EU AI Act [2.1897070577406734]
Despite its importance, there is a lack of standards and guidelines to assist with drawing up AI and risk documentation aligned with the AI Act.
We propose AI Cards as a novel holistic framework for representing a given intended use of an AI system.
arXiv Detail & Related papers (2024-06-26T09:51:49Z)
- Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness [53.91018508439669]
The study explores the complexities of integrating Artificial Intelligence into Autonomous Vehicles (AVs).
It examines the challenges introduced by AI components and the impact on testing procedures.
The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
arXiv Detail & Related papers (2024-02-21T08:29:42Z)
- PADTHAI-MM: Principles-based Approach for Designing Trustworthy, Human-centered AI using MAST Methodology [5.215782336985273]
The Multisource AI Scorecard Table (MAST) was designed to bridge the gap by offering a systematic, tradecraft-centered approach to evaluating AI-enabled decision support systems.
We introduce an iterative design framework called the Principles-based Approach for Designing Trustworthy, Human-centered AI (PADTHAI-MM).
We demonstrate this framework in our development of the Reporting Assistant for Defense and Intelligence Tasks (READIT).
arXiv Detail & Related papers (2024-01-24T23:15:44Z)
- Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability [28.67753149592534]
This study bridges the accountability gap by introducing our effort towards a comprehensive metrics catalogue.
Our catalogue delineates process metrics that underpin procedural integrity, resource metrics that provide necessary tools and frameworks, and product metrics that reflect the outputs of AI systems.
arXiv Detail & Related papers (2023-11-22T04:43:16Z)
- Explainable Authorship Identification in Cultural Heritage Applications: Analysis of a New Perspective [48.031678295495574]
We explore the applicability of existing general-purpose eXplainable Artificial Intelligence (XAI) techniques to AId.
In particular, we assess the relative merits of three different types of XAI techniques on three different AId tasks.
Our analysis shows that, while these techniques make important first steps towards explainable Authorship Identification, more work remains to be done.
arXiv Detail & Related papers (2023-11-03T20:51:15Z)
- Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution [48.86322922826514]
This paper defines a new task of Knowledge-aware Language Model Attribution (KaLMA).
First, we extend attribution source from unstructured texts to Knowledge Graph (KG), whose rich structures benefit both the attribution performance and working scenarios.
Second, we propose a new "Conscious Incompetence" setting that considers the incomplete knowledge repository.
Third, we propose a comprehensive automatic evaluation metric encompassing text quality, citation quality, and text citation alignment.
arXiv Detail & Related papers (2023-10-09T11:45:59Z)
- Guideline for Trustworthy Artificial Intelligence -- AI Assessment Catalog [0.0]
It is clear that AI and business models based on it can only reach their full potential if AI applications are developed according to high quality standards.
The issue of the trustworthiness of AI applications is crucial and is the subject of numerous major publications.
This AI assessment catalog addresses exactly this point and is intended for two target groups.
arXiv Detail & Related papers (2023-06-20T08:07:18Z)
- A Study of Situational Reasoning for Traffic Understanding [63.45021731775964]
We devise three novel text-based tasks for situational reasoning in the traffic domain.
We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work.
We provide in-depth analyses of model performance on data partitions and examine model predictions categorically.
arXiv Detail & Related papers (2023-06-05T01:01:12Z)
- An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.