The Quest for Reliable Metrics of Responsible AI
- URL: http://arxiv.org/abs/2510.26007v1
- Date: Wed, 29 Oct 2025 22:35:34 GMT
- Title: The Quest for Reliable Metrics of Responsible AI
- Authors: Theresia Veronika Rampisela, Maria Maistro, Tuukka Ruotsalo, Christina Lioma,
- Abstract summary: Development of AI, including AI in Science (AIS), should be done following the principles of responsible AI.<n>We reflect on prior work that examines the robustness of fairness metrics for recommender systems as a type of AI application.<n>Our guidelines apply to a broad spectrum of AI applications, including AIS.
- Score: 16.858097936239904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of Artificial Intelligence (AI), including AI in Science (AIS), should be done following the principles of responsible AI. Progress in responsible AI is often quantified through evaluation metrics, yet there has been less work on assessing the robustness and reliability of the metrics themselves. We reflect on prior work that examines the robustness of fairness metrics for recommender systems as a type of AI application and summarise their key takeaways into a set of non-exhaustive guidelines for developing reliable metrics of responsible AI. Our guidelines apply to a broad spectrum of AI applications, including AIS.
Related papers
- A Question Bank to Assess AI Inclusivity: Mapping out the Journey from Diversity Errors to Inclusion Excellence [5.364403920214549]
This paper introduces a structured AI inclusivity question bank, a comprehensive set of 253 questions designed to evaluate AI inclusivity.<n>The development of the question bank involved an iterative, multi-source approach, incorporating insights from literature reviews, D&I guidelines, Responsible AI frameworks.<n>The simulated evaluation, conducted with 70 AI-generated personas related to different AI jobs, assessed the question bank's relevance and effectiveness for AI inclusivity.
arXiv Detail & Related papers (2025-06-23T11:48:38Z) - The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority.<n>We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z) - General Scales Unlock AI Evaluation with Explanatory and Predictive Power [57.7995945974989]
benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems.<n>We introduce general scales for AI evaluation that can explain what common AI benchmarks really measure.<n>Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate.
arXiv Detail & Related papers (2025-03-09T01:13:56Z) - Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z) - Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness [53.91018508439669]
The study explores the complexities of integrating Artificial Intelligence into Autonomous Vehicles (AVs)
It examines the challenges introduced by AI components and the impact on testing procedures.
The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
arXiv Detail & Related papers (2024-02-21T08:29:42Z) - POLARIS: A framework to guide the development of Trustworthy AI systems [3.02243271391691]
There is a significant gap between high-level AI ethics principles and low-level concrete practices for AI professionals.
We develop a novel holistic framework for Trustworthy AI - designed to bridge the gap between theory and practice.
Our goal is to empower AI professionals to confidently navigate the ethical dimensions of Trustworthy AI.
arXiv Detail & Related papers (2024-02-08T01:05:16Z) - Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for
AI Accountability [28.67753149592534]
This study bridges the accountability gap by introducing our effort towards a comprehensive metrics catalogue.
Our catalogue delineates process metrics that underpin procedural integrity, resource metrics that provide necessary tools and frameworks, and product metrics that reflect the outputs of AI systems.
arXiv Detail & Related papers (2023-11-22T04:43:16Z) - Guideline for Trustworthy Artificial Intelligence -- AI Assessment
Catalog [0.0]
It is clear that AI and business models based on it can only reach their full potential if AI applications are developed according to high quality standards.
The issue of the trustworthiness of AI applications is crucial and is the subject of numerous major publications.
This AI assessment catalog addresses exactly this point and is intended for two target groups.
arXiv Detail & Related papers (2023-06-20T08:07:18Z) - RE-centric Recommendations for the Development of Trustworthy(er)
Autonomous Systems [4.268504966623082]
Complying with the EU AI Act (AIA) guidelines while developing and implementing AI systems will soon be mandatory within the EU.
practitioners lack actionable instructions to operationalise ethics during AI systems development.
A literature review of different ethical guidelines revealed inconsistencies in the principles addressed and the terminology used to describe them.
arXiv Detail & Related papers (2023-05-29T11:57:07Z) - Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z) - An interdisciplinary conceptual study of Artificial Intelligence (AI)
for helping benefit-risk assessment practices: Towards a comprehensive
qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.