Towards an Approach for Evaluating the Impact of AI Standards
- URL: http://arxiv.org/abs/2506.13839v1
- Date: Mon, 16 Jun 2025 13:58:59 GMT
- Title: Towards an Approach for Evaluating the Impact of AI Standards
- Authors: Julia Lane,
- Abstract summary: The goal of AI standards is to promote innovation and public trust in systems that use AI.<n>There is a lack of a formal or shared method to measure the impact of these standardization activities on the goals of innovation and trust.<n>This concept paper proposes an analytical approach that could inform the evaluation of the impact of AI standards.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There have been multiple calls for investments in the development of AI standards that both preserve the transformative potential and minimize the risks of AI. The goals of AI standards, particularly with respect to AI data, performance, and governance, are to promote innovation and public trust in systems that use AI. However, there is a lack of a formal or shared method to measure the impact of these standardization activities on the goals of innovation and trust. This concept paper proposes an analytical approach that could inform the evaluation of the impact of AI standards. The proposed approach could be used to measure, assess, and eventually evaluate the extent to which AI standards achieve their stated goals, since most Standards Development Organizationss do not track the impact of their standards once completed. It is intended to stimulate discussions with a wide variety of stakeholders, including academia and the standards community, about the potential for the approach to evaluate the effectiveness, utility, and relative value of AI standards. The document draws on successful and well-tested evaluation frameworks, tools, and metrics that are used for monitoring and assessing the effect of programmatic interventions in other domains to describe a possible approach. It begins by describing the context within which an evaluation would be designed, and then introduces a standard evaluation framework. These sections are followed by a description of what outputs and outcomes might result from the adoption and implementation of AI standards and the process whereby those AI standards are developed . Subsequent sections provide an overview of how the effectiveness of AI standards might be assessed and a conclusion.
Related papers
- Evaluation Framework for AI Systems in "the Wild" [37.48117853114386]
Generative AI (GenAI) models have become vital across industries, yet current evaluation methods have not adapted to their widespread use.<n>Traditional evaluations often rely on benchmarks and fixed datasets, frequently failing to reflect real-world performance.<n>This white paper proposes a comprehensive framework for how we should evaluate real-world GenAI systems.
arXiv Detail & Related papers (2025-04-23T14:52:39Z) - HH4AI: A methodological Framework for AI Human Rights impact assessment under the EUAI ACT [1.7754875105502606]
The paper highlights AIs transformative nature, driven by autonomy, data, and goal-oriented design.<n>A key challenge is defining and assessing "high-risk" AI systems across industries.<n>It proposes a Fundamental Rights Impact Assessment (FRIA) methodology, a gate-based framework designed to isolate and assess risks.
arXiv Detail & Related papers (2025-03-23T19:10:14Z) - Securing External Deeper-than-black-box GPAI Evaluations [49.1574468325115]
This paper examines the critical challenges and potential solutions for conducting secure and effective external evaluations of general-purpose AI (GPAI) models.<n>With the exponential growth in size, capability, reach and accompanying risk, ensuring accountability, safety, and public trust requires frameworks that go beyond traditional black-box methods.
arXiv Detail & Related papers (2025-03-10T16:13:45Z) - On Benchmarking Human-Like Intelligence in Machines [77.55118048492021]
We argue that current AI evaluation paradigms are insufficient for assessing human-like cognitive capabilities.<n>We identify a set of key shortcomings: a lack of human-validated labels, inadequate representation of human response variability and uncertainty, and reliance on simplified and ecologically-invalid tasks.
arXiv Detail & Related papers (2025-02-27T20:21:36Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.<n>The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - Where Assessment Validation and Responsible AI Meet [0.0876953078294908]
We propose a unified assessment framework that considers classical test validation theory and assessment-specific and domain-agnostic RAI principles and practice.
The framework addresses responsible AI use for assessment that supports validity arguments, alignment with AI ethics to maintain human values and oversight, and broader social responsibility associated with AI use.
arXiv Detail & Related papers (2024-11-04T20:20:29Z) - Measuring Value Alignment [12.696227679697493]
This paper introduces a novel formalism to quantify the alignment between AI systems and human values.
By utilizing this formalism, AI developers and ethicists can better design and evaluate AI systems to ensure they operate in harmony with human values.
arXiv Detail & Related papers (2023-12-23T12:30:06Z) - Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Towards a multi-stakeholder value-based assessment framework for
algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z) - An interdisciplinary conceptual study of Artificial Intelligence (AI)
for helping benefit-risk assessment practices: Towards a comprehensive
qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z) - A Framework for Ethical AI at the United Nations [0.0]
This paper aims to provide an overview of the ethical concerns in artificial intelligence (AI) and the framework that is needed to mitigate those risks.
It suggests a practical path to ensure the development and use of AI at the United Nations (UN) aligns with our ethical values.
arXiv Detail & Related papers (2021-04-09T23:44:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.