GPAI Evaluations Standards Taskforce: Towards Effective AI Governance
- URL: http://arxiv.org/abs/2411.13808v1
- Date: Thu, 21 Nov 2024 03:14:31 GMT
- Title: GPAI Evaluations Standards Taskforce: Towards Effective AI Governance
- Authors: Patricia Paskov, Lukas Berglund, Everett Smith, Lisa Soder,
- Abstract summary: General-purpose AI evaluations have been proposed as a promising way of identifying and mitigating systemic risks posed by AI development and deployment.
No standards exist to date to promote their quality or legitimacy.
We propose an EU GPAI Evaluation Standards Taskforce to be housed within the bodies established by the EU AI Act.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: General-purpose AI evaluations have been proposed as a promising way of identifying and mitigating systemic risks posed by AI development and deployment. While GPAI evaluations play an increasingly central role in institutional decision- and policy-making -- including by way of the European Union AI Act's mandate to conduct evaluations on GPAI models presenting systemic risk -- no standards exist to date to promote their quality or legitimacy. To strengthen GPAI evaluations in the EU, which currently constitutes the first and only jurisdiction that mandates GPAI evaluations, we outline four desiderata for GPAI evaluations: internal validity, external validity, reproducibility, and portability. To uphold these desiderata in a dynamic environment of continuously evolving risks, we propose a dedicated EU GPAI Evaluation Standards Taskforce, to be housed within the bodies established by the EU AI Act. We outline the responsibilities of the Taskforce, specify the GPAI provider commitments that would facilitate Taskforce success, discuss the potential impact of the Taskforce on global AI governance, and address potential sources of failure that policymakers should heed.
Related papers
- Towards an Approach for Evaluating the Impact of AI Standards [0.0]
The goal of AI standards is to promote innovation and public trust in systems that use AI.<n>There is a lack of a formal or shared method to measure the impact of these standardization activities on the goals of innovation and trust.<n>This concept paper proposes an analytical approach that could inform the evaluation of the impact of AI standards.
arXiv Detail & Related papers (2025-06-16T13:58:59Z) - Enhancing Trust Through Standards: A Comparative Risk-Impact Framework for Aligning ISO AI Standards with Global Ethical and Regulatory Contexts [0.0]
ISO standards aim to foster responsible development by embedding fairness, transparency, and risk management into AI systems.
Their effectiveness varies across diverse regulatory landscapes, from the EU's risk-based AI Act to China's stability-focused measures.
This paper introduces a novel Comparative Risk-Impact Assessment Framework to evaluate how well ISO standards address ethical risks.
arXiv Detail & Related papers (2025-04-22T00:44:20Z) - HH4AI: A methodological Framework for AI Human Rights impact assessment under the EUAI ACT [1.7754875105502606]
The paper highlights AIs transformative nature, driven by autonomy, data, and goal-oriented design.
A key challenge is defining and assessing "high-risk" AI systems across industries.
It proposes a Fundamental Rights Impact Assessment (FRIA) methodology, a gate-based framework designed to isolate and assess risks.
arXiv Detail & Related papers (2025-03-23T19:10:14Z) - In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI [93.33036653316591]
We call for three interventions to advance system safety.
First, we propose using standardized AI flaw reports and rules of engagement for researchers.
Second, we propose GPAI system providers adopt broadly-scoped flaw disclosure programs.
Third, we advocate for the development of improved infrastructure to coordinate distribution of flaw reports.
arXiv Detail & Related papers (2025-03-21T05:09:46Z) - Securing External Deeper-than-black-box GPAI Evaluations [49.1574468325115]
This paper examines the critical challenges and potential solutions for conducting secure and effective external evaluations of general-purpose AI (GPAI) models.
With the exponential growth in size, capability, reach and accompanying risk, ensuring accountability, safety, and public trust requires frameworks that go beyond traditional black-box methods.
arXiv Detail & Related papers (2025-03-10T16:13:45Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.
The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - The Fundamental Rights Impact Assessment (FRIA) in the AI Act: Roots, legal obligations and key elements for a model template [55.2480439325792]
Article aims to fill existing gaps in the theoretical and methodological elaboration of the Fundamental Rights Impact Assessment (FRIA)
This article outlines the main building blocks of a model template for the FRIA.
It can serve as a blueprint for other national and international regulatory initiatives to ensure that AI is fully consistent with human rights.
arXiv Detail & Related papers (2024-11-07T11:55:55Z) - Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z) - How Could Generative AI Support Compliance with the EU AI Act? A Review for Safe Automated Driving Perception [4.075971633195745]
Deep Neural Networks (DNNs) have become central for the perception functions of autonomous vehicles.
The European Union (EU) Artificial Intelligence (AI) Act aims to address these challenges by establishing stringent norms and standards for AI systems.
This review paper summarizes the requirements arising from the EU AI Act regarding DNN-based perception systems and systematically categorizes existing generative AI applications in AD.
arXiv Detail & Related papers (2024-08-30T12:01:06Z) - Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment [17.026921603767722]
The study introduces our Responsible AI (RAI) Question Bank, a comprehensive framework and tool designed to support diverse AI initiatives.
By integrating AI ethics principles such as fairness, transparency, and accountability into a structured question format, the RAI Question Bank aids in identifying potential risks.
arXiv Detail & Related papers (2024-08-02T22:40:20Z) - An evidence-based methodology for human rights impact assessment (HRIA) in the development of AI data-intensive systems [49.1574468325115]
We show that human rights already underpin the decisions in the field of data use.
This work presents a methodology and a model for a Human Rights Impact Assessment (HRIA)
The proposed methodology is tested in concrete case-studies to prove its feasibility and effectiveness.
arXiv Detail & Related papers (2024-07-30T16:27:52Z) - Navigating the EU AI Act: A Methodological Approach to Compliance for Safety-critical Products [0.0]
This paper presents a methodology for interpreting the EU AI Act requirements for high-risk AI systems.
We first propose an extended product quality model for AI systems, incorporating attributes relevant to the Act not covered by current quality models.
We then propose a contract-based approach to derive technical requirements at the stakeholder level.
arXiv Detail & Related papers (2024-03-25T14:32:18Z) - Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness [53.91018508439669]
The study explores the complexities of integrating Artificial Intelligence into Autonomous Vehicles (AVs)
It examines the challenges introduced by AI components and the impact on testing procedures.
The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
arXiv Detail & Related papers (2024-02-21T08:29:42Z) - Towards Responsible AI in Banking: Addressing Bias for Fair
Decision-Making [69.44075077934914]
"Responsible AI" emphasizes the critical nature of addressing biases within the development of a corporate culture.
This thesis is structured around three fundamental pillars: understanding bias, mitigating bias, and accounting for bias.
In line with open-source principles, we have released Bias On Demand and FairView as accessible Python packages.
arXiv Detail & Related papers (2024-01-13T14:07:09Z) - The risks of risk-based AI regulation: taking liability seriously [46.90451304069951]
The development and regulation of AI seems to have reached a critical stage.
Some experts are calling for a moratorium on the training of AI systems more powerful than GPT-4.
This paper analyses the most advanced legal proposal, the European Union's AI Act.
arXiv Detail & Related papers (2023-11-03T12:51:37Z) - An International Consortium for Evaluations of Societal-Scale Risks from
Advanced AI [10.550015825854837]
A regulatory gap has permitted AI labs to conduct research, development, and deployment activities with minimal oversight.
frontier AI system evaluations have been proposed as a way of assessing risks from the development and deployment of frontier AI systems.
This paper proposes a solution in the form of an international consortium for AI risk evaluations, comprising both AI developers and third-party AI risk evaluators.
arXiv Detail & Related papers (2023-10-22T23:37:48Z) - International Institutions for Advanced AI [47.449762587672986]
International institutions may have an important role to play in ensuring advanced AI systems benefit humanity.
This paper identifies a set of governance functions that could be performed at an international level to address these challenges.
It groups these functions into four institutional models that exhibit internal synergies and have precedents in existing organizations.
arXiv Detail & Related papers (2023-07-10T16:55:55Z) - Quantitative AI Risk Assessments: Opportunities and Challenges [9.262092738841979]
AI-based systems are increasingly being leveraged to provide value to organizations, individuals, and society.
Risks have led to proposed regulations, litigation, and general societal concerns.
This paper explores the concept of a quantitative AI Risk Assessment.
arXiv Detail & Related papers (2022-09-13T21:47:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.