GPAI Evaluations Standards Taskforce: Towards Effective AI Governance
- URL: http://arxiv.org/abs/2411.13808v1
- Date: Thu, 21 Nov 2024 03:14:31 GMT
- Title: GPAI Evaluations Standards Taskforce: Towards Effective AI Governance
- Authors: Patricia Paskov, Lukas Berglund, Everett Smith, Lisa Soder,
- Abstract summary: General-purpose AI evaluations have been proposed as a promising way of identifying and mitigating systemic risks posed by AI development and deployment.
No standards exist to date to promote their quality or legitimacy.
We propose an EU GPAI Evaluation Standards Taskforce to be housed within the bodies established by the EU AI Act.
- Abstract: General-purpose AI evaluations have been proposed as a promising way of identifying and mitigating systemic risks posed by AI development and deployment. While GPAI evaluations play an increasingly central role in institutional decision- and policy-making -- including by way of the European Union AI Act's mandate to conduct evaluations on GPAI models presenting systemic risk -- no standards exist to date to promote their quality or legitimacy. To strengthen GPAI evaluations in the EU, which currently constitutes the first and only jurisdiction that mandates GPAI evaluations, we outline four desiderata for GPAI evaluations: internal validity, external validity, reproducibility, and portability. To uphold these desiderata in a dynamic environment of continuously evolving risks, we propose a dedicated EU GPAI Evaluation Standards Taskforce, to be housed within the bodies established by the EU AI Act. We outline the responsibilities of the Taskforce, specify the GPAI provider commitments that would facilitate Taskforce success, discuss the potential impact of the Taskforce on global AI governance, and address potential sources of failure that policymakers should heed.
Related papers
- Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation
We argue that regulation should require developers to explicitly identify and justify key underlying assumptions about evaluations.
We identify core assumptions in AI evaluations, such as comprehensive threat modeling, proxy task validity, and adequate capability elicitation.
Our presented approach aims to enhance transparency in AI development, offering a practical path towards more effective governance of advanced AI systems.
Date: 2024-11-19T19:13:56Z
- The Fundamental Rights Impact Assessment (FRIA) in the AI Act: Roots, legal obligations and key elements for a model template
This article aims to fill existing gaps in the theoretical and methodological elaboration of the Fundamental Rights Impact Assessment (FRIA).
This article outlines the main building blocks of a model template for the FRIA.
It can serve as a blueprint for other national and international regulatory initiatives to ensure that AI is fully consistent with human rights.
Date: 2024-11-07T11:55:55Z
- Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
Date: 2024-10-25T07:53:32Z
- How Could Generative AI Support Compliance with the EU AI Act? A Review for Safe Automated Driving Perception
Deep Neural Networks (DNNs) have become central for the perception functions of autonomous vehicles.
The European Union (EU) Artificial Intelligence (AI) Act aims to address the challenges posed by such AI-based systems by establishing stringent norms and standards for AI systems.
This review paper summarizes the requirements arising from the EU AI Act regarding DNN-based perception systems and systematically categorizes existing generative AI applications in AD.
Date: 2024-08-30T12:01:06Z
- EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.
Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.
However, the deployment of these agents in physical environments presents significant safety challenges.
This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
Date: 2024-08-08T13:19:37Z
- Responsible AI Question Bank: A Comprehensive Tool for AI Risk Assessment
This study introduces the Responsible AI (RAI) Question Bank, a comprehensive framework and tool designed to support diverse AI initiatives.
By integrating AI ethics principles such as fairness, transparency, and accountability into a structured question format, the RAI Question Bank aids in identifying potential risks.
Date: 2024-08-02T22:40:20Z
- Navigating the EU AI Act: A Methodological Approach to Compliance for Safety-critical Products
This paper presents a methodology for interpreting the EU AI Act requirements for high-risk AI systems.
We first propose an extended product quality model for AI systems, incorporating attributes relevant to the Act not covered by current quality models.
We then propose a contract-based approach to derive technical requirements at the stakeholder level.
Date: 2024-03-25T14:32:18Z
- Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness
The study explores the complexities of integrating artificial intelligence into autonomous vehicles (AVs).
It examines the challenges introduced by AI components and the impact on testing procedures.
The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
Date: 2024-02-21T08:29:42Z
- The risks of risk-based AI regulation: taking liability seriously
The development and regulation of AI seems to have reached a critical stage.
Some experts are calling for a moratorium on the training of AI systems more powerful than GPT-4.
This paper analyses the most advanced legal proposal, the European Union's AI Act.
Date: 2023-11-03T12:51:37Z
- An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI
A regulatory gap has permitted AI labs to conduct research, development, and deployment activities with minimal oversight.
Frontier AI system evaluations have been proposed as a way of assessing risks from the development and deployment of frontier AI systems.
This paper proposes a solution in the form of an international consortium for AI risk evaluations, comprising both AI developers and third-party AI risk evaluators.
Date: 2023-10-22T23:37:48Z
- International Institutions for Advanced AI
International institutions may have an important role to play in ensuring advanced AI systems benefit humanity.
This paper identifies a set of governance functions that could be performed at an international level to address these challenges.
It groups these functions into four institutional models that exhibit internal synergies and have precedents in existing organizations.
Date: 2023-07-10T16:55:55Z
This list is automatically generated from the titles and abstracts of the papers in this site.