Related papers: COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act

URL: http://arxiv.org/abs/2410.07959v1
Date: Thu, 10 Oct 2024 14:23:51 GMT
Title: COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
Authors: Philipp Guldimann, Alexander Spiridonov, Robin Staab, Nikola Jovanović, Mark Vero, Velko Vechev, Anna Gueorguieva, Mislav Balunović, Nikola Konstantinov, Pavol Bielik, Petar Tsankov, Martin Vechev
Abstract summary: The EU's Artificial Intelligence Act (AI Act) is a significant step towards responsible AI development. It lacks clear technical interpretation, making it difficult to assess models' compliance. This work presents COMPL-AI, a comprehensive framework consisting of the first technical interpretation of the Act.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The EU's Artificial Intelligence Act (AI Act) is a significant step towards responsible AI development, but lacks clear technical interpretation, making it difficult to assess models' compliance. This work presents COMPL-AI, a comprehensive framework consisting of (i) the first technical interpretation of the EU AI Act, translating its broad regulatory requirements into measurable technical requirements, with the focus on large language models (LLMs), and (ii) an open-source Act-centered benchmarking suite, based on thorough surveying and implementation of state-of-the-art LLM benchmarks. By evaluating 12 prominent LLMs in the context of COMPL-AI, we reveal shortcomings in existing models and benchmarks, particularly in areas like robustness, safety, diversity, and fairness. This work highlights the need for a shift in focus towards these aspects, encouraging balanced development of LLMs and more comprehensive regulation-aligned benchmarks. Simultaneously, COMPL-AI for the first time demonstrates the possibilities and difficulties of bringing the Act's obligations to a more concrete, technical level. As such, our work can serve as a useful first step towards having actionable recommendations for model providers, and contributes to ongoing efforts of the EU to enable application of the Act, such as the drafting of the GPAI Code of Practice.

Related papers

How Good are Foundation Models in Step-by-Step Embodied Reasoning?
Embodied agents must make decisions that are safe, spatially coherent, and grounded in context. Recent advances in large multimodal models have shown promising capabilities in visual understanding and language generation. Our benchmark includes over 1.1k samples with detailed step-by-step reasoning across 10 tasks and 8 embodiments.
arXiv (2025-09-18T17:56:30Z)
Engineering the Law-Machine Learning Translation Problem: Developing Legally Aligned Models
We introduce a five-stage interdisciplinary framework that integrates legal and ML-technical analysis during machine learning model development. This framework facilitates designing ML models in a legally aligned way and identifying high-performing models that are legally justifiable.
arXiv (2025-04-23T13:41:17Z)
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems. We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv (2025-04-12T01:27:49Z)
An Overview of Large Language Models for Statisticians
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI). This paper explores potential areas where statisticians can make important contributions to the development of LLMs. We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv (2025-02-25T03:40:36Z)
Robustness and Cybersecurity in the EU Artificial Intelligence Act
The EU Artificial Intelligence Act (AIA) establishes different legal principles for different types of AI systems. While prior work has sought to clarify some of these principles, little attention has been paid to robustness and cybersecurity. We identify legal challenges and shortcomings in provisions related to robustness and cybersecurity for high-risk AI systems.
arXiv (2025-02-22T11:12:20Z)
LLMs for Generalizable Language-Conditioned Policy Learning under Minimal Data Requirements
This paper presents TEDUO, a novel training pipeline for offline language-conditioned policy learning. TEDUO operates on easy-to-obtain, unlabeled datasets and is suited for the so-called in-the-wild evaluation, wherein the agent encounters previously unseen goals and states.
arXiv (2024-12-09T18:43:56Z)
The Fundamental Rights Impact Assessment (FRIA) in the AI Act: Roots, legal obligations and key elements for a model template
This article aims to fill existing gaps in the theoretical and methodological elaboration of the Fundamental Rights Impact Assessment (FRIA). It outlines the main building blocks of a model template for the FRIA, which can serve as a blueprint for other national and international regulatory initiatives to ensure that AI is fully consistent with human rights.
arXiv (2024-11-07T11:55:55Z)
A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models
We propose a comprehensive approach to benchmark development based on rigorous psychometric principles. We make the first attempt to illustrate this approach by creating a new benchmark in the field of pedagogy and education. We construct a novel benchmark guided by Bloom's taxonomy and rigorously designed by a consortium of education experts trained in test development.
arXiv (2024-10-29T19:32:43Z)
Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act) using insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence. As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv (2024-10-10T17:38:38Z)
Knowledge-Augmented Reasoning for EUAIA Compliance and Adversarial Robustness of LLMs
The EU AI Act (EUAIA) introduces requirements for AI systems which intersect with the processes required to establish adversarial robustness. This paper presents a functional architecture that focuses on bridging the two properties. We aim to support developers and auditors with a reasoning layer based on knowledge augmentation.
arXiv (2024-10-04T18:23:14Z)
The Impossibility of Fair LLMs
The need for fair AI is increasingly clear in the era of large language models (LLMs). We review the technical frameworks that machine learning researchers have used to evaluate fairness. We develop guidelines for the more realistic goal of achieving fairness in particular use cases.
arXiv (2024-05-28T04:36:15Z)
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. We critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv (2024-05-02T22:43:02Z)
Navigating the EU AI Act: A Methodological Approach to Compliance for Safety-critical Products
This paper presents a methodology for interpreting the EU AI Act requirements for high-risk AI systems. We first propose an extended product quality model for AI systems, incorporating attributes relevant to the Act not covered by current quality models. We then propose a contract-based approach to derive technical requirements at the stakeholder level.
arXiv (2024-03-25T14:32:18Z)
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
We critically assess 23 state-of-the-art Large Language Models (LLMs) benchmarks. Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, diversity, and the overlooking of cultural and ideological norms.
arXiv (2024-02-15T11:08:10Z)
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Evaluating large language models (LLMs) is essential for understanding their capabilities and facilitating their integration into practical applications. We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to analytical evaluation of LLM agents.
arXiv (2024-01-24T01:51:00Z)
Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text. Despite the potential of large language models (LLMs) like ChatGPT as general task solvers, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv (2023-09-07T01:35:24Z)
Auditing large language models: a three-layered approach
Large language models (LLMs) represent a major advance in artificial intelligence (AI) research. LLMs are also coupled with significant ethical and social challenges. Previous research has pointed towards auditing as a promising governance mechanism.
arXiv (2023-02-16T18:55:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.