On the relationship between Benchmarking, Standards and Certification in
Robotics and AI
- URL: http://arxiv.org/abs/2309.12139v1
- Date: Thu, 21 Sep 2023 14:59:36 GMT
- Title: On the relationship between Benchmarking, Standards and Certification in
Robotics and AI
- Authors: Alan F.T. Winfield and Matthew Studley
- Abstract summary: Benchmarking, standards and certification are closely related processes.
Benchmarking, standards and certification are not only useful but vital to the broader practice of Responsible Innovation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Benchmarking, standards and certification are closely related processes.
Standards can provide normative requirements that robotics and AI systems may
or may not conform to. Certification generally relies upon conformance with one
or more standards as the key determinant of granting a certificate to operate.
And benchmarks are sets of standardised tests against which robots and AI
systems can be measured. Benchmarks therefore can be thought of as informal
standards. In this paper we will develop these themes with examples from
benchmarking, standards and certification, and argue that these three linked
processes are not only useful but vital to the broader practice of Responsible
Innovation.
Related papers
- Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design [63.24275274981911]
Compound AI Systems consisting of many language model inference calls are increasingly employed.
In this work, we construct systems, which we call Networks of Networks (NoNs) organized around the distinction between generating a proposed answer and verifying its correctness.
We introduce a verifier-based judge NoN with K generators, an instantiation of "best-of-K" or "judge-based" compound AI systems.
arXiv Detail & Related papers (2024-07-23T20:40:37Z)
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- ECBD: Evidence-Centered Benchmark Design for NLP [95.50252564938417]
We propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules.
Each module requires benchmark designers to describe, justify, and support benchmark design choices.
Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.
arXiv Detail & Related papers (2024-06-13T00:59:55Z)
- Towards Standards-Compliant Assistive Technology Product Specifications via LLMs [7.30389619012625]
We introduce CompliAT, a pioneering framework designed to streamline the compliance process of AT product specifications.
CompliAT addresses three critical tasks: checking terminology consistency, classifying products according to standards, and tracing key product specifications to standard requirements.
We propose a novel approach for product classification, leveraging a retrieval-augmented generation model to accurately categorize AT products in line with international standards.
arXiv Detail & Related papers (2024-04-04T00:10:39Z)
- How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark [60.72725673114168]
We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets.
We propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark.
arXiv Detail & Related papers (2023-12-21T03:11:30Z)
- No Trust without regulation! [0.0]
The explosion in performance of Machine Learning (ML) and the potential of its applications are encouraging us to consider its use in industrial systems.
However, the issue of safety and its corollary, regulation and standards, is still too often left to one side.
The European Commission has laid the foundations for moving forward and building solid approaches to the integration of AI-based applications that are safe, trustworthy and respect European ethical values.
arXiv Detail & Related papers (2023-09-27T09:08:41Z)
- A General Verification Framework for Dynamical and Control Models via Certificate Synthesis [54.959571890098786]
We provide a framework to encode system specifications and define corresponding certificates.
We present an automated approach to formally synthesise controllers and certificates.
Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks.
arXiv Detail & Related papers (2023-09-12T09:37:26Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- AI Certification: Advancing Ethical Practice by Reducing Information Asymmetries [0.0]
This paper draws from management literature on certification and reviews current AI certification programs and proposals.
The review indicates that the field currently focuses on self-certification and third-party certification of systems, individuals, and organizations.
arXiv Detail & Related papers (2021-05-20T08:27:29Z)
- A Norm Emergence Framework for Normative MAS -- Position Paper [0.90238471756546]
We propose a framework for the emergence of norms within a normative multiagent system (MAS).
We make the case that a norm has emerged in a normative MAS when a percentage of agents adopt the norm.
In this framework, special-purpose synthesizer agents formulate new norms or revisions in response to requests from agents.
arXiv Detail & Related papers (2020-04-06T11:42:01Z)
- AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite [26.820244556465333]
This paper proposes an agile domain-specific benchmarking methodology.
We identify ten important end-to-end application scenarios, from which sixteen representative AI tasks are distilled as the AI component benchmarks.
We present the first end-to-end Internet service AI benchmark.
arXiv Detail & Related papers (2020-02-17T07:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.