On the relationship between Benchmarking, Standards and Certification in
Robotics and AI
- URL: http://arxiv.org/abs/2309.12139v1
- Date: Thu, 21 Sep 2023 14:59:36 GMT
- Title: On the relationship between Benchmarking, Standards and Certification in
Robotics and AI
- Authors: Alan F.T. Winfield and Matthew Studley
- Abstract summary: Benchmarking, standards and certification are closely related processes.
Benchmarking, standards and certification are not only useful but vital to the broader practice of Responsible Innovation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Benchmarking, standards and certification are closely related processes.
Standards can provide normative requirements that robotics and AI systems may
or may not conform to. Certification generally relies upon conformance with one
or more standards as the key determinant of granting a certificate to operate.
And benchmarks are sets of standardised tests against which robots and AI
systems can be measured. Benchmarks therefore can be thought of as informal
standards. In this paper we will develop these themes with examples from
benchmarking, standards and certification, and argue that these three linked
processes are not only useful but vital to the broader practice of Responsible
Innovation.
Related papers
- Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design [63.24275274981911]
Compound AI Systems consisting of many language model inference calls are increasingly employed.
In this work, we construct systems, which we call Networks of Networks (NoNs) organized around the distinction between generating a proposed answer and verifying its correctness.
We introduce a verifier-based judge NoN with K generators, an instantiation of "best-of-K" or "judge-based" compound AI systems.
arXiv Detail & Related papers (2024-07-23T20:40:37Z)
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- ECBD: Evidence-Centered Benchmark Design for NLP [95.50252564938417]
We propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules.
Each module requires benchmark designers to describe, justify, and support benchmark design choices.
Our analysis reveals common trends in benchmark design and documentation that could threaten the validity of benchmarks' measurements.
arXiv Detail & Related papers (2024-06-13T00:59:55Z)
- Towards Standards-Compliant Assistive Technology Product Specifications via LLMs [7.30389619012625]
We introduce CompliAT, a pioneering framework designed to streamline the compliance process of AT product specifications.
CompliAT addresses three critical tasks: checking terminology consistency, classifying products according to standards, and tracing key product specifications to standard requirements.
We propose a novel approach for product classification, leveraging a retrieval-augmented generation model to accurately categorize AT products in line with international standards.
arXiv Detail & Related papers (2024-04-04T00:10:39Z)
- How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark [60.72725673114168]
We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets.
We propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark.
arXiv Detail & Related papers (2023-12-21T03:11:30Z)
- No Trust without regulation! [0.0]
The explosion in performance of Machine Learning (ML) and the potential of its applications are encouraging us to consider its use in industrial systems.
However, the issue of safety and its corollary, regulation and standards, is still too often left to one side.
The European Commission has laid the foundations for moving forward and building solid approaches to the integration of AI-based applications that are safe, trustworthy and respect European ethical values.
arXiv Detail & Related papers (2023-09-27T09:08:41Z)
- A General Verification Framework for Dynamical and Control Models via Certificate Synthesis [54.959571890098786]
We provide a framework to encode system specifications and define corresponding certificates.
We present an automated approach to formally synthesise controllers and certificates.
Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks.
arXiv Detail & Related papers (2023-09-12T09:37:26Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- AI Certification: Advancing Ethical Practice by Reducing Information Asymmetries [0.0]
This paper draws from management literature on certification and reviews current AI certification programs and proposals.
The review indicates that the field currently focuses on self-certification and third-party certification of systems, individuals, and organizations.
arXiv Detail & Related papers (2021-05-20T08:27:29Z)
- A Norm Emergence Framework for Normative MAS -- Position Paper [0.90238471756546]
We propose a framework for the emergence of norms within a normative multiagent system (MAS).
We make the case that a norm has emerged in a normative MAS when a percentage of agents adopt the norm.
In this framework, special-purpose synthesizer agents formulate new norms or revisions in response to requests from agents.
arXiv Detail & Related papers (2020-04-06T11:42:01Z)
- AIBench: An Agile Domain-specific Benchmarking Methodology and an AI Benchmark Suite [26.820244556465333]
This paper proposes an agile domain-specific benchmarking methodology.
We identify ten important end-to-end application scenarios, from which sixteen representative AI tasks are distilled as the AI component benchmarks.
We present the first end-to-end Internet service AI benchmark.
arXiv Detail & Related papers (2020-02-17T07:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.