Related papers: How should AI Safety Benchmarks Benchmark Safety?

URL: http://arxiv.org/abs/2601.23112v1
Date: Fri, 30 Jan 2026 15:58:59 GMT
Title: How should AI Safety Benchmarks Benchmark Safety?
Authors: Cheng Yu, Severin Engelmann, Ruoxuan Cao, Dalia Ali, Orestis Papakyriakopoulos
Abstract summary: We present a review of 210 safety benchmarks that maps out common challenges in safety benchmarking. We argue that adhering to established risk management principles can significantly improve the validity and usefulness of AI safety benchmarks.
Score: 10.00492155071077
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI safety benchmarks are pivotal for safety in advanced AI systems; however, they have significant technical, epistemic, and sociotechnical shortcomings. We present a review of 210 safety benchmarks that maps out common challenges in safety benchmarking, documenting failures and limitations by drawing from engineering sciences and long-established theories of risk and safety. We argue that adhering to established risk management principles, mapping the space of what can(not) be measured, developing robust probabilistic metrics, and efficiently deploying measurement theory to connect benchmarking objectives with the world can significantly improve the validity and usefulness of AI safety benchmarks. The review provides a roadmap on how to improve AI safety benchmarking, and we illustrate the effectiveness of these recommendations through quantitative and qualitative evaluation. We also introduce a checklist that can help researchers and practitioners develop robust and epistemologically sound safety benchmarks. This study advances the science of benchmarking and helps practitioners deploy AI systems more responsibly.

Related papers

SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond [134.43113804188195]
We introduce SafeSci, a comprehensive framework for safety evaluation and enhancement in scientific contexts. SafeSci comprises SafeSciBench, a multi-disciplinary benchmark with 0.25M samples, and SafeSciTrain, a large-scale dataset containing 1.5M samples for safety enhancement.
arXiv Detail & Related papers (2026-03-02T08:16:04Z)
ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI [38.70363180741332]
"ForesightSafety Bench" is a safety evaluation framework for cutting-edge AI models. The benchmark has accumulated tens of thousands of structured risk data points and assessment results. Based on this benchmark, we conduct systematic evaluation and in-depth analysis of over twenty mainstream advanced large models.
arXiv Detail & Related papers (2026-02-15T13:12:44Z)
The Necessity of a Holistic Safety Evaluation Framework for AI-Based Automation Features [0.0]
Safety of Intended Functionality (SOTIF) and Functional Safety (FuSa) analysis of driving automation features has traditionally excluded Quality Management (QM) components from rigorous safety impact evaluations. Recent developments in artificial intelligence (AI) integration reveal that such components can contribute to SOTIF-related hazardous risks. This paper argues for the adoption of comprehensive FuSa, SOTIF, and AI standards-driven methodologies to identify and mitigate risks in AI components.
arXiv Detail & Related papers (2026-02-05T00:22:24Z)
Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods [0.0]
This literature review consolidates the rapidly evolving field of AI safety evaluations. It proposes a systematic taxonomy around three dimensions: what properties we measure, how we measure them, and how these measurements integrate into frameworks.
arXiv Detail & Related papers (2025-05-08T16:55:07Z)
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement [73.0700818105842]
We introduce AISafetyLab, a unified framework and toolkit that integrates representative attack, defense, and evaluation methodologies for AI safety. AISafetyLab features an intuitive interface that enables developers to seamlessly apply various techniques. We conduct empirical studies on Vicuna, analyzing different attack and defense strategies to provide valuable insights into their comparative effectiveness.
arXiv Detail & Related papers (2025-02-24T02:11:52Z)
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z)
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation [2.2241228857601727]
This paper presents an interdisciplinary meta-review of about 100 studies that discuss shortcomings in quantitative benchmarking practices. It brings together many fine-grained issues in the design and application of benchmarks with broader sociotechnical issues. Our review also highlights a series of systemic flaws in current practices, such as misaligned incentives, construct validity issues, unknown unknowns, and problems with the gaming of benchmark results.
arXiv Detail & Related papers (2025-02-10T15:25:06Z)
SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior [56.10557932893919]
We present SafetyAnalyst, a novel AI safety moderation framework. Given an AI behavior, SafetyAnalyst uses chain-of-thought reasoning to analyze its potential consequences. It aggregates effects into a harmfulness score using 28 fully interpretable weight parameters.
arXiv Detail & Related papers (2024-10-22T03:38:37Z)
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context. We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z)
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.