AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies
- URL: http://arxiv.org/abs/2407.17436v2
- Date: Mon, 5 Aug 2024 18:12:27 GMT
- Title: AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies
- Authors: Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
- Abstract summary: AIR-Bench 2024 is the first AI safety benchmark aligned with emerging government regulations and company policies.
It decomposes 8 government regulations and 16 company policies into a four-tiered safety taxonomy with granular risk categories in the lowest tier.
We evaluate leading language models on AIR-Bench 2024, uncovering insights into their alignment with specified safety concerns.
- Score: 80.90138009539004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in recent regulations and policies, which makes it challenging to evaluate and compare FMs across these benchmarks. To bridge this gap, we introduce AIR-Bench 2024, the first AI safety benchmark aligned with emerging government regulations and company policies, following the regulation-based safety categories grounded in our AI risks study, AIR 2024. AIR 2024 decomposes 8 government regulations and 16 company policies into a four-tiered safety taxonomy with 314 granular risk categories in the lowest tier. AIR-Bench 2024 contains 5,694 diverse prompts spanning these categories, with manual curation and human auditing to ensure quality. We evaluate leading language models on AIR-Bench 2024, uncovering insights into their alignment with specified safety concerns. By bridging the gap between public benchmarks and practical AI risks, AIR-Bench 2024 provides a foundation for assessing model safety across jurisdictions, fostering the development of safer and more responsible AI systems.
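The abstract describes a four-tiered taxonomy whose top tier (per the AIR 2024 paper listed below) comprises System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks, with 314 granular categories at the lowest tier. A minimal sketch of such a structure as a nested mapping, with a helper that collects the leaf categories; all entries below the top tier are hypothetical placeholders, not the actual AIR 2024 categories:

```python
# Sketch of a four-tiered risk taxonomy. Top-tier names come from AIR 2024;
# everything beneath them is a hypothetical placeholder (the real taxonomy
# has 314 leaf categories).
taxonomy = {
    "Content Safety Risks": {              # tier 1
        "Violence & Extremism": {          # tier 2 (hypothetical)
            "Celebrating Suffering": [     # tier 3 (hypothetical)
                "glorifying violence",     # tier 4: granular leaf categories
                "belittling victimhood",
            ],
        },
    },
    "Societal Risks": {
        "Deception": {
            "Fraud": ["phishing", "scams"],
        },
    },
}

def leaf_categories(tree):
    """Recursively collect all lowest-tier (leaf) risk categories."""
    leaves = []
    for value in tree.values():
        if isinstance(value, dict):
            leaves.extend(leaf_categories(value))
        else:
            leaves.extend(value)
    return leaves

print(len(leaf_categories(taxonomy)))  # prints 4 for this toy sketch
```

Benchmark prompts can then be tagged with a leaf category, letting per-category refusal rates be aggregated upward through the tiers.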
Related papers
- Auction-Based Regulation for Artificial Intelligence [28.86995747151915]
We propose an auction-based regulatory mechanism to regulate AI safety.
We provably guarantee that each participating agent's best strategy is to submit a model safer than a prescribed minimum-safety threshold.
Empirical results show that our regulatory auction boosts safety and participation rates by 20% and 15% respectively.
arXiv Detail & Related papers (2024-10-02T17:57:02Z)
- A Grading Rubric for AI Safety Frameworks [1.053373860696675]
Major players like Anthropic, OpenAI, and Google DeepMind have already published their AI safety frameworks.
This paper proposes a grading rubric to enable governments, academia, and civil society to pass judgment on these frameworks.
arXiv Detail & Related papers (2024-09-13T12:01:55Z)
- Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context.
We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z)
- AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies [88.32153122712478]
We identify 314 unique risk categories organized into a four-tiered taxonomy.
At the highest level, this taxonomy encompasses System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks.
We aim to advance AI safety through information sharing across sectors and the promotion of best practices in risk mitigation for generative AI models and systems.
arXiv Detail & Related papers (2024-06-25T18:13:05Z)
- Introducing v0.5 of the AI Safety Benchmark from MLCommons [101.98401637778638]
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group.
The benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models.
arXiv Detail & Related papers (2024-04-18T15:01:00Z)
- ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z)
- AI Risk Profiles: A Standards Proposal for Pre-Deployment AI Risk Disclosures [0.8702432681310399]
We propose a risk profiling standard which can guide downstream decision-making.
The standard is built on our proposed taxonomy of AI risks, which reflects a high-level categorization of the wide variety of risks proposed in the literature.
We apply this methodology to a number of prominent AI systems using publicly available information.
arXiv Detail & Related papers (2023-09-22T20:45:15Z)
- Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI [76.28956947107372]
Covertly unsafe text is an area of particular interest, as such text may arise from everyday scenarios and is challenging to detect as harmful.
We propose FARM, a novel framework leveraging external knowledge for trustworthy rationale generation in the context of safety.
Our experiments show that FARM achieves state-of-the-art results on the SafeText dataset, improving absolute safety classification accuracy by 5.9%.
arXiv Detail & Related papers (2022-12-19T17:51:47Z)
- Assurance Cases as Foundation Stone for Auditing AI-enabled and Autonomous Systems: Workshop Results and Political Recommendations for Action from the ExamAI Project [2.741266294612776]
We investigate how safety standards define the safety measures to be implemented against software faults.
Functional safety standards use Safety Integrity Levels (SILs) to define which safety measures shall be implemented.
We propose the use of assurance cases to argue that the individually selected and applied measures are sufficient.
arXiv Detail & Related papers (2022-08-17T10:05:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.