Sociotechnical Safety Evaluation of Generative AI Systems
- URL: http://arxiv.org/abs/2310.11986v2
- Date: Tue, 31 Oct 2023 18:23:32 GMT
- Title: Sociotechnical Safety Evaluation of Generative AI Systems
- Authors: Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa
Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor
Griffin, Ben Bariach, Iason Gabriel, Verena Rieser, William Isaac
- Abstract summary: Generative AI systems produce a range of risks.
To ensure the safety of generative AI systems, these risks must be evaluated.
We propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks.
- Score: 13.546708226350963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative AI systems produce a range of risks. To ensure the safety of
generative AI systems, these risks must be evaluated. In this paper, we make
two main contributions toward establishing such evaluations. First, we propose
a three-layered framework that takes a structured, sociotechnical approach to
evaluating these risks. This framework encompasses capability evaluations,
which are the main current approach to safety evaluation. It then reaches
further by building on system safety principles, particularly the insight that
context determines whether a given capability may cause harm. To account for
relevant context, our framework adds human interaction and systemic impacts as
additional layers of evaluation. Second, we survey the current state of safety
evaluation of generative AI systems and create a repository of existing
evaluations. Three salient evaluation gaps emerge from this analysis. We
propose ways forward to close these gaps, outlining practical steps as well
as roles and responsibilities for different actors. Sociotechnical safety
evaluation is a tractable approach to the robust and comprehensive safety
evaluation of generative AI systems.
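
To make the three-layered structure concrete, here is a minimal sketch in Python. The layer names (capability, human interaction, systemic impact) come from the abstract; everything else, including the class names, the example risk areas, and the `coverage_gaps` helper, is a hypothetical illustration rather than an API defined by the authors.

```python
# Illustrative sketch of a three-layered safety evaluation inventory.
# Layer names follow the abstract; all identifiers are hypothetical.
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    CAPABILITY = "capability"                 # what the model can do in isolation
    HUMAN_INTERACTION = "human_interaction"   # how people use and experience the system
    SYSTEMIC_IMPACT = "systemic_impact"       # downstream effects on society and institutions


@dataclass
class Evaluation:
    name: str
    layer: Layer
    risk_area: str                 # e.g. "toxicity", "misinformation"
    result: float | None = None    # filled in once the evaluation has been run


def coverage_gaps(evaluations: list[Evaluation],
                  risk_areas: list[str]) -> dict[str, list[Layer]]:
    """For each risk area, list the layers that have no evaluation yet.

    A toy version of the paper's gap analysis: a risk is only
    comprehensively evaluated when all three layers are covered.
    """
    gaps: dict[str, list[Layer]] = {}
    for area in risk_areas:
        covered = {e.layer for e in evaluations if e.risk_area == area}
        missing = [layer for layer in Layer if layer not in covered]
        if missing:
            gaps[area] = missing
    return gaps


if __name__ == "__main__":
    evals = [
        Evaluation("toxicity benchmark", Layer.CAPABILITY, "toxicity"),
        Evaluation("user red-teaming study", Layer.HUMAN_INTERACTION, "toxicity"),
    ]
    print(coverage_gaps(evals, ["toxicity", "misinformation"]))
    # "toxicity" lacks only the systemic-impact layer;
    # "misinformation" has no coverage at any of the three layers.
```

Such an inventory view is one simple way to surface the kind of evaluation gaps the survey identifies, though the paper itself frames the analysis qualitatively rather than as code.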
Related papers
- EAIRiskBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [47.69642609574771]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.
Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.
However, the deployment of these agents in physical environments presents significant safety challenges.
This study introduces EAIRiskBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
- Evaluating AI Evaluation: Perils and Prospects [8.086002368038658]
This paper contends that the prevalent evaluation methods for AI systems are fundamentally inadequate.
I argue that a reformation is required in the way we evaluate AI systems and that we should look towards cognitive sciences for inspiration.
arXiv Detail & Related papers (2024-07-12T12:37:13Z)
- Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
These guarantees rest on three core components: a world model, a safety specification, and a verifier.
We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z)
- Holistic Safety and Responsibility Evaluations of Advanced AI Models [18.34510620901674]
Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice.
In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation.
arXiv Detail & Related papers (2024-04-22T10:26:49Z)
- Leveraging Traceability to Integrate Safety Analysis Artifacts into the Software Development Process [51.42800587382228]
Safety assurance cases (SACs) can be challenging to maintain during system evolution.
We propose a solution that leverages software traceability to connect relevant system artifacts to safety analysis models.
We elicit design rationales for system changes to help safety stakeholders analyze the impact of system changes on safety.
arXiv Detail & Related papers (2023-07-14T16:03:27Z)
- Model evaluation for extreme risks [46.53170857607407]
Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills.
We explain why model evaluation is critical for addressing extreme risks.
arXiv Detail & Related papers (2023-05-24T16:38:43Z)
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work on safety-critical RL from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Quantitative AI Risk Assessments: Opportunities and Challenges [9.262092738841979]
AI-based systems are increasingly being leveraged to provide value to organizations, individuals, and society.
The risks posed by these systems have led to proposed regulations, litigation, and general societal concerns.
This paper explores the concept of a quantitative AI Risk Assessment.
arXiv Detail & Related papers (2022-09-13T21:47:25Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.