Holistic Safety and Responsibility Evaluations of Advanced AI Models
- URL: http://arxiv.org/abs/2404.14068v1
- Date: Mon, 22 Apr 2024 10:26:49 GMT
- Title: Holistic Safety and Responsibility Evaluations of Advanced AI Models
- Authors: Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, William Isaac
- Abstract summary: Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice.
In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation.
- Score: 18.34510620901674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach, as well as lessons learned, for a broad audience. Key lessons learned include: First, theoretical underpinnings and frameworks are invaluable for organising the breadth of risk domains, modalities, forms, metrics, and goals. Second, both the theory and the practice of safety evaluation development benefit from collaboration to clarify goals, methods, and challenges, and to facilitate the transfer of insights between stakeholders and disciplines. Third, similar key methods, lessons, and institutions apply across the range of concerns in responsibility and safety, including established and emerging harms. For this reason, it is important that the wide range of actors working on safety evaluation, together with the broader safety research community, work together to develop, refine, and implement novel evaluation approaches and best practices, rather than operating in silos. The report concludes by outlining the clear need to rapidly advance the science of evaluations, to integrate new evaluations into the development and governance of AI, to establish scientifically grounded norms and standards, and to promote a robust evaluation ecosystem.
Related papers
- On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective [314.7991906491166]
Generative Foundation Models (GenFMs) have emerged as transformative tools.
Their widespread adoption raises critical concerns regarding trustworthiness across multiple dimensions.
This paper presents a comprehensive framework to address these challenges through three key contributions.
arXiv Detail & Related papers (2025-02-20T06:20:36Z)
- Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey [92.36487127683053]
Retrieval-Augmented Generation (RAG) is an advanced technique designed to address the challenges of Artificial Intelligence-Generated Content (AIGC).
RAG provides reliable and up-to-date external knowledge, reduces hallucinations, and ensures relevant context across a wide range of tasks.
Despite RAG's success and potential, recent studies have shown that the RAG paradigm also introduces new risks, including privacy concerns, adversarial attacks, and accountability issues.
arXiv Detail & Related papers (2025-02-08T06:50:47Z)
- SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach [58.93030774141753]
Multimodal foundation models (MFMs) represent a significant advancement in artificial intelligence.
This paper conceptualizes cybersafety and cybersecurity in the context of multimodal learning.
We present a comprehensive Systematization of Knowledge (SoK) to unify these concepts in MFMs, identifying key threats.
arXiv Detail & Related papers (2024-11-17T23:06:20Z)
- EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.
Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.
However, the deployment of these agents in physical environments presents significant safety challenges.
This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
- Evaluating Human-AI Collaboration: A Review and Methodological Framework [4.41358655687435]
The use of artificial intelligence (AI) in working environments with individuals, known as Human-AI Collaboration (HAIC), has become essential.
However, evaluating HAIC's effectiveness remains challenging due to the complex interaction of the components involved.
This paper provides a detailed analysis of existing HAIC evaluation approaches and develops a fresh paradigm for more effectively evaluating these systems.
arXiv Detail & Related papers (2024-07-09T12:52:22Z)
- Safe and Robust Reinforcement Learning: Principles and Practice [0.0]
Reinforcement Learning (RL) has shown remarkable success in solving relatively complex tasks.
The deployment of RL systems in real-world scenarios poses significant challenges related to safety and robustness.
This paper explores the main dimensions of the safe and robust RL landscape, encompassing algorithmic, ethical, and practical considerations.
arXiv Detail & Related papers (2024-03-27T13:14:29Z)
- The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
- Sociotechnical Safety Evaluation of Generative AI Systems [13.546708226350963]
Generative AI systems pose a range of risks.
To ensure the safety of generative AI systems, these risks must be evaluated.
We propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks.
arXiv Detail & Related papers (2023-10-18T14:13:58Z)
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.