Model evaluation for extreme risks
- URL: http://arxiv.org/abs/2305.15324v2
- Date: Fri, 22 Sep 2023 18:48:42 GMT
- Title: Model evaluation for extreme risks
- Authors: Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess
Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus
Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins,
Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul
Christiano, Allan Dafoe
- Abstract summary: Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills.
We explain why model evaluation is critical for addressing extreme risks.
- Score: 46.53170857607407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current approaches to building general-purpose AI systems tend to produce
systems with both beneficial and harmful capabilities. Further progress in AI
development could lead to capabilities that pose extreme risks, such as
offensive cyber capabilities or strong manipulation skills. We explain why
model evaluation is critical for addressing extreme risks. Developers must be
able to identify dangerous capabilities (through "dangerous capability
evaluations") and the propensity of models to apply their capabilities for harm
(through "alignment evaluations"). These evaluations will become critical for
keeping policymakers and other stakeholders informed, and for making
responsible decisions about model training, deployment, and security.
Related papers
- Generative AI Models: Opportunities and Risks for Industry and Authorities [1.3914994102950027]
Generative AI models are capable of performing a wide range of tasks that traditionally require creativity and human understanding.
They learn patterns from existing data during training and can subsequently generate new content.
The use of generative AI models introduces novel IT security risks that need to be considered.
arXiv Detail & Related papers (2024-06-07T08:34:30Z) - Risks and Opportunities of Open-Source Generative AI [64.86989162783648]
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education.
The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation.
This regulation is likely to put at risk the budding field of open-source generative AI.
arXiv Detail & Related papers (2024-05-14T13:37:36Z) - Evaluating Frontier Models for Dangerous Capabilities [59.129424649740855]
We introduce a programme of "dangerous capability" evaluations and pilot them on Gemini 1.0 models.
Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning.
Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.
arXiv Detail & Related papers (2024-03-20T17:54:26Z) - Mapping LLM Security Landscapes: A Comprehensive Stakeholder Risk Assessment Proposal [0.0]
We propose a risk assessment process using tools like the risk rating methodology which is used for traditional systems.
We conduct scenario analysis to identify potential threat agents and map the dependent system components against vulnerability factors.
We also map threats against three key stakeholder groups.
arXiv Detail & Related papers (2024-03-20T05:17:22Z) - Asset-centric Threat Modeling for AI-based Systems [7.696807063718328]
This paper presents ThreatFinderAI, an approach and tool to model AI-related assets, threats, countermeasures, and quantify residual risks.
To evaluate the practicality of the approach, participants were tasked to recreate a threat model developed by cybersecurity experts of an AI-based healthcare platform.
Overall, the solution's usability was well-perceived and effectively supports threat identification and risk discussion.
arXiv Detail & Related papers (2024-03-11T08:40:01Z) - Sociotechnical Safety Evaluation of Generative AI Systems [13.546708226350963]
Generative AI systems produce a range of risks.
To ensure the safety of generative AI systems, these risks must be evaluated.
We propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks.
arXiv Detail & Related papers (2023-10-18T14:13:58Z) - Towards Safer Generative Language Models: A Survey on Safety Risks,
Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z) - Quantitative AI Risk Assessments: Opportunities and Challenges [9.262092738841979]
AI-based systems are increasingly being leveraged to provide value to organizations, individuals, and society.
Risks have led to proposed regulations, litigation, and general societal concerns.
This paper explores the concept of a quantitative AI Risk Assessment.
arXiv Detail & Related papers (2022-09-13T21:47:25Z) - Holistic Adversarial Robustness of Deep Learning Models [91.34155889052786]
Adversarial robustness studies the worst-case performance of a machine learning model to ensure safety and reliability.
This paper provides a comprehensive overview of research topics and foundational principles of research methods for adversarial robustness of deep learning models.
arXiv Detail & Related papers (2022-02-15T05:30:27Z) - Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, ability to explain the decisions, address the bias in their training data, are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.