Deployment Corrections: An incident response framework for frontier AI models
- URL: http://arxiv.org/abs/2310.00328v1
- Date: Sat, 30 Sep 2023 10:07:39 GMT
- Title: Deployment Corrections: An incident response framework for frontier AI models
- Authors: Joe O'Brien, Shaun Ee, Zoe Williams
- Abstract summary: This paper explores contingency plans for cases where pre-deployment risk management falls short.
We describe a toolkit of deployment corrections that AI developers can use to respond to dangerous capabilities.
We recommend that frontier AI developers, standard-setting organizations, and regulators collaborate to define a standardized, industry-wide approach.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A comprehensive approach to addressing catastrophic risks from AI models
should cover the full model lifecycle. This paper explores contingency plans
for cases where pre-deployment risk management falls short: where either very
dangerous models are deployed, or deployed models become very dangerous.
Informed by incident response practices from industries including
cybersecurity, we describe a toolkit of deployment corrections that AI
developers can use to respond to dangerous capabilities, behaviors, or use
cases of AI models that develop or are detected after deployment. We also
provide a framework for AI developers to prepare and implement this toolkit.
We conclude by recommending that frontier AI developers should (1) maintain
control over model access, (2) establish or grow dedicated teams to design and
maintain processes for deployment corrections, including incident response
plans, and (3) establish these deployment corrections as allowable actions with
downstream users. We also recommend frontier AI developers, standard-setting
organizations, and regulators should collaborate to define a standardized
industry-wide approach to the use of deployment corrections in incident
response.
Caveat: This work applies to frontier AI models that are made available
through interfaces (e.g., API) that provide the AI developer or another
upstream party means of maintaining control over access (e.g., GPT-4 or
Claude). It does not apply to management of catastrophic risk from open-source
models (e.g., BLOOM or Llama-2), for which the restrictions we discuss are
largely unenforceable.
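To make the toolkit concrete, the sketch below shows how an interface-level gate might enforce deployment corrections for an API-served model. This is a minimal Python illustration under assumed names (Correction, IncidentResponse, and the specific correction types are hypothetical), not the paper's implementation.

# Illustrative sketch only: the paper describes deployment corrections at a
# policy/process level; the classes and names below are hypothetical.
from dataclasses import dataclass, field
from enum import Enum, auto


class Correction(Enum):
    """Hypothetical deployment corrections, roughly ordered by severity."""
    RATE_LIMIT = auto()          # throttle usage while an incident is investigated
    DISABLE_CAPABILITY = auto()  # block a specific dangerous capability or use case
    RESTRICT_USERS = auto()      # limit access to vetted downstream users
    FULL_SHUTDOWN = auto()       # withdraw the model from deployment entirely


@dataclass
class IncidentResponse:
    """Tracks which corrections are currently active for an API-served model."""
    active: set[Correction] = field(default_factory=set)
    blocked_capabilities: set[str] = field(default_factory=set)
    allowed_users: set[str] = field(default_factory=set)

    def apply(self, correction: Correction, detail: str | None = None) -> None:
        self.active.add(correction)
        if correction is Correction.DISABLE_CAPABILITY and detail:
            self.blocked_capabilities.add(detail)
        if correction is Correction.RESTRICT_USERS and detail:
            self.allowed_users.add(detail)

    def permits(self, user: str, capability: str) -> bool:
        """Gate every API request against the active corrections."""
        if Correction.FULL_SHUTDOWN in self.active:
            return False
        if capability in self.blocked_capabilities:
            return False
        if Correction.RESTRICT_USERS in self.active and user not in self.allowed_users:
            return False
        return True  # RATE_LIMIT would be enforced elsewhere, e.g. at the gateway


# Example: respond to a reported dangerous use case by disabling one capability.
response = IncidentResponse()
response.apply(Correction.DISABLE_CAPABILITY, detail="code_generation")
assert not response.permits(user="acme-corp", capability="code_generation")
assert response.permits(user="acme-corp", capability="summarization")

In practice such a gate would sit in the serving layer that the developer or another upstream party controls, which is exactly the setting the caveat above restricts the framework to.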
Related papers
- Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
- Auction-Based Regulation for Artificial Intelligence [28.86995747151915]
We propose an auction-based regulatory mechanism to regulate AI safety.
We provably guarantee that each participating agent's best strategy is to submit a model safer than a prescribed minimum-safety threshold.
Empirical results show that our regulatory auction boosts safety and participation rates by 20% and 15% respectively.
arXiv Detail & Related papers (2024-10-02T17:57:02Z)
- Adapting cybersecurity frameworks to manage frontier AI risks: A defense-in-depth approach [0.0]
We outline three approaches that can help identify gaps in the management of AI-related risks.
First, a functional approach identifies essential categories of activities that a risk management approach should cover.
Second, a lifecycle approach assigns safety and security activities across the model development lifecycle.
Third, a threat-based approach identifies tactics, techniques, and procedures used by malicious actors.
arXiv Detail & Related papers (2024-08-15T05:06:03Z)
- MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems [21.693552236958983]
Cyber-Physical Systems (CPSs) are increasingly prevalent across various industrial and daily-life domains.
With recent advancements in artificial intelligence (AI), learning-based components, especially AI controllers, have become essential in enhancing the functionality and efficiency of CPSs.
The lack of interpretability in these AI controllers presents challenges to the safety and quality assurance of AI-enabled CPSs (AI-CPSs).
arXiv Detail & Related papers (2024-08-07T16:44:53Z)
- Risks and Opportunities of Open-Source Generative AI [64.86989162783648]
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education.
The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation.
This regulation is likely to put at risk the budding field of open-source generative AI.
arXiv Detail & Related papers (2024-05-14T13:37:36Z)
- A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks [2.2192488799070444]
Legal autonomy can be achieved either by imposing constraints on AI actors such as developers, deployers and users, or by imposing constraints on the range and scope of the impact that AI agents can have on the environment.
The latter approach involves encoding extant rules concerning AI driven devices into the software of AI agents controlling those devices.
This is a challenge, since the effectiveness of such an approach requires a method of extracting, loading, transforming and computing legal information that would be both explainable and legally interoperable.
arXiv Detail & Related papers (2024-03-27T13:12:57Z)
- Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z)
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task-reward-maximizing objectives according to a safety critic (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
- Frontier AI Regulation: Managing Emerging Risks to Public Safety [15.85618115026625]
"Frontier AI" models could possess dangerous capabilities sufficient to pose severe risks to public safety.
Industry self-regulation is an important first step.
We propose an initial set of safety standards.
arXiv Detail & Related papers (2023-07-06T17:03:25Z)
- Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models [103.71308117592963]
We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show MLAC can largely prevent a BERT-style model from being re-purposed to perform gender identification.
arXiv Detail & Related papers (2022-11-27T21:43:45Z)
- Monitoring ROS2: from Requirements to Autonomous Robots [58.720142291102135]
This paper provides an overview of a formal approach to generating runtime monitors for autonomous robots from requirements written in a structured natural language.
Our approach integrates the Formal Requirement Elicitation Tool (FRET) with Copilot, a runtime verification framework, through the Ogma integration tool.
arXiv Detail & Related papers (2022-09-28T12:19:13Z)
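As a rough illustration of the Objective Suppression idea referenced above, the sketch below gates the task reward by a weight derived from a safety critic's risk estimate. The gating form (a logistic weight around a risk threshold) is an assumption made for illustration; the paper's actual formulation may differ.

# Rough illustration only: a hypothetical gating form of "objective suppression";
# the paper's actual method may differ.
import math


def suppressed_objective(task_reward: float,
                         safety_critic_value: float,
                         threshold: float = 0.0,
                         sharpness: float = 5.0) -> float:
    """Down-weight the task reward when the safety critic predicts high risk.

    safety_critic_value: estimated risk/cost of the current state-action pair
    threshold: risk level above which the task objective starts being suppressed
    sharpness: how abruptly suppression kicks in around the threshold
    """
    # Suppression weight in (0, 1): close to 1 when risk is well below the
    # threshold, close to 0 when risk is well above it.
    weight = 1.0 / (1.0 + math.exp(sharpness * (safety_critic_value - threshold)))
    return weight * task_reward


# In a low-risk state the task reward passes through almost unchanged;
# in a high-risk state it is strongly suppressed.
print(suppressed_objective(task_reward=1.0, safety_critic_value=-1.0))  # ~0.993
print(suppressed_objective(task_reward=1.0, safety_critic_value=+1.0))  # ~0.007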
This list is automatically generated from the titles and abstracts of the papers on this site.