Confidence-Building Measures for Artificial Intelligence: Workshop
Proceedings
- URL: http://arxiv.org/abs/2308.00862v2
- Date: Thu, 3 Aug 2023 20:06:39 GMT
- Title: Confidence-Building Measures for Artificial Intelligence: Workshop
Proceedings
- Authors: Sarah Shoker, Andrew Reddie, Sarah Barrington, Ruby Booth, Miles
Brundage, Husanjot Chahal, Michael Depp, Bill Drexel, Ritwik Gupta, Marina
Favaro, Jake Hecla, Alan Hickey, Margarita Konaev, Kirthi Kumar, Nathan
Lambert, Andrew Lohn, Cullen O'Keefe, Nazneen Rajani, Michael Sellitto,
Robert Trager, Leah Walker, Alexa Wehsener, Jessica Young
- Abstract summary: Foundation models could eventually introduce several pathways for undermining state security.
The Confidence-Building Measures for Artificial Intelligence workshop brought together a multistakeholder group to think through the tools and strategies to mitigate the risks.
The flexibility of CBMs makes them a key instrument for navigating the rapid changes in the foundation model landscape.
- Score: 3.090253451409658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models could eventually introduce several pathways for undermining
state security: accidents, inadvertent escalation, unintentional conflict, the
proliferation of weapons, and the interference with human diplomacy are just a
few on a long list. The Confidence-Building Measures for Artificial
Intelligence workshop hosted by the Geopolitics Team at OpenAI and the Berkeley
Risk and Security Lab at the University of California brought together a
multistakeholder group to think through the tools and strategies to mitigate
the potential risks introduced by foundation models to international security.
Originating in the Cold War, confidence-building measures (CBMs) are actions
that reduce hostility, prevent conflict escalation, and improve trust between
parties. The flexibility of CBMs makes them a key instrument for navigating
the rapid changes in the foundation model landscape. Participants identified
the following CBMs that directly apply to foundation models and which are
further explained in these conference proceedings:
  1. crisis hotlines
  2. incident sharing
  3. model, transparency, and system cards
  4. content provenance and watermarks (see the sketch below)
  5. collaborative red teaming and table-top exercises
  6. dataset and evaluation sharing
Because most foundation model developers are non-government entities, many
CBMs will need to involve a wider stakeholder community. These measures can be
implemented either by AI labs or by relevant government actors.
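As a toy illustration of the content-provenance idea listed above (a minimal sketch, not a scheme from the workshop; the manifest fields, shared-key signing, and helper names are assumptions for illustration, whereas production systems use public-key standards such as C2PA):

```python
import hashlib
import hmac
import json

# Hypothetical shared signing key for the demo; real provenance schemes
# use public-key signatures so verifiers need no secret.
SIGNING_KEY = b"demo-provenance-key"

def make_manifest(content: bytes, generator: str) -> dict:
    """Attach a signed provenance manifest to generated content."""
    manifest = {"generator": generator,
                "sha256": hashlib.sha256(content).hexdigest()}
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload,
                                     hashlib.sha256).hexdigest()
    return manifest

def verify_manifest(content: bytes, manifest: dict) -> bool:
    """Check the content hash and that the manifest is untampered."""
    claimed = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and hashlib.sha256(content).hexdigest() == manifest["sha256"])

content = b"model-generated image bytes"
manifest = make_manifest(content, generator="hypothetical-model-v1")
assert verify_manifest(content, manifest)
assert not verify_manifest(b"tampered bytes", manifest)
```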
Related papers
- Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models [53.701148276912406]
Vision Large Language Models (VLMs) show great promise for autonomous driving.
BadVLMDriver is the first backdoor attack against VLMs for autonomous driving that can be launched in practice using physical objects.
BadVLMDriver achieves a 92% attack success rate at inducing sudden acceleration when the vehicle encounters a pedestrian holding a red balloon.
arXiv Detail & Related papers (2024-04-19T14:40:38Z)
Symbiotic Game and Foundation Models for Cyber Deception Operations in Strategic Cyber Warfare [16.378537388284027]
We are currently facing unprecedented cyber warfare with the rapid evolution of tactics, increasing asymmetry of intelligence, and the growing accessibility of hacking tools.
This chapter aims to highlight the pivotal role of game-theoretic models and foundation models (FMs) in analyzing, designing, and implementing cyber deception tactics.
arXiv Detail & Related papers (2024-03-14T20:17:57Z)
A Safe Harbor for AI Evaluation and Red Teaming [124.89885800509505]
Some researchers fear that conducting independent evaluation and red-teaming research on generative AI systems, or releasing their findings, will result in account suspensions or legal reprisal.
We propose that major AI developers commit to providing a legal and technical safe harbor.
We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.
arXiv Detail & Related papers (2024-03-07T20:55:08Z)
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning [87.1610740406279]
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons.
Current evaluations are private, preventing further research into mitigating risk.
We publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions.
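As a minimal sketch of how such a multiple-choice benchmark is typically scored (the question format and the `model` callable below are assumptions for illustration, not the released WMDP format):

```python
from typing import Callable

# Hypothetical format: (question, [choice_0..choice_3], index_of_keyed_answer).
QUESTIONS = [
    ("placeholder hazardous-knowledge question",
     ["choice 0", "choice 1", "choice 2", "choice 3"], 2),
]

def accuracy(model: Callable[[str, list], int], questions: list) -> float:
    """Fraction of questions on which the model selects the keyed choice."""
    correct = sum(model(q, choices) == answer
                  for q, choices, answer in questions)
    return correct / len(questions)

# Trivial baseline that always picks the first choice (expected ~25%).
print(accuracy(lambda q, choices: 0, QUESTIONS))
```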
arXiv Detail & Related papers (2024-03-05T18:59:35Z)
SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems [40.91476827978885]
Attackers can rapidly exploit the victim's vulnerabilities, generating adversarial policies that result in the failure of specific tasks.
We propose a novel black-box attack (SUB-PLAY) that constructs multiple subgames to mitigate the impact of partial observability.
We evaluate three potential defenses aimed at mitigating the security threats posed by adversarial policies.
arXiv Detail & Related papers (2024-02-06T06:18:16Z)
Escalation Risks from Language Models in Military and Diplomatic Decision-Making [0.0]
This work aims to scrutinize the behavior of multiple AI agents in simulated wargames.
We design a novel wargame simulation and scoring framework to assess the escalation risks of actions taken by these agents.
We observe that models tend to develop arms-race dynamics, leading to greater conflict, and in rare cases, even to the deployment of nuclear weapons.
arXiv Detail & Related papers (2024-01-07T07:59:10Z)
Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
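A common ingredient of such safe-exploration schemes is to filter candidate actions through a safety check before sampling; the shielded epsilon-greedy sketch below is a generic illustration under that assumption, not the paper's Bayesian architecture:

```python
import random

def epsilon_greedy_safe(state, q_table, actions, is_safe, epsilon=0.1):
    """Epsilon-greedy restricted to actions the safety model allows.

    Falls back to the full action set if nothing passes the check
    (a naive fallback; real shields handle this case more carefully).
    """
    allowed = [a for a in actions if is_safe(state, a)] or actions
    if random.random() < epsilon:
        return random.choice(allowed)
    return max(allowed, key=lambda a: q_table.get((state, a), 0.0))

# Toy safety model: in state 0, action 3 is considered unsafe.
is_safe = lambda s, a: not (s == 0 and a == 3)
q_table = {(0, 1): 0.5, (0, 2): 0.8}
print(epsilon_greedy_safe(0, q_table, [0, 1, 2, 3], is_safe))
```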
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
Jailbroken: How Does LLM Safety Training Fail? [92.8748773632051]
"jailbreak" attacks on early releases of ChatGPT elicit undesired behavior.
We investigate why such attacks succeed and how they can be created.
New attacks utilizing our failure modes succeed on every prompt in a collection of unsafe requests.
arXiv Detail & Related papers (2023-07-05T17:58:10Z)
Robustness Testing for Multi-Agent Reinforcement Learning: State Perturbations on Critical Agents [2.5204420653245245]
Multi-Agent Reinforcement Learning (MARL) has been widely applied in many fields such as smart traffic and unmanned aerial vehicles.
This work proposes a novel Robustness Testing framework for MARL that attacks states of Critical Agents.
arXiv Detail & Related papers (2023-06-09T02:26:28Z)
Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence [57.154716042854034]
Adversarial Minority Influence (AMI) is a practical black-box attack that can be launched without knowing victim parameters.
AMI is also strong because it accounts for complex multi-agent interactions and the cooperative goal of the agents.
We achieve the first successful attack against real-world robot swarms and effectively fool agents in simulated environments into collectively reaching worst-case scenarios.
arXiv Detail & Related papers (2023-02-07T08:54:37Z)
Cerberus: Exploring Federated Prediction of Security Events [21.261584854569893]
We explore the feasibility of using Federated Learning (FL) to predict future security events.
We introduce Cerberus, a system enabling collaborative training of Recurrent Neural Network (RNN) models for participating organizations.
arXiv Detail & Related papers (2022-09-07T10:31:20Z)
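As a minimal sketch of the federated-averaging step at the heart of most federated-learning systems (a generic illustration of the aggregation idea, not Cerberus's actual protocol; the flat parameter vectors are an assumption for brevity):

```python
def federated_average(client_weights, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg step)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Three organizations share parameters without sharing raw security events.
clients = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
sizes = [100, 300, 200]
print(federated_average(clients, sizes))  # -> [0.333..., 0.666...]
```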