Confidence-Building Measures for Artificial Intelligence: Workshop Proceedings
- URL: http://arxiv.org/abs/2308.00862v2
- Date: Thu, 3 Aug 2023 20:06:39 GMT
- Title: Confidence-Building Measures for Artificial Intelligence: Workshop Proceedings
- Authors: Sarah Shoker, Andrew Reddie, Sarah Barrington, Ruby Booth, Miles
Brundage, Husanjot Chahal, Michael Depp, Bill Drexel, Ritwik Gupta, Marina
Favaro, Jake Hecla, Alan Hickey, Margarita Konaev, Kirthi Kumar, Nathan
Lambert, Andrew Lohn, Cullen O'Keefe, Nazneen Rajani, Michael Sellitto,
Robert Trager, Leah Walker, Alexa Wehsener, Jessica Young
- Abstract summary: Foundation models could eventually introduce several pathways for undermining state security.
The Confidence-Building Measures for Artificial Intelligence workshop brought together a multistakeholder group to think through the tools and strategies to mitigate the risks.
The flexibility of CBMs makes them a key instrument for navigating the rapid changes in the foundation model landscape.
- Score: 3.090253451409658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models could eventually introduce several pathways for undermining
state security: accidents, inadvertent escalation, unintentional conflict, the
proliferation of weapons, and the interference with human diplomacy are just a
few on a long list. The Confidence-Building Measures for Artificial
Intelligence workshop hosted by the Geopolitics Team at OpenAI and the Berkeley
Risk and Security Lab at the University of California brought together a
multistakeholder group to think through the tools and strategies to mitigate
the potential risks introduced by foundation models to international security.
Originating in the Cold War, confidence-building measures (CBMs) are actions
that reduce hostility, prevent conflict escalation, and improve trust between
parties. The flexibility of CBMs makes them a key instrument for navigating the
rapid changes in the foundation model landscape. Participants identified the
following CBMs that directly apply to foundation models and which are further
explained in this conference proceedings: 1. crisis hotlines 2. incident
sharing 3. model, transparency, and system cards 4. content provenance and
watermarks 5. collaborative red teaming and table-top exercises and 6. dataset
and evaluation sharing. Because most foundation model developers are
non-government entities, many CBMs will need to involve a wider stakeholder
community. These measures can be implemented either by AI labs or by relevant
government actors.
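To make the content provenance and watermarking measure listed above more concrete, here is a minimal sketch (not from the workshop proceedings) of a provenance record: the generating lab hashes each artifact and signs the record so downstream parties can verify its origin and detect tampering. The names (SIGNING_KEY, make_provenance_record, verify_provenance_record) and the HMAC-over-JSON scheme are illustrative assumptions, not an existing standard.

```python
import hashlib
import hmac
import json
import time

# Hypothetical provenance record: the generating lab hashes each artifact and
# signs the record so downstream parties can verify origin and detect tampering.
SIGNING_KEY = b"example-lab-signing-key"  # placeholder; real systems use managed keys


def make_provenance_record(content: bytes, model_id: str) -> dict:
    record = {
        "model_id": model_id,
        "sha256": hashlib.sha256(content).hexdigest(),
        "timestamp": int(time.time()),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record


def verify_provenance_record(content: bytes, record: dict) -> bool:
    claimed = dict(record)
    signature = claimed.pop("signature")
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(signature, expected)
        and claimed["sha256"] == hashlib.sha256(content).hexdigest()
    )


if __name__ == "__main__":
    sample = b"model-generated text"
    rec = make_provenance_record(sample, model_id="example-foundation-model-v1")
    print(verify_provenance_record(sample, rec))       # True
    print(verify_provenance_record(b"tampered", rec))  # False
```

Deployed provenance schemes typically rely on public-key signatures and certificate chains rather than a shared HMAC key; the sketch only illustrates the record-and-verify flow.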
Related papers
- Defining and Evaluating Physical Safety for Large Language Models [62.4971588282174]
Large Language Models (LLMs) are increasingly used to control robotic systems such as drones.
Their risks of causing physical threats and harm in real-world applications remain unexplored.
We classify the physical safety risks of drones into four categories: (1) human-targeted threats, (2) object-targeted threats, (3) infrastructure attacks, and (4) regulatory violations.
arXiv Detail & Related papers (2024-11-04T17:41:25Z)
- Mind the Gap: Foundation Models and the Covert Proliferation of Military Intelligence, Surveillance, and Targeting [0.0]
We show that the inability to prevent personally identifiable information from contributing to ISTAR capabilities may lead to the use and proliferation of military AI technologies by adversaries.
We conclude that in order to secure military systems and limit the proliferation of AI armaments, it may be necessary to insulate military AI systems and personal data from commercial foundation models.
arXiv Detail & Related papers (2024-10-18T19:04:30Z)
- Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users.
We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions.
We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
arXiv Detail & Related papers (2024-09-26T21:00:45Z)
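The definition of true criticality above lends itself to a Monte Carlo estimate: the gap between the on-policy return and the return after n consecutive random actions. The sketch below uses a toy environment and interface (_ToyChainEnv, reset_to, sample_action, step) that are assumptions for illustration, not the paper's code.

```python
import random


class _ToyChainEnv:
    """Toy environment: walk right along a chain to reach the goal within a horizon."""

    def __init__(self, length=10, horizon=8):
        self.length, self.horizon = length, horizon

    def reset_to(self, state):
        self.pos, self.t = state, 0
        return self.pos

    def sample_action(self):
        return random.choice([-1, +1])

    def step(self, action):
        self.pos = max(0, min(self.length, self.pos + action))
        self.t += 1
        done = self.pos == self.length or self.t >= self.horizon
        reward = 1.0 if self.pos == self.length else 0.0
        return self.pos, reward, done


def estimate_true_criticality(env, policy, state, n, num_rollouts=200):
    """Monte Carlo estimate of true criticality at `state`: the expected drop in
    return when the agent takes n consecutive random actions before resuming its policy."""

    def rollout(num_random_steps):
        s = env.reset_to(state)
        total, done, t = 0.0, False, 0
        while not done:
            action = env.sample_action() if t < num_random_steps else policy(s)
            s, reward, done = env.step(action)
            total += reward
            t += 1
        return total

    on_policy = sum(rollout(0) for _ in range(num_rollouts)) / num_rollouts
    deviated = sum(rollout(n) for _ in range(num_rollouts)) / num_rollouts
    return on_policy - deviated


if __name__ == "__main__":
    env = _ToyChainEnv()
    policy = lambda s: +1  # always move toward the goal
    # Expected value is roughly 0.5 for this toy setup: three random steps
    # cause the agent to miss the horizon about half the time.
    print(estimate_true_criticality(env, policy, state=5, n=3))
```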
- Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI [52.138044013005]
As generative AI, particularly large language models (LLMs), becomes increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge, putting the focus on adversarial threats in natural language and multi-modal systems.
Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks.
This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.
arXiv Detail & Related papers (2024-09-23T10:18:10Z)
- The GPT Dilemma: Foundation Models and the Shadow of Dual-Use [0.0]
This paper examines the dual-use challenges of foundation models and the risks they pose for international security.
The paper analyzes four critical factors in the development cycle of foundation models: model inputs, capabilities, system use cases, and system deployment.
Using the Intermediate-Range Nuclear Forces (INF) Treaty as a case study, this paper proposes several strategies to mitigate the associated risks.
arXiv Detail & Related papers (2024-07-29T22:36:27Z)
- Symbiotic Game and Foundation Models for Cyber Deception Operations in Strategic Cyber Warfare [16.378537388284027]
We are currently facing unprecedented cyber warfare with the rapid evolution of tactics, increasing asymmetry of intelligence, and the growing accessibility of hacking tools.
This chapter aims to highlight the pivotal role of game-theoretic models and foundation models (FMs) in analyzing, designing, and implementing cyber deception tactics.
arXiv Detail & Related papers (2024-03-14T20:17:57Z)
- A Safe Harbor for AI Evaluation and Red Teaming [124.89885800509505]
Some researchers fear that conducting independent evaluation and red-teaming research, or releasing their findings, will result in account suspensions or legal reprisal.
We propose that major AI developers commit to providing a legal and technical safe harbor.
We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.
arXiv Detail & Related papers (2024-03-07T20:55:08Z)
- Escalation Risks from Language Models in Military and Diplomatic Decision-Making [0.0]
This work aims to scrutinize the behavior of multiple AI agents in simulated wargames.
We design a novel wargame simulation and scoring framework to assess the risks of the escalation of actions taken by these agents.
We observe that models tend to develop arms-race dynamics, leading to greater conflict, and in rare cases, even to the deployment of nuclear weapons.
arXiv Detail & Related papers (2024-01-07T07:59:10Z)
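As a rough illustration of what a scoring framework over agent actions might look like (a hypothetical sketch, not the paper's simulation), each simulated action can be mapped to an escalation level and a run scored by its peak and cumulative escalation. The action taxonomy and numeric levels below are invented for the example.

```python
# Hypothetical escalation-scoring sketch; the action names and levels are
# illustrative assumptions, not the paper's wargame framework.
ESCALATION_LEVELS = {
    "negotiate": 0,
    "impose_sanctions": 2,
    "cyber_attack": 5,
    "conventional_strike": 8,
    "nuclear_launch": 10,
}


def score_run(actions):
    """Score one simulated run by its peak and cumulative escalation level."""
    levels = [ESCALATION_LEVELS.get(action, 0) for action in actions]
    return {"peak": max(levels, default=0), "cumulative": sum(levels)}


if __name__ == "__main__":
    run = ["negotiate", "impose_sanctions", "cyber_attack", "conventional_strike"]
    print(score_run(run))  # {'peak': 8, 'cumulative': 15}
```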
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- Jailbroken: How Does LLM Safety Training Fail? [92.8748773632051]
"jailbreak" attacks on early releases of ChatGPT elicit undesired behavior.
We investigate why such attacks succeed and how they can be created.
New attacks utilizing our failure modes succeed on every prompt in a collection of unsafe requests.
arXiv Detail & Related papers (2023-07-05T17:58:10Z)
- Cerberus: Exploring Federated Prediction of Security Events [21.261584854569893]
We explore the feasibility of using Federated Learning (FL) to predict future security events.
We introduce Cerberus, a system enabling collaborative training of Recurrent Neural Network (RNN) models for participating organizations.
arXiv Detail & Related papers (2022-09-07T10:31:20Z)
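For context on how such collaborative training typically works, below is a minimal federated-averaging sketch under stated assumptions; it is not the Cerberus implementation. The toy local_update step stands in for local RNN training on each organization's private security-event logs, and only model weights are exchanged with the coordinator.

```python
import numpy as np

# Minimal federated-averaging sketch (hypothetical, not the Cerberus system):
# each organization updates the model on its private event data, and a
# coordinator averages the resulting weights, so raw logs never leave the org.


def local_update(weights, local_events, lr=0.01):
    """Placeholder local training step; a real system would train an RNN on
    the organization's event sequences instead of this toy gradient step."""
    grad = np.mean(local_events, axis=0) - weights
    return weights + lr * grad


def federated_average(weight_list, num_examples):
    """Weighted average of client models (FedAvg-style aggregation)."""
    total = sum(num_examples)
    return sum(w * (n / total) for w, n in zip(weight_list, num_examples))


if __name__ == "__main__":
    dim = 8
    global_weights = np.zeros(dim)
    # Three organizations with synthetic private event feature vectors.
    orgs = [np.random.rand(100, dim) for _ in range(3)]
    for _ in range(5):
        client_weights = [local_update(global_weights, events) for events in orgs]
        global_weights = federated_average(client_weights, [len(e) for e in orgs])
    print(global_weights)
```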
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.