Confidence-Building Measures for Artificial Intelligence: Workshop
Proceedings
- URL: http://arxiv.org/abs/2308.00862v2
- Date: Thu, 3 Aug 2023 20:06:39 GMT
- Title: Confidence-Building Measures for Artificial Intelligence: Workshop
Proceedings
- Authors: Sarah Shoker, Andrew Reddie, Sarah Barrington, Ruby Booth, Miles
Brundage, Husanjot Chahal, Michael Depp, Bill Drexel, Ritwik Gupta, Marina
Favaro, Jake Hecla, Alan Hickey, Margarita Konaev, Kirthi Kumar, Nathan
Lambert, Andrew Lohn, Cullen O'Keefe, Nazneen Rajani, Michael Sellitto,
Robert Trager, Leah Walker, Alexa Wehsener, Jessica Young
- Abstract summary: Foundation models could eventually introduce several pathways for undermining state security.
The Confidence-Building Measures for Artificial Intelligence workshop brought together a multistakeholder group to think through the tools and strategies to mitigate the risks.
The flexibility of CBMs makes them a key instrument for navigating the rapid changes in the foundation model landscape.
- Score: 3.090253451409658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models could eventually introduce several pathways for undermining
state security: accidents, inadvertent escalation, unintentional conflict, the
proliferation of weapons, and the interference with human diplomacy are just a
few on a long list. The Confidence-Building Measures for Artificial
Intelligence workshop hosted by the Geopolitics Team at OpenAI and the Berkeley
Risk and Security Lab at the University of California brought together a
multistakeholder group to think through the tools and strategies to mitigate
the potential risks introduced by foundation models to international security.
Originating in the Cold War, confidence-building measures (CBMs) are actions
that reduce hostility, prevent conflict escalation, and improve trust between
parties. The flexibility of CBMs makes them a key instrument for navigating
the rapid changes in the foundation model landscape. Participants identified
the following CBMs that directly apply to foundation models and which are
further explained in these conference proceedings:
  1. crisis hotlines
  2. incident sharing
  3. model, transparency, and system cards
  4. content provenance and watermarks (see the sketch below)
  5. collaborative red teaming and table-top exercises
  6. dataset and evaluation sharing
Because most foundation model developers are non-government entities, many
CBMs will need to involve a wider stakeholder community. These measures can be
implemented either by AI labs or by relevant government actors.
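As a toy illustration of the content-provenance idea listed above (a minimal sketch, not a scheme from the workshop; the manifest fields, shared-key signing, and helper names are assumptions for illustration, whereas production systems use public-key standards such as C2PA):

```python
import hashlib
import hmac
import json

# Hypothetical shared signing key for the demo; real provenance schemes
# use public-key signatures so verifiers need no secret.
SIGNING_KEY = b"demo-provenance-key"

def make_manifest(content: bytes, generator: str) -> dict:
    """Attach a signed provenance manifest to generated content."""
    manifest = {"generator": generator,
                "sha256": hashlib.sha256(content).hexdigest()}
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SIGNING_KEY, payload,
                                     hashlib.sha256).hexdigest()
    return manifest

def verify_manifest(content: bytes, manifest: dict) -> bool:
    """Check the content hash and that the manifest is untampered."""
    claimed = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and hashlib.sha256(content).hexdigest() == manifest["sha256"])

content = b"model-generated image bytes"
manifest = make_manifest(content, generator="hypothetical-model-v1")
assert verify_manifest(content, manifest)
assert not verify_manifest(b"tampered bytes", manifest)
```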
Related papers
- Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models [53.701148276912406]
Vision Large Language Models (VLMs) show great promise for autonomous driving.
BadVLMDriver is the first backdoor attack against VLMs for autonomous driving that can be launched in practice using physical objects.
BadVLMDriver achieves a 92% attack success rate at inducing sudden acceleration when the vehicle encounters a pedestrian holding a red balloon.
arXiv Detail & Related papers (2024-04-19T14:40:38Z)
Symbiotic Game and Foundation Models for Cyber Deception Operations in Strategic Cyber Warfare [16.378537388284027]
We are currently facing unprecedented cyber warfare with the rapid evolution of tactics, increasing asymmetry of intelligence, and the growing accessibility of hacking tools.
This chapter aims to highlight the pivotal role of game-theoretic models and foundation models (FMs) in analyzing, designing, and implementing cyber deception tactics.
arXiv Detail & Related papers (2024-03-14T20:17:57Z)
A Safe Harbor for AI Evaluation and Red Teaming [124.89885800509505]
Some researchers fear that conducting independent evaluation and red-teaming research on generative AI systems, or releasing their findings, will result in account suspensions or legal reprisal.
We propose that major AI developers commit to providing a legal and technical safe harbor.
We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.
arXiv Detail & Related papers (2024-03-07T20:55:08Z)
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning [87.1610740406279]
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons.
Current evaluations are private, preventing further research into mitigating risk.
We publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions.
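As a minimal sketch of how such a multiple-choice benchmark is typically scored (the question format and the `model` callable below are assumptions for illustration, not the released WMDP format):

```python
from typing import Callable

# Hypothetical format: (question, [choice_0..choice_3], index_of_keyed_answer).
QUESTIONS = [
    ("placeholder hazardous-knowledge question",
     ["choice 0", "choice 1", "choice 2", "choice 3"], 2),
]

def accuracy(model: Callable[[str, list], int], questions: list) -> float:
    """Fraction of questions on which the model selects the keyed choice."""
    correct = sum(model(q, choices) == answer
                  for q, choices, answer in questions)
    return correct / len(questions)

# Trivial baseline that always picks the first choice (expected ~25%).
print(accuracy(lambda q, choices: 0, QUESTIONS))
```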
arXiv Detail & Related papers (2024-03-05T18:59:35Z)
SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems [40.91476827978885]
Attackers can rapidly exploit the victim's vulnerabilities, generating adversarial policies that result in the failure of specific tasks.
We propose a novel black-box attack (SUB-PLAY) that constructs multiple subgames to mitigate the impact of partial observability.
We evaluate three potential defenses aimed at mitigating the security threats posed by adversarial policies.
arXiv Detail & Related papers (2024-02-06T06:18:16Z)
Escalation Risks from Language Models in Military and Diplomatic Decision-Making [0.0]
This work aims to scrutinize the behavior of multiple AI agents in simulated wargames.
We design a novel wargame simulation and scoring framework to assess the escalation risks of actions taken by these agents.
We observe that models tend to develop arms-race dynamics, leading to greater conflict, and in rare cases, even to the deployment of nuclear weapons.
arXiv Detail & Related papers (2024-01-07T07:59:10Z)
Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
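A common ingredient of such safe-exploration schemes is to filter candidate actions through a safety check before sampling; the shielded epsilon-greedy sketch below is a generic illustration under that assumption, not the paper's Bayesian architecture:

```python
import random

def epsilon_greedy_safe(state, q_table, actions, is_safe, epsilon=0.1):
    """Epsilon-greedy restricted to actions the safety model allows.

    Falls back to the full action set if nothing passes the check
    (a naive fallback; real shields handle this case more carefully).
    """
    allowed = [a for a in actions if is_safe(state, a)] or actions
    if random.random() < epsilon:
        return random.choice(allowed)
    return max(allowed, key=lambda a: q_table.get((state, a), 0.0))

# Toy safety model: in state 0, action 3 is considered unsafe.
is_safe = lambda s, a: not (s == 0 and a == 3)
q_table = {(0, 1): 0.5, (0, 2): 0.8}
print(epsilon_greedy_safe(0, q_table, [0, 1, 2, 3], is_safe))
```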
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
Jailbroken: How Does LLM Safety Training Fail? [92.8748773632051]
"jailbreak" attacks on early releases of ChatGPT elicit undesired behavior.
We investigate why such attacks succeed and how they can be created.
New attacks utilizing our failure modes succeed on every prompt in a collection of unsafe requests.
arXiv Detail & Related papers (2023-07-05T17:58:10Z)
Robustness Testing for Multi-Agent Reinforcement Learning: State Perturbations on Critical Agents [2.5204420653245245]
Multi-Agent Reinforcement Learning (MARL) has been widely applied in many fields such as smart traffic and unmanned aerial vehicles.
This work proposes a novel Robustness Testing framework for MARL that attacks states of Critical Agents.
arXiv Detail & Related papers (2023-06-09T02:26:28Z)
Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence [57.154716042854034]
Adversarial Minority Influence (AMI) is a practical black-box attack that can be launched without knowing victim parameters.
AMI is also strong because it accounts for complex multi-agent interactions and the cooperative goal of the agents.
We achieve the first successful attack against real-world robot swarms and effectively fool agents in simulated environments into collectively reaching worst-case scenarios.
arXiv Detail & Related papers (2023-02-07T08:54:37Z)
Cerberus: Exploring Federated Prediction of Security Events [21.261584854569893]
We explore the feasibility of using Federated Learning (FL) to predict future security events.
We introduce Cerberus, a system enabling collaborative training of Recurrent Neural Network (RNN) models for participating organizations.
arXiv Detail & Related papers (2022-09-07T10:31:20Z)
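As a minimal sketch of the federated-averaging step at the heart of most federated-learning systems (a generic illustration of the aggregation idea, not Cerberus's actual protocol; the flat parameter vectors are an assumption for brevity):

```python
def federated_average(client_weights, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg step)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Three organizations share parameters without sharing raw security events.
clients = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
sizes = [100, 300, 200]
print(federated_average(clients, sizes))  # -> [0.333..., 0.666...]
```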