Related papers: Secret Collusion Among Generative AI Agents

Secret Collusion Among Generative AI Agents

URL: http://arxiv.org/abs/2402.07510v1
Date: Mon, 12 Feb 2024 09:31:21 GMT
Title: Secret Collusion Among Generative AI Agents
Authors: Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H.S. Torr, Lewis Hammond, Christian Schroeder de Witt
Abstract summary: Recent capability increases in large language models (LLMs) open up applications in which teams of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information. Modern steganographic techniques could render such dynamics hard to detect.
Score: 45.64856386399717
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent capability increases in large language models (LLMs) open up applications in which teams of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both the AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models.

Related papers

Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.49715661395294]
Data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms.<n>This survey presents a first systematic review of how graphs can empower AI agents.
arXiv Detail & Related papers (2025-06-22T12:59:12Z)
Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems [11.497269773189254]
We present a system-level anomaly detection framework tailored for large language model (LLM)-based multi-agent systems (MAS)<n>We propose a graph-based framework that models agent interactions as dynamic execution graphs, enabling semantic anomaly detection at node, edge, and path levels.<n>Second, we introduce a pluggable SentinelAgent, an LLM-powered oversight agent that observes, analyzes, and intervenes in MAS execution based on security policies and contextual reasoning.
arXiv Detail & Related papers (2025-05-30T04:25:19Z)
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling [2.7211182721830123]
Large language models (LLMs) enable the development of intelligent agents capable of engaging in complex and multi-turn dialogues.<n>GUARDIAN is a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs.
arXiv Detail & Related papers (2025-05-25T17:15:55Z)
Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks [5.120446836495469]
We introduce the Hidden Profile paradigm from social psychology as a diagnostic testbed for multi-agent LLM systems.<n>By distributing critical information asymmetrically across agents, the paradigm reveals how inter-agent dynamics support or hinder collective reasoning.<n>We find that while cooperative agents are prone to over-coordination in collective settings, increased contradiction impairs group convergence.
arXiv Detail & Related papers (2025-05-15T19:22:54Z)
Methods and Trends in Detecting Generated Images: A Comprehensive Review [0.552480439325792]
Generative Adversarial Networks (GANs), Diffusion Models, and Variational Autoencoders (VAEs) have enabled the synthesis of high-quality multimedia data. These advancements have also raised significant concerns regarding adversarial attacks, unethical usage, and societal harm.
arXiv Detail & Related papers (2025-02-21T03:16:18Z)
Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning [0.41783829807634765]
Multi-Agent Deep Reinforcement Learning (MADRL) was proven efficient in solving complex problems in robotics or games. This paper advocates for direct interpretability, generating post hoc explanations directly from trained models. We explore modern methods, including relevance backpropagation, knowledge edition, model steering, activation patching, sparse autoencoders and circuit discovery.
arXiv Detail & Related papers (2025-02-02T09:15:27Z)
Safety at Scale: A Comprehensive Survey of Large Model Safety [298.05093528230753]
We present a comprehensive taxonomy of safety threats to large models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. We identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices.
arXiv Detail & Related papers (2025-02-02T05:14:22Z)
Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance. Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs [0.600808022072121]
Collusion to the disadvantage of others has been identified as a central form of undesirable agent cooperation. The use of information hiding (steganography) in agent communications could render collusion practically undetectable. This underscores the need for evaluation frameworks to monitor and mitigate steganographic collusion capabilities.
arXiv Detail & Related papers (2024-10-02T16:18:33Z)
AI Safety in Generative AI Large Language Models: A Survey [14.737084887928408]
Large Language Model (LLMs) that exhibit generative AI capabilities are facing accelerated adoption and innovation. Generative AI (GAI) inevitably raises concerns about the risks and safety associated with these models. This article provides an up-to-date survey of recent trends in AI safety research of GAI-LLMs from a computer scientist's perspective.
arXiv Detail & Related papers (2024-07-06T09:00:18Z)
On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains. In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) [0.09208007322096533]
We present HuntGPT, a specialized intrusion detection dashboard applying a Random Forest classifier. The paper delves into the system's architecture, components, and technical accuracy, assessed through Certified Information Security Manager (CISM) Practice Exams. The results demonstrate that conversational agents, supported by LLM and integrated with XAI, provide robust, explainable, and actionable AI solutions in intrusion detection.
arXiv Detail & Related papers (2023-09-27T20:58:13Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning [3.4437947384641032]
We introduce an off-policy inverse multi-agent reinforcement learning algorithm (IMARL) By leveraging demonstrations, our algorithm automatically uncovers the reward function and learns an effective policy for the agents. The proposed IMARL algorithm is a significant step towards understanding collective dynamics from the perspective of its constituents.
arXiv Detail & Related papers (2023-05-17T20:07:30Z)
Decentralized Adversarial Training over Graphs [55.28669771020857]
The vulnerability of machine learning models to adversarial attacks has been attracting considerable attention in recent years. This work studies adversarial training over graphs, where individual agents are subjected to varied strength perturbation space.
arXiv Detail & Related papers (2023-03-23T15:05:16Z)
A Minimax Approach Against Multi-Armed Adversarial Attacks Detection [31.971443221041174]
Multi-armed adversarial attacks have been shown to be highly successful in fooling state-of-the-art detectors. We propose a solution that aggregates the soft-probability outputs of multiple pre-trained detectors according to a minimax approach. We show that our aggregation consistently outperforms individual state-of-the-art detectors against multi-armed adversarial attacks.
arXiv Detail & Related papers (2023-02-04T18:21:22Z)
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models. Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.