Related papers: ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations

URL: http://arxiv.org/abs/2511.05359v1
Date: Fri, 07 Nov 2025 15:49:49 GMT
Title: ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
Authors: Amr Gomaa, Ahmed Salem, Sahar Abdelnabi,
Abstract summary: ConVerse is a benchmark for evaluating privacy and security risks in agent-agent interactions.<n>It spans three practical domains with 12 user personas and over 864 contextually grounded attacks.<n>By unifying privacy and security within interactive multi-agent contexts, ConVerse reframes safety as an emergent property of communication.
Score: 11.177126931962443
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a central challenge. Interactions between personal assistants and external service providers expose a core tension between utility and protection: effective collaboration requires information sharing, yet every exchange creates new attack surfaces. We introduce ConVerse, a dynamic benchmark for evaluating privacy and security risks in agent-agent interactions. ConVerse spans three practical domains (travel, real estate, insurance) with 12 user personas and over 864 contextually grounded attacks (611 privacy, 253 security). Unlike prior single-agent settings, it models autonomous, multi-turn agent-to-agent conversations where malicious requests are embedded within plausible discourse. Privacy is tested through a three-tier taxonomy assessing abstraction quality, while security attacks target tool use and preference manipulation. Evaluating seven state-of-the-art models reveals persistent vulnerabilities; privacy attacks succeed in up to 88% of cases and security breaches in up to 60%, with stronger models leaking more. By unifying privacy and security within interactive multi-agent contexts, ConVerse reframes safety as an emergent property of communication.

Related papers

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage [59.3826294523924]
We investigate the security vulnerabilities of a popular multi-agent pattern known as the orchestrator setup.<n>We report the susceptibility of frontier models to different categories of attacks, finding that both reasoning and non-reasoning models are vulnerable.
arXiv Detail & Related papers (2026-02-13T21:32:32Z)
Aegis: Towards Governance, Integrity, and Security of AI Voice Agents [52.7512082818639]
We propose Aegis, a framework for the governance, integrity, and security of voice agents.<n>We evaluate the framework through case studies in banking call centers, IT Support, and logistics.<n>We observe systematic differences across model families, with open-weight models exhibiting higher susceptibility.
arXiv Detail & Related papers (2026-02-07T05:51:36Z)
Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents [3.7549350220109274]
We study user-mediated attacks, where benign users are tricked into relaying untrusted or attacker-controlled content to agents.<n>We conduct a systematic evaluation of 12 commercial agents in a sandboxed environment.<n>Our results show that agents are too helpful to be safe by default.
arXiv Detail & Related papers (2026-01-14T03:29:13Z)
AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration [23.22266919684932]
AgentCrypt is a framework for fine-grained, encrypted agent communication.<n>It ensures privacy across diverse interactions and enables computation on otherwise inaccessible data.<n>We implemented and tested it with Langgraph and Google ADK, demonstrating versatility across platforms.
arXiv Detail & Related papers (2025-12-08T23:20:20Z)
AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning [78.5751183537704]
AdvEvo-MARL is a co-evolutionary multi-agent reinforcement learning framework that internalizes safety into task agents.<n>Rather than relying on external guards, AdvEvo-MARL jointly optimize attackers and defenders.
arXiv Detail & Related papers (2025-10-02T02:06:30Z)
Searching for Privacy Risks in LLM Agents via Simulation [61.229785851581504]
We present a search-based framework that alternates between improving attack and defense strategies through the simulation of privacy-critical agent interactions.<n>We find that attack strategies escalate from direct requests to sophisticated tactics, such as impersonation and consent forgery.<n>The discovered attacks and defenses transfer across diverse scenarios and backbone models, demonstrating strong practical utility for building privacy-aware agents.
arXiv Detail & Related papers (2025-08-14T17:49:09Z)
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories.<n>Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms.<n>It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z)
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures [59.43633341497526]
Large-Language-Model-driven AI agents have exhibited unprecedented intelligence and adaptability.<n>Agent communication is regarded as a foundational pillar of the future AI ecosystem.<n>This paper presents a comprehensive survey of agent communication security.
arXiv Detail & Related papers (2025-06-24T14:44:28Z)
Kaleidoscopic Teaming in Multi Agent Simulations [75.47388708240042]
We argue that existing red teaming or safety evaluation frameworks fall short in evaluating safety risks in complex behaviors, thought processes and actions taken by agents.<n>We introduce new in-context optimization techniques that can be used in our kaleidoscopic teaming framework to generate better scenarios for safety analysis.<n>We present appropriate metrics that can be used along with our framework to measure safety of agents.
arXiv Detail & Related papers (2025-06-20T23:37:17Z)
AgentSafe: Safeguarding Large Language Model-based Multi-agent Systems via Hierarchical Data Management [28.14286256061824]
Large Language Model based multi-agent systems are revolutionizing autonomous communication and collaboration.<n>We introduce AgentSafe, a novel framework that enhances MAS security through hierarchical information management and memory protection.<n>AgentSafe incorporates two components: ThreatSieve, which secures communication by verifying information authority and preventing impersonation, and HierarCache, an adaptive memory management system that defends against unauthorized access and malicious poisoning.
arXiv Detail & Related papers (2025-03-06T12:41:54Z)
Air Gap: Protecting Privacy-Conscious Conversational Agents [44.04662124191715]
We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into revealing private information not relevant to the task at hand. We introduce AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data necessary for a specific task.
arXiv Detail & Related papers (2024-05-08T16:12:45Z)
Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems [51.6210785955659]
Communication is important in many multi-agent reinforcement learning (MARL) problems for agents to share information and make good decisions. However, when deploying trained communicative agents in a real-world application where noise and potential attackers exist, the safety of communication-based policies becomes a severe issue that is underexplored. In this work, we consider an environment with $N$ agents, where the attacker may arbitrarily change the communication from any $CfracN-12$ agents to a victim agent.
arXiv Detail & Related papers (2022-06-21T07:32:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.