Related papers: The Traitors: Deception and Trust in Multi-Agent Language Model Simulations

URL: http://arxiv.org/abs/2505.12923v1
Date: Mon, 19 May 2025 10:01:35 GMT
Title: The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
Authors: Pedro M. P. Curvo
Abstract summary: We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games. We develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality. Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As AI systems increasingly assume roles where trust and alignment with human values are essential, understanding when and why they engage in deception has become a critical research priority. We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games, designed to probe deception, trust formation, and strategic communication among large language model (LLM) agents under asymmetric information. A minority of agents, the traitors, seek to mislead the majority, while the faithful must infer hidden identities through dialogue and reasoning. Our contributions are: (1) we ground the environment in formal frameworks from game theory, behavioral economics, and social cognition; (2) we develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality; (3) we implement a fully autonomous simulation platform where LLMs reason over persistent memory and evolving social dynamics, with support for heterogeneous agent populations, specialized traits, and adaptive behaviors. Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry: advanced models like GPT-4o demonstrate superior deceptive capabilities yet exhibit disproportionate vulnerability to others' falsehoods. This suggests deception skills may scale faster than detection abilities. Overall, The Traitors provides a focused, configurable testbed for investigating LLM behavior in socially nuanced interactions. We position this work as a contribution toward more rigorous research on deception mechanisms, alignment challenges, and the broader social reliability of AI systems.

Related papers

An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems [40.53603737069306]
A multi-agent AI system (MAS) is composed of multiple autonomous agents that interact, exchange information, and make decisions based on internal generative models. This paper outlines a formal framework for analyzing MAS, focusing on two core aspects: effectiveness and safety.
arXiv Detail & Related papers (2025-05-23T22:05:19Z)
Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks [5.120446836495469]
We introduce the Hidden Profile paradigm from social psychology as a diagnostic testbed for multi-agent LLM systems. By distributing critical information asymmetrically across agents, the paradigm reveals how inter-agent dynamics support or hinder collective reasoning. We find that while cooperative agents are prone to over-coordination in collective settings, increased contradiction impairs group convergence.
arXiv Detail & Related papers (2025-05-15T19:22:54Z)
Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios. Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z)
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions [51.96890647837277]
Large Language Models (LLMs) have propelled conversational AI from traditional dialogue systems into sophisticated agents capable of autonomous actions, contextual awareness, and multi-turn interactions with users. This survey paper presents a desideratum for next-generation Conversational Agents - what has been achieved, what challenges persist, and what must be done for more scalable systems that approach human-level intelligence.
arXiv Detail & Related papers (2025-04-07T21:01:25Z)
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems [132.77459963706437]
This book provides a comprehensive overview, framing intelligent agents within modular, brain-inspired architectures. It explores self-enhancement and adaptive evolution mechanisms, exploring how agents autonomously refine their capabilities. It also examines the collective intelligence emerging from agent interactions, cooperation, and societal structures.
arXiv Detail & Related papers (2025-03-31T18:00:29Z)
Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection [31.38516078163367]
ToM-agent is designed to empower LLM-based generative agents to simulate ToM in open-domain conversational interactions. ToM-agent disentangles the confidence from mental states, facilitating the emulation of an agent's perception of its counterpart's mental states. Our findings indicate that the ToM-agent can grasp the underlying reasons for their counterpart's behaviors beyond mere semantic-emotional supporting or decision-making based on common sense.
arXiv Detail & Related papers (2025-01-26T00:32:38Z)
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios [38.878966229688054]
We introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts. We analyze goals using ERG theory and conduct comprehensive experiments. Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning.
arXiv Detail & Related papers (2024-10-25T07:04:16Z)
SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning [58.84311336011451]
We propose a novel gradient-based state representation for multi-agent reinforcement learning. We employ denoising score matching to learn the social gradient fields (SocialGFs) from offline samples. In practice, we integrate SocialGFs into the widely used multi-agent reinforcement learning algorithms, e.g., MAPPO.
arXiv Detail & Related papers (2024-05-03T04:12:19Z)
Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions. In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation [46.42384207122049]
We design SimulateBench to evaluate the believability of large language models (LLMs) when simulating human behaviors. Based on SimulateBench, we evaluate the performances of 10 widely used LLMs when simulating characters.
arXiv Detail & Related papers (2023-12-28T16:51:11Z)
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing. As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework. This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI). We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
