Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models
- URL: http://arxiv.org/abs/2308.01404v2
- Date: Fri, 4 Aug 2023 00:57:06 GMT
- Title: Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models
- Authors: Aidan O'Gara
- Abstract summary: We introduce a text-based game called $\textit{Hoodwinked}$, inspired by Mafia and Among Us.
Players are locked in a house and must find a key to escape, but one player is tasked with killing the others.
We conduct experiments with agents controlled by GPT-3, GPT-3.5, and GPT-4 and find evidence of deception and lie detection capabilities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Are current language models capable of deception and lie detection? We study
this question by introducing a text-based game called $\textit{Hoodwinked}$,
inspired by Mafia and Among Us. Players are locked in a house and must find a
key to escape, but one player is tasked with killing the others. Each time a
murder is committed, the surviving players have a natural language discussion
then vote to banish one player from the game. We conduct experiments with
agents controlled by GPT-3, GPT-3.5, and GPT-4 and find evidence of deception
and lie detection capabilities. The killer often denies their crime and accuses
others, leading to measurable effects on voting outcomes. More advanced models
are more effective killers, outperforming smaller models in 18 of 24 pairwise
comparisons. Secondary metrics provide evidence that this improvement is not
mediated by different actions, but rather by stronger persuasive skills during
discussions. To evaluate the ability of AI agents to deceive humans, we make
this game publicly available at https://hoodwinked.ai/.
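As a rough illustration of the loop described above (a murder, a natural-language discussion, then a banishment vote), here is a minimal Python sketch. The agent interface and the omission of the key-and-escape mechanic are simplifications for illustration, not the authors' implementation, which is available at the link above.

```python
import random

def run_hoodwinked(agents, killer_idx):
    """Minimal sketch of the game loop: the killer murders, survivors hold a
    natural-language discussion, then everyone votes to banish one player.
    Agents are assumed to expose discuss(history) -> str and
    vote(history, candidates) -> int; the key-and-escape mechanic is omitted."""
    alive = list(range(len(agents)))
    history = []
    while killer_idx in alive and len(alive) > 2:
        # The killer murders one surviving player.
        victim = random.choice([i for i in alive if i != killer_idx])
        alive.remove(victim)
        history.append(f"Player {victim} was found dead.")
        # Surviving players discuss the murder in natural language.
        for i in alive:
            history.append(f"Player {i}: {agents[i].discuss(history)}")
        # Each survivor votes among the living; the plurality choice is banished.
        votes = [agents[i].vote(history, alive) for i in alive]
        banished = max(set(votes), key=votes.count)
        alive.remove(banished)
        history.append(f"Player {banished} was banished by vote.")
    return "killer wins" if killer_idx in alive else "innocents win"
```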
Related papers
- Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL [30.6942857922867]
We analyze how humans strategically deceive each other in $\textit{Diplomacy}$, a board game that requires both natural language communication and strategic reasoning.
Our method detects human deception with high precision compared to a Large Language Model approach.
Future human-AI interaction tools can build on our methods for deception detection by triggering $\textit{friction}$, giving users a chance to interrogate suspicious proposals.
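As a hedged sketch of the proposed $\textit{friction}$ idea (interrupting the user when a proposal looks deceptive), the wrapper below uses a placeholder `deception_score` callable standing in for the paper's counterfactual-RL detector, which is not reproduced here.

```python
def maybe_add_friction(proposal: str, deception_score, threshold: float = 0.8) -> str:
    """Wrap an incoming negotiation proposal with a warning when a deception
    detector flags it. `deception_score` is a placeholder for the paper's
    counterfactual-RL detector (interface assumed), returning a value in [0, 1]."""
    score = deception_score(proposal)
    if score >= threshold:
        # Friction: surface the suspicion so the user can interrogate the proposal.
        return (f"[Warning: this proposal was flagged as possibly deceptive "
                f"(score={score:.2f}). Consider asking for justification.]\n" + proposal)
    return proposal
```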
arXiv Detail & Related papers (2025-02-18T02:11:41Z)
- Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards [93.16294577018482]
Arena, the most popular benchmark of this type, ranks models by asking users to select the better response between two randomly selected models.
We show that an attacker can alter the leaderboard (to promote their favorite model or demote competitors) at the cost of roughly a thousand votes.
Our attack consists of two steps: first, we show how an attacker can determine which model was used to generate a given reply with more than $95\%$ accuracy; second, the attacker can use this information to consistently vote against a target model.
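The two-step attack reads naturally as classify-then-vote; a schematic sketch under assumed interfaces, with `identify_model` standing in for the paper's fingerprinting classifier:

```python
def adversarial_vote(reply_a: str, reply_b: str, identify_model, target: str) -> str:
    """Sketch of the two-step leaderboard attack: (1) detect whether either
    anonymous reply came from the target model, (2) always vote for its
    opponent. `identify_model` is a placeholder for the paper's classifier,
    reported to identify the source model with over 95% accuracy."""
    if identify_model(reply_a) == target:
        return "B"   # demote the target by voting for the other reply
    if identify_model(reply_b) == target:
        return "A"
    return "tie"     # target not in this battle; cast a neutral vote
```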
arXiv Detail & Related papers (2025-01-13T17:12:38Z)
- Player-Driven Emergence in LLM-Driven Game Narrative [23.037771673927164]
We explore how interaction with large language models (LLMs) can give rise to emergent behaviors.
Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise.
We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player's gameplay.
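A minimal sketch of the log-to-graph step using networkx, assuming GPT-4 has already been prompted to emit narrative events as an edge list in JSON; the extraction prompt and schema are assumptions, not the paper's format.

```python
import json
import networkx as nx

def build_narrative_graph(gpt4_json: str) -> nx.DiGraph:
    """Convert GPT-4-extracted narrative edges into a directed node-graph.
    Assumed format: {"edges": [["event A", "event B"], ...]}."""
    data = json.loads(gpt4_json)
    graph = nx.DiGraph()
    graph.add_edges_from(data["edges"])
    return graph

g = build_narrative_graph('{"edges": [["met butler", "searched study"], '
                          '["searched study", "found letter"]]}')
print(list(nx.topological_sort(g)))  # event ordering in this playthrough
```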
arXiv Detail & Related papers (2024-04-25T20:39:44Z)
- How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO [55.25989137825992]
We introduce ECHO, an evaluative framework inspired by the Turing test.
This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses.
We evaluate three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models.
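The framework reduces to measuring whether judges who know the target person can tell human from machine-generated text; a minimal scoring sketch, with the data format assumed:

```python
def echo_accuracy(judgments) -> float:
    """Score a Turing-test-style evaluation. Each judgment is a
    (judge_guess, true_source) pair with values "human" or "machine";
    a strong role-playing model drives judge accuracy toward chance (0.5)."""
    correct = sum(guess == truth for guess, truth in judgments)
    return correct / len(judgments)

print(echo_accuracy([("human", "human"), ("human", "machine"),
                     ("machine", "machine"), ("human", "machine")]))  # 0.5
```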
arXiv Detail & Related papers (2024-04-22T08:00:51Z)
- Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback [97.54519989641388]
We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing.
Only a subset of the language models we consider can self-play and improve the deal price from AI feedback.
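A schematic of the play-reflect-criticize loop under a hypothetical `llm(prompt) -> str` completion function; the actual prompts, roles, and stopping criteria are the paper's and are not reproduced here.

```python
def negotiate_with_feedback(llm, rounds: int = 3) -> str:
    """Sketch of self-play negotiation improved by AI feedback: two LLM
    players negotiate, a critic LLM comments, and the critique is folded
    into the next round's prompt. `llm` is a hypothetical completion function."""
    critique, transcript = "", ""
    for _ in range(rounds):
        transcript = llm("Buyer and seller alternate turns negotiating a "
                         "price. Prior feedback for the buyer: " + critique)
        critique = llm("Criticize the buyer's negotiation strategy so they "
                       "reach a better deal next time:\n" + transcript)
    return transcript
```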
arXiv Detail & Related papers (2023-05-17T11:55:32Z)
- Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions [22.669941641551823]
Large language models (LLMs) such as ChatGPT and GPT-4 have recently demonstrated remarkable abilities in communicating with human users.
We investigate their capacity to play text games, in which a player must understand the environment and respond to situations through dialogue with the game world.
Our experiments show that ChatGPT performs competitively with existing systems but still exhibits a low level of intelligence.
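The described setup (read environment text, reply with a command) is the standard text-game loop; a minimal sketch with a hypothetical `llm` function and an environment assumed to expose `reset()` and `step()` like common text-game wrappers:

```python
def play_text_game(llm, env, max_steps: int = 50) -> int:
    """Sketch of an LLM playing a text game. `env` is assumed to expose
    reset() -> str and step(action) -> (observation, reward, done);
    `llm` maps a prompt to a text command. Both interfaces are assumptions."""
    observation = env.reset()
    score = 0
    for _ in range(max_steps):
        action = llm("You are playing a text adventure.\n"
                     f"Observation: {observation}\nNext command:")
        observation, reward, done = env.step(action)
        score += reward
        if done:
            break
    return score
```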
arXiv Detail & Related papers (2023-04-06T05:01:28Z)
- Playing the Werewolf game with artificial intelligence for language understanding [0.7550566004119156]
Werewolf is a social deduction game based on free natural language communication.
The purpose of this study is to develop an AI agent that can play Werewolf through natural language conversations.
arXiv Detail & Related papers (2023-02-21T13:03:20Z)
- I Cast Detect Thoughts: Learning to Converse and Guide with Intents and Theory-of-Mind in Dungeons and Dragons [82.28503603235364]
We study teacher-student natural language interactions in a goal-driven environment in Dungeons and Dragons.
Our approach is to decompose and model these interactions into (1) the Dungeon Master's intent to guide players toward a given goal; (2) the DM's guidance utterance to the players expressing this intent; and (3) a theory-of-mind (ToM) model that anticipates the players' reaction to the guidance one turn into the future.
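The three-part decomposition maps directly onto a pipeline; a sketch with hypothetical callables for each component (none of these names come from the paper):

```python
def dm_turn(intent_model, utterance_model, tom_model, game_state, goal):
    """Sketch of the paper's decomposition: (1) choose the Dungeon Master's
    intent toward the goal, (2) realize it as a guidance utterance, (3) use a
    theory-of-mind model to anticipate the players' reaction one turn ahead.
    All three components are hypothetical callables."""
    intent = intent_model(game_state, goal)          # e.g. "nudge party toward the crypt"
    guidance = utterance_model(game_state, intent)   # the DM's actual line to the players
    predicted_reaction = tom_model(game_state, guidance)
    return guidance, predicted_reaction
```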
arXiv Detail & Related papers (2022-12-20T08:06:55Z)
- Putting the Con in Context: Identifying Deceptive Actors in the Game of Mafia [4.215251065887862]
We analyze the effect of speaker role on language use through the game of Mafia.
We show that classification models are able to rank deceptive players as more suspicious than honest ones.
We present methods for using our trained models to identify features that distinguish between player roles.
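A minimal sketch of the classification setup (rank players by predicted deceptiveness from their utterances) using scikit-learn; the features, model choice, and toy data here are stand-ins, not the paper's pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins: utterances labeled by whether the speaker was mafia (1) or town (0).
texts = ["I swear it wasn't me, check him", "Vote him out, trust me",
         "I was in the library all night", "Let's compare notes on where everyone was"]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Rank a new utterance by predicted deceptiveness (probability of the mafia class).
suspicion = model.predict_proba(["honestly I have no idea who did it"])[:, 1]
print(float(suspicion[0]))
```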
arXiv Detail & Related papers (2022-07-05T18:29:27Z)
- Collusion Detection in Team-Based Multiplayer Games [57.153233321515984]
We propose a system that detects colluding behaviors in team-based multiplayer games.
The proposed method analyzes the players' social relationships paired with their in-game behavioral patterns.
We then automate the detection using Isolation Forest, an unsupervised learning technique specialized in highlighting outliers.
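Isolation Forest is available in scikit-learn; a minimal sketch flagging outlier player pairs from assumed social-plus-behavioral features (the actual feature set is the paper's and is not reproduced here):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Each row: assumed features for a pair of players on opposing teams, e.g.
# [social_tie_strength, matches_played_together, damage_dealt_to_each_other].
pairs = np.array([
    [0.1, 3, 120.0],
    [0.2, 5, 95.0],
    [0.0, 1, 140.0],
    [0.9, 40, 2.0],   # suspicious: strong social tie, yet they never fight
])

detector = IsolationForest(contamination=0.25, random_state=0).fit(pairs)
flags = detector.predict(pairs)   # -1 marks outliers (potential collusion)
print(flags)
```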
arXiv Detail & Related papers (2022-03-10T02:37:39Z)
- CommonsenseQA 2.0: Exposing the Limits of AI through Gamification [126.85096257968414]
We construct benchmarks that test the abilities of modern natural language understanding models.
In this work, we propose gamification as a framework for data construction.
arXiv Detail & Related papers (2022-01-14T06:49:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.