Long-Horizon Dialogue Understanding for Role Identification in the Game
of Avalon with Large Language Models
- URL: http://arxiv.org/abs/2311.05720v1
- Date: Thu, 9 Nov 2023 20:04:08 GMT
- Title: Long-Horizon Dialogue Understanding for Role Identification in the Game
of Avalon with Large Language Models
- Authors: Simon Stepputtis, Joseph Campbell, Yaqi Xie, Zhengyang Qi, Wenxin
Sharon Zhang, Ruiyi Wang, Sanketh Rangreji, Michael Lewis, Katia Sycara
- Abstract summary: We explore the game of Avalon: The Resistance, a social deduction game in which players must determine each other's hidden identities to complete their team's objective.
We introduce an online testbed and a dataset containing 20 carefully collected and labeled games that exhibit long-horizon deception in a cooperative-competitive setting.
We discuss the capabilities of LLMs to utilize deceptive long-horizon conversations between six human players to determine each player's goal and motivation.
- Score: 6.176709034158014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deception and persuasion play a critical role in long-horizon dialogues
between multiple parties, especially when the interests, goals, and motivations
of the participants are not aligned. Such complex tasks pose challenges for
current Large Language Models (LLMs), as deception and persuasion can easily
mislead them, especially in long-horizon multi-party dialogues. To this end, we
explore the game of Avalon: The Resistance, a social deduction game in which
players must determine each other's hidden identities to complete their team's
objective. We introduce an online testbed and a dataset containing 20 carefully
collected and labeled games among human players that exhibit long-horizon
deception in a cooperative-competitive setting. We discuss the capabilities of
LLMs to utilize deceptive long-horizon conversations between six human players
to determine each player's goal and motivation. Particularly, we discuss the
multimodal integration of the chat between the players and the game's state
that grounds the conversation, providing further insights into the true player
identities. We find that even current state-of-the-art LLMs do not reach human
performance, making our dataset a compelling benchmark to investigate the
decision-making and language-processing capabilities of LLMs. Our dataset and
online testbed can be found at our project website:
https://sstepput.github.io/Avalon-NLU/
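The core task the benchmark poses is to ground a long, potentially deceptive chat transcript in the game state (quest outcomes, team proposals, votes) and have an LLM infer each player's hidden role. Below is a minimal illustrative sketch of how such a role-identification prompt might be assembled for an off-the-shelf chat model; the GameState fields, build_role_prompt, and query_llm placeholder are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): combine an Avalon chat log with
# the grounding game state and ask an LLM to label each player as Good or Evil.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GameState:
    quest_results: List[str]      # e.g. ["success", "fail", ...] per completed quest
    team_proposals: List[Dict]    # per quest: proposer, proposed team, vote outcome
    players: List[str] = field(
        default_factory=lambda: [f"Player{i}" for i in range(1, 7)]
    )


def build_role_prompt(chat_log: List[str], state: GameState) -> str:
    """Interleave the full dialogue with the game events that ground it."""
    lines = [
        "You are observing a game of Avalon: The Resistance with six players.",
        "Based on the conversation and game events below, infer each player's",
        "hidden identity (Good or Evil) and note the deceptive cues you relied on.",
        "",
        "=== Game state ===",
    ]
    for i, (result, proposal) in enumerate(
        zip(state.quest_results, state.team_proposals), start=1
    ):
        lines.append(
            f"Quest {i}: proposed by {proposal['proposer']} with team "
            f"{proposal['team']}; vote {proposal['vote']}; outcome {result}"
        )
    lines += ["", "=== Chat transcript ==="] + chat_log
    lines += ["", "Answer with one line per player: <name>: Good|Evil (reason)."]
    return "\n".join(lines)


def query_llm(prompt: str) -> str:
    # Placeholder: swap in any chat-completion client (hosted API or local model).
    raise NotImplementedError("plug in your LLM client here")
```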
Related papers
- AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game [12.384945632524424]
This paper focuses on creating proxies of human behavior in simulated environments, using Among Us as a testbed for studying such behavior.
Our work demonstrates that state-of-the-art large language models (LLMs) can effectively grasp the game rules and make decisions based on the current context.
arXiv Detail & Related papers (2024-07-23T14:34:38Z)
- Collaborative Quest Completion with LLM-driven Non-Player Characters in Minecraft [14.877848057734463]
We design a minigame within Minecraft in which a player works with two GPT-4-driven NPCs to complete a quest.
On analyzing the game logs and recordings, we find that several patterns of collaborative behavior emerge from the NPCs and the human players.
We believe that this preliminary study and analysis will inform future game developers on how to better exploit these rapidly improving generative AI models for collaborative roles in games.
arXiv Detail & Related papers (2024-07-03T19:11:21Z) - A Dialogue Game for Eliciting Balanced Collaboration [64.61707514432533]
We present a two-player 2D object placement game in which the players must negotiate the goal state themselves.
We show empirically that human players exhibit a variety of role distributions, and that balanced collaboration improves task performance.
arXiv Detail & Related papers (2024-06-12T13:35:10Z) - Evaluating Very Long-Term Conversational Memory of LLM Agents [95.84027826745609]
We introduce a machine-human pipeline to generate high-quality, very long-term dialogues.
We equip each agent with the capability of sharing and reacting to images.
The generated conversations are verified and edited by human annotators for long-range consistency.
arXiv Detail & Related papers (2024-02-27T18:42:31Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues [72.65163468440434]
This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting.
We prompt large language models (LLMs) to generate a full multi-turn dialogue based on the ChatSEED, utterance by utterance.
We find that GPT-4 can generate human-style multi-turn dialogues of impressive quality, significantly outperforming its counterparts.
arXiv Detail & Related papers (2023-10-20T16:53:51Z) - GameEval: Evaluating LLMs on Conversational Games [93.40433639746331]
We propose GameEval, a novel approach to evaluating large language models (LLMs)
GameEval treats LLMs as game players and assigns them distinct roles with specific goals achieved by launching conversations of various forms.
We show that GameEval can effectively differentiate the capabilities of various LLMs, providing a comprehensive assessment of their integrated abilities to solve complex problems.
arXiv Detail & Related papers (2023-08-19T14:33:40Z)
- Tachikuma: Understanding Complex Interactions with Multi-Character and Novel Objects by Large Language Models [67.20964015591262]
We introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation task and a supporting dataset.
The dataset captures log data from real-time communications during gameplay, providing diverse, grounded, and complex interactions for further explorations.
We present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding.
arXiv Detail & Related papers (2023-07-24T07:40:59Z)
- Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps? [3.0874448550989673]
Longitudinal Dialogues (LDs) are the most challenging type of conversation for human-machine dialogue systems.
We study the task of response generation in LDs.
We fine-tune two PLMs, GePpeTto and iT5, using a dataset of LDs.
arXiv Detail & Related papers (2023-05-25T10:13:53Z)