Long-Horizon Dialogue Understanding for Role Identification in the Game
of Avalon with Large Language Models
- URL: http://arxiv.org/abs/2311.05720v1
- Date: Thu, 9 Nov 2023 20:04:08 GMT
- Title: Long-Horizon Dialogue Understanding for Role Identification in the Game
of Avalon with Large Language Models
- Authors: Simon Stepputtis, Joseph Campbell, Yaqi Xie, Zhengyang Qi, Wenxin
Sharon Zhang, Ruiyi Wang, Sanketh Rangreji, Michael Lewis, Katia Sycara
- Abstract summary: We explore the game of Avalon: The Resistance, a social deduction game in which players must determine each other's hidden identities to complete their team's objective.
We introduce an online testbed and a dataset containing 20 carefully collected and labeled games that exhibit long-horizon deception in a cooperative-competitive setting.
We discuss the capabilities of LLMs to utilize deceptive long-horizon conversations between six human players to determine each player's goal and motivation.
- Score: 6.176709034158014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deception and persuasion play a critical role in long-horizon dialogues
between multiple parties, especially when the interests, goals, and motivations
of the participants are not aligned. Such complex tasks pose challenges for
current Large Language Models (LLMs), as deception and persuasion can easily
mislead them, especially in long-horizon multi-party dialogues. To this end, we
explore the game of Avalon: The Resistance, a social deduction game in which
players must determine each other's hidden identities to complete their team's
objective. We introduce an online testbed and a dataset containing 20 carefully
collected and labeled games among human players that exhibit long-horizon
deception in a cooperative-competitive setting. We discuss the capabilities of
LLMs to utilize deceptive long-horizon conversations between six human players
to determine each player's goal and motivation. Particularly, we discuss the
multimodal integration of the chat between the players and the game's state
that grounds the conversation, providing further insights into the true player
identities. We find that even current state-of-the-art LLMs do not reach human
performance, making our dataset a compelling benchmark to investigate the
decision-making and language-processing capabilities of LLMs. Our dataset and
online testbed can be found at our project website:
https://sstepput.github.io/Avalon-NLU/
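The core task the benchmark poses is to ground a long, potentially deceptive chat transcript in the game state (quest outcomes, team proposals, votes) and have an LLM infer each player's hidden role. Below is a minimal illustrative sketch of how such a role-identification prompt might be assembled for an off-the-shelf chat model; the GameState fields, build_role_prompt, and query_llm placeholder are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): combine an Avalon chat log with
# the grounding game state and ask an LLM to label each player as Good or Evil.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GameState:
    quest_results: List[str]      # e.g. ["success", "fail", ...] per completed quest
    team_proposals: List[Dict]    # per quest: proposer, proposed team, vote outcome
    players: List[str] = field(
        default_factory=lambda: [f"Player{i}" for i in range(1, 7)]
    )


def build_role_prompt(chat_log: List[str], state: GameState) -> str:
    """Interleave the full dialogue with the game events that ground it."""
    lines = [
        "You are observing a game of Avalon: The Resistance with six players.",
        "Based on the conversation and game events below, infer each player's",
        "hidden identity (Good or Evil) and note the deceptive cues you relied on.",
        "",
        "=== Game state ===",
    ]
    for i, (result, proposal) in enumerate(
        zip(state.quest_results, state.team_proposals), start=1
    ):
        lines.append(
            f"Quest {i}: proposed by {proposal['proposer']} with team "
            f"{proposal['team']}; vote {proposal['vote']}; outcome {result}"
        )
    lines += ["", "=== Chat transcript ==="] + chat_log
    lines += ["", "Answer with one line per player: <name>: Good|Evil (reason)."]
    return "\n".join(lines)


def query_llm(prompt: str) -> str:
    # Placeholder: swap in any chat-completion client (hosted API or local model).
    raise NotImplementedError("plug in your LLM client here")
```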
Related papers
- AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game [12.384945632524424]
This paper focuses on creating proxies of human behavior in simulated environments, using Among Us as a testbed for studying such behavior.
Our work demonstrates that state-of-the-art large language models (LLMs) can effectively grasp the game rules and make decisions based on the current context.
arXiv Detail & Related papers (2024-07-23T14:34:38Z)
- Collaborative Quest Completion with LLM-driven Non-Player Characters in Minecraft [14.877848057734463]
We design a minigame within Minecraft in which a player works with two GPT-4-driven NPCs to complete a quest.
On analyzing the game logs and recordings, we find that several patterns of collaborative behavior emerge from the NPCs and the human players.
We believe that this preliminary study and analysis will inform future game developers on how to better exploit these rapidly improving generative AI models for collaborative roles in games.
arXiv Detail & Related papers (2024-07-03T19:11:21Z) - A Dialogue Game for Eliciting Balanced Collaboration [64.61707514432533]
We present a two-player 2D object placement game in which the players must negotiate the goal state themselves.
We show empirically that human players exhibit a variety of role distributions, and that balanced collaboration improves task performance.
arXiv Detail & Related papers (2024-06-12T13:35:10Z) - Evaluating Very Long-Term Conversational Memory of LLM Agents [95.84027826745609]
We introduce a machine-human pipeline to generate high-quality, very long-term dialogues.
We equip each agent with the capability of sharing and reacting to images.
The generated conversations are verified and edited by human annotators for long-range consistency.
arXiv Detail & Related papers (2024-02-27T18:42:31Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues [72.65163468440434]
This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting.
We prompt large language models (LLMs) to generate a full multi-turn dialogue based on the ChatSEED, utterance by utterance.
We find that GPT-4 can generate human-style multi-turn dialogues of impressive quality, significantly outperforming its counterparts.
arXiv Detail & Related papers (2023-10-20T16:53:51Z) - GameEval: Evaluating LLMs on Conversational Games [93.40433639746331]
We propose GameEval, a novel approach to evaluating large language models (LLMs)
GameEval treats LLMs as game players and assigns them distinct roles with specific goals achieved by launching conversations of various forms.
We show that GameEval can effectively differentiate the capabilities of various LLMs, providing a comprehensive assessment of their integrated abilities to solve complex problems.
arXiv Detail & Related papers (2023-08-19T14:33:40Z)
- Tachikuma: Understanding Complex Interactions with Multi-Character and Novel Objects by Large Language Models [67.20964015591262]
We introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation task and a supporting dataset.
The dataset captures log data from real-time communications during gameplay, providing diverse, grounded, and complex interactions for further explorations.
We present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding.
arXiv Detail & Related papers (2023-07-24T07:40:59Z)
- Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps? [3.0874448550989673]
Longitudinal Dialogues (LDs) are the most challenging type of conversation for human-machine dialogue systems.
We study the task of response generation in LDs.
We fine-tune two PLMs, GePpeTto and iT5, using a dataset of LDs.
arXiv Detail & Related papers (2023-05-25T10:13:53Z)