Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game
- URL: http://arxiv.org/abs/2404.01602v2
- Date: Thu, 29 Aug 2024 08:49:14 GMT
- Title: Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game
- Authors: Silin Du, Xiaowei Zhang
- Abstract summary: We employ the Werewolf game as a simulation platform to assess the opinion leadership of large language models (LLMs).
The game includes the role of the Sheriff, tasked with summarizing arguments and recommending decision options.
We devise two novel metrics based on the critical characteristics of opinion leaders.
- Score: 1.4565642534804486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have exhibited memorable strategic behaviors in social deductive games. However, the significance of opinion leadership exhibited by LLM-based agents has been largely overlooked, which is crucial for practical applications in multi-agent and human-AI interaction settings. Opinion leaders are individuals who have a noticeable impact on the beliefs and behaviors of others within a social group. In this work, we employ the Werewolf game as a simulation platform to assess the opinion leadership of LLMs. The game includes the role of the Sheriff, tasked with summarizing arguments and recommending decision options, and therefore serves as a credible proxy for an opinion leader. We develop a framework integrating the Sheriff role and devise two novel metrics based on the critical characteristics of opinion leaders. The first metric measures the reliability of the opinion leader, and the second assesses the influence of the opinion leader on other players' decisions. We conduct extensive experiments to evaluate LLMs of different scales. In addition, we collect a Werewolf question-answering dataset (WWQA) to assess and enhance LLMs' grasp of the game rules, and we also incorporate human participants for further analysis. The results suggest that the Werewolf game is a suitable test bed to evaluate the opinion leadership of LLMs, and few LLMs possess the capacity for opinion leadership.
Related papers
- Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction [3.350801757799469]
Werewolf Arena is a framework for evaluating large language models (LLMs).
In Werewolf Arena, LLMs compete against each other, navigating the game's complex dynamics of deception, deduction, and persuasion.
We demonstrate Werewolf Arena's utility through an arena-style tournament featuring Gemini and GPT models.
arXiv Detail & Related papers (2024-07-18T23:41:05Z)
- Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma? [0.1474723404975345]
We study the cooperative behavior of Llama2 when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility.
We find that Llama2 tends not to initiate defection but it adopts a cautious approach towards cooperation.
In comparison to prior research on human participants, Llama2 exhibits a greater inclination towards cooperative behavior.
arXiv Detail & Related papers (2024-06-19T14:51:14Z)
- Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations [1.6108153271585284]
We show that large language models (LLMs) behave differently compared to humans in high-stakes military decision-making scenarios.
Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.
arXiv Detail & Related papers (2024-03-06T02:23:32Z)
- GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications.
This paper evaluates LLMs' reasoning abilities in competitive environments.
We first propose GTBench, a language-driven environment composing 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z)
- Enhance Reasoning for Large Language Models in the Game Werewolf [15.730860371636336]
This paper presents an innovative framework that integrates Large Language Models (LLMs) with an external Thinker module.
Our framework is presented using a 9-player Werewolf game that demands dual-system reasoning.
Experiments demonstrate the framework's effectiveness in deductive reasoning, speech generation, and online game evaluation.
arXiv Detail & Related papers (2024-02-04T03:47:10Z)
- Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models [105.39236338147715]
The paper is inspired by the popular language game "Who is Spy".
We develop DEEP to evaluate LLMs' expression and disguising abilities.
We then introduce SpyGame, an interactive multi-agent framework.
arXiv Detail & Related papers (2023-10-31T14:37:42Z)
- Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game [40.438765131992525]
We develop strategic language agents that generate flexible language actions and possess strong decision-making abilities.
To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates.
Experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game.
arXiv Detail & Related papers (2023-10-29T09:02:57Z)
- LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay [55.12945794835791]
Using Avalon as a testbed, we employ system prompts to guide LLM agents in gameplay.
We propose a novel framework, tailored for Avalon, that features a multi-agent system facilitating efficient communication and interaction.
Results affirm the framework's effectiveness in creating adaptive agents and suggest LLM-based agents' potential in navigating dynamic social interactions.
arXiv Detail & Related papers (2023-10-23T14:35:26Z)
- GameEval: Evaluating LLMs on Conversational Games [93.40433639746331]
We propose GameEval, a novel approach to evaluating large language models (LLMs).
GameEval treats LLMs as game players and assigns them distinct roles with specific goals achieved by launching conversations of various forms.
We show that GameEval can effectively differentiate the capabilities of various LLMs, providing a comprehensive assessment of their integrated abilities to solve complex problems.
arXiv Detail & Related papers (2023-08-19T14:33:40Z)
- A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)
- In-Context Impersonation Reveals Large Language Models' Strengths and Biases [56.61129643802483]
We ask LLMs to assume different personas before solving vision and language tasks.
We find that LLMs pretending to be children of different ages recover human-like developmental stages.
In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts.
arXiv Detail & Related papers (2023-05-24T09:13:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.