Related papers: WereWolf-Plus: An Update of Werewolf Game setting Based on DSGBench

URL: http://arxiv.org/abs/2506.12841v1
Date: Sun, 15 Jun 2025 13:28:41 GMT
Title: WereWolf-Plus: An Update of Werewolf Game setting Based on DSGBench
Authors: Xinyuan Xia, Yuanyi Song, Haomin Ma, Jinyu Cai,
Abstract summary: We propose WereWolf-Plus, a multi-model, multi-dimensional, and multi-method benchmarking platform for evaluating multi-agent strategic reasoning. The platform supports customizable configurations for roles such as Seer, Witch, Hunter, Guard, and Sheriff, along with flexible model assignment and reasoning enhancement strategies. We introduce a comprehensive set of quantitative evaluation metrics for all special roles, werewolves, and the sheriff, and enrich the assessment dimensions for agent reasoning ability, cooperation capacity, and social influence.
Score: 3.3998740964877463
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid development of LLM-based agents, increasing attention has been given to their social interaction and strategic reasoning capabilities. However, existing Werewolf-based benchmarking platforms suffer from overly simplified game settings, incomplete evaluation metrics, and poor scalability. To address these limitations, we propose WereWolf-Plus, a multi-model, multi-dimensional, and multi-method benchmarking platform for evaluating multi-agent strategic reasoning in the Werewolf game. The platform offers strong extensibility, supporting customizable configurations for roles such as Seer, Witch, Hunter, Guard, and Sheriff, along with flexible model assignment and reasoning enhancement strategies for different roles. In addition, we introduce a comprehensive set of quantitative evaluation metrics for all special roles, werewolves, and the sheriff, and enrich the assessment dimensions for agent reasoning ability, cooperation capacity, and social influence. WereWolf-Plus provides a more flexible and reliable environment for advancing research on inference and strategic interaction within multi-agent communities. Our code is open sourced at https://github.com/MinstrelsyXia/WereWolfPlus.

Related papers

Werewolf: A Straightforward Game Framework with TTS for Improved User Engagement [42.620240788389154]
We propose a novel yet straightforward LLM-based Werewolf game system with tuned Text-to-Speech (TTS) models. We argue that, with ever-improving LLM reasoning, extra components will be unnecessary in the case of Werewolf.
arXiv Detail & Related papers (2025-05-30T18:58:57Z)
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation [78.96590724864606]
We introduce the Knowledge Orthogonal Reasoning Gymnasium (KORGym), a dynamic evaluation platform inspired by KOR-Bench and Gymnasium. KORGym offers over fifty games in either textual or visual formats and supports interactive, multi-turn assessments with reinforcement learning scenarios.
arXiv Detail & Related papers (2025-05-20T16:06:32Z)
Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization [13.496120603859701]
We propose Latent Space Policy Optimization (LSPO), an iterative framework that combines game-theoretic methods with fine-tuning to build strategic language agents. Experiments on the Werewolf game show that our agents iteratively expand the strategy space with improving performance and outperform existing Werewolf agents.
arXiv Detail & Related papers (2025-02-07T06:19:55Z)
MageBench: Bridging Large Multimodal Models to Agents [90.59091431806793]
LMMs have shown impressive visual understanding capabilities, with the potential to be applied in agents. Existing benchmarks mostly assess their reasoning abilities in the language part. MageBench is a reasoning-capability-oriented multimodal agent benchmark.
arXiv Detail & Related papers (2024-12-05T17:08:19Z)
SocialGFs: Learning Social Gradient Fields for Multi-Agent Reinforcement Learning [58.84311336011451]
We propose a novel gradient-based state representation for multi-agent reinforcement learning. We employ denoising score matching to learn the social gradient fields (SocialGFs) from offline samples. In practice, we integrate SocialGFs into the widely used multi-agent reinforcement learning algorithms, e.g., MAPPO.
arXiv Detail & Related papers (2024-05-03T04:12:19Z)
Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game [1.4565642534804486]
We employ the Werewolf game as a simulation platform to assess the opinion leadership of large language models (LLMs). The game includes the role of the Sheriff, tasked with summarizing arguments and recommending decision options. We devise two novel metrics based on the critical characteristics of opinion leaders.
arXiv Detail & Related papers (2024-04-02T02:46:18Z)
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing. As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework. This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models [105.39236338147715]
The paper is inspired by the popular language game "Who is Spy". We develop DEEP to evaluate LLMs' expression and disguising abilities. We then introduce SpyGame, an interactive multi-agent framework.
arXiv Detail & Related papers (2023-10-31T14:37:42Z)
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game [37.69298376616128]
We develop strategic language agents that generate flexible language actions and possess strong decision-making abilities. To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates. Experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game.
arXiv Detail & Related papers (2023-10-29T09:02:57Z)
A Novel Weighted Ensemble Learning Based Agent for the Werewolf Game [0.0]
Werewolf is a popular party game throughout the world, and research on its significance has progressed in recent years. In this research, we generated a sophisticated agent to play the Werewolf game using a complex weighted ensemble learning approach.
arXiv Detail & Related papers (2022-05-19T19:19:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.