Related papers: Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization

URL: http://arxiv.org/abs/2502.04686v1
Date: Fri, 07 Feb 2025 06:19:55 GMT
Title: Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization
Authors: Zelai Xu, Wanjun Gu, Chao Yu, Yi Wu, Yu Wang
Abstract summary: Large language model (LLM)-based agents have recently shown impressive progress in a variety of domains. Applying these agents to social deduction games such as Werewolf, which requires both strategic decision-making and free-form language interaction, remains non-trivial. We propose Latent Space Policy Optimization (LSPO), an iterative framework that addresses these challenges by first mapping free-form text to a discrete latent space.
Score: 13.496120603859701
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM)-based agents have recently shown impressive progress in a variety of domains, including open-ended conversation and multi-step decision-making. However, applying these agents to social deduction games such as Werewolf, which requires both strategic decision-making and free-form language interaction, remains non-trivial. Traditional methods based on Counterfactual Regret Minimization (CFR) or reinforcement learning (RL) typically depend on a predefined action space, making them unsuitable for language games with unconstrained text action space. Meanwhile, pure LLM-based agents often suffer from intrinsic biases and require prohibitively large datasets for fine-tuning. We propose Latent Space Policy Optimization (LSPO), an iterative framework that addresses these challenges by first mapping free-form text to a discrete latent space, where methods like CFR and RL can learn strategic policy more effectively. We then translate the learned policy back into natural language dialogues, which are used to fine-tune an LLM via Direct Preference Optimization (DPO). By iteratively alternating between these stages, our LSPO agent progressively enhances both strategic reasoning and language communication. Experiment results on the Werewolf game show that our method improves the agent's performance in each iteration and outperforms existing Werewolf agents, underscoring its promise for free-form language decision-making.
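The abstract's first stage (mapping free-form text to a discrete latent space where CFR-style methods can operate) can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D embeddings, the centroids, the nearest-centroid assignment, and the function names `to_latent` and `regret_matching` are all assumptions made for illustration. Regret matching is shown only as the basic building block that CFR uses on a discrete action set.

```python
from math import dist

def to_latent(embedding, centroids):
    """Return the index of the nearest centroid, used here as the discrete latent action."""
    return min(range(len(centroids)), key=lambda k: dist(embedding, centroids[k]))

def regret_matching(cum_regret):
    """Turn cumulative regrets into a strategy proportional to positive regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total == 0:
        # No positive regret yet: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]

# Hypothetical 2-D utterance embeddings and two cluster centroids (illustrative values).
centroids = [(0.0, 0.0), (1.0, 1.0)]
utterances = {
    "I am the Seer": (0.9, 1.1),
    "I have no information": (0.1, -0.2),
}
latent_ids = {text: to_latent(vec, centroids) for text, vec in utterances.items()}

# Hypothetical cumulative regrets for three latent actions.
strategy = regret_matching([2.0, -1.0, 6.0])
```

In a real system the embeddings would come from a sentence encoder and the centroids from a clustering step; both are replaced by hand-picked values here so the sketch stays self-contained.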

Related papers

Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning [69.5875073447454]
This paper advances motion agents empowered by large language models (LLMs) toward autonomous navigation in dynamic and cluttered environments. Our training-free framework supports multi-agent coordination, closed-loop replanning, and dynamic obstacle avoidance without retraining or fine-tuning.
arXiv Detail & Related papers (2025-03-10T13:39:09Z)
Simulation of Language Evolution under Regulated Social Media Platforms: A Synergistic Approach of Large Language Models and Genetic Algorithms [6.550725258692423]
Social media platforms frequently impose restrictive policies to moderate user content, prompting the emergence of creative evasion language strategies. This paper presents a multi-agent framework based on Large Language Models (LLMs) to simulate the iterative evolution of language strategies under regulatory constraints.
arXiv Detail & Related papers (2025-02-26T14:59:27Z)
Policy Learning with a Natural Language Action Space: A Causal Approach [24.096991077437146]
This paper introduces a novel causal framework for multi-stage decision-making in natural language action spaces. Our approach employs Q-learning to estimate Dynamic Treatment Regimes (DTR) through a single model. A key technical contribution of our approach is a decoding strategy that translates optimized embeddings back into coherent natural language.
arXiv Detail & Related papers (2025-02-24T17:26:07Z)
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. We train the strategic reasoning model via multi-turn reinforcement learning (RL), utilizing process rewards and iterative self-play. Our findings reveal various collaborative reasoning mechanisms emergent in EPO and its effectiveness in generating novel strategies.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs). It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent. We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents. Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making [43.734386326024016]
Large Language Models (LLMs) have revolutionized natural language processing, showing remarkable linguistic proficiency and reasoning capabilities. This paper presents a novel framework equipped with memory and specialized tools to enhance their strategic decision-making capabilities.
arXiv Detail & Related papers (2024-05-25T23:25:10Z)
Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation [6.550725258692423]
Social media platforms such as Twitter, Reddit, and Sina Weibo play a crucial role in global communication but often encounter strict regulations in geopolitically sensitive regions. This paper proposes a multi-agent simulation framework using Large Language Models (LLMs) to explore the evolution of user language in regulated social media environments.
arXiv Detail & Related papers (2024-05-05T09:02:54Z)
How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities. We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
Steering Language Models with Game-Theoretic Solvers [43.023261136434876]
We introduce a framework that allows equilibrium solvers to work over the space of natural language dialogue generated by large language models (LLMs). Specifically, by modelling the players, strategies and payoffs in a "game" of dialogue, we create a binding from natural language interactions to the conventional symbolic logic of game theory. We focus on three domains that require different negotiation strategies: scheduling meetings, trading fruit and debate, and evaluate an LLM's generated language when guided by solvers.
arXiv Detail & Related papers (2024-01-24T22:22:00Z)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs). Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks. However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome. In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents [77.34720446306419]
Alympics is a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research. Alympics creates a versatile platform for studying complex game theory problems.
arXiv Detail & Related papers (2023-11-06T16:03:46Z)
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models [105.39236338147715]
The paper is inspired by the popular language game "Who is Spy". We develop DEEP to evaluate LLMs' expression and disguising abilities. We then introduce SpyGame, an interactive multi-agent framework.
arXiv Detail & Related papers (2023-10-31T14:37:42Z)
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game [40.438765131992525]
We develop strategic language agents that generate flexible language actions and possess strong decision-making abilities. To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates. Experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game.
arXiv Detail & Related papers (2023-10-29T09:02:57Z)
Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf [19.39740531672788]
We propose a tuning-free framework to engage large language models in communication games. An empirical study on the representative and widely studied communication game "Werewolf" demonstrates that our framework can effectively play the Werewolf game without tuning the parameters of the LLMs.
arXiv Detail & Related papers (2023-09-09T01:56:40Z)
Strategic Reasoning with Language Models [35.63300060111918]
Strategic reasoning enables agents to cooperate, communicate, and compete with other agents in diverse situations. Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining. This paper introduces an approach that uses pretrained Large Language Models with few-shot chain-of-thought examples to enable strategic reasoning for AI agents.
arXiv Detail & Related papers (2023-05-30T16:09:19Z)
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management [36.254564021059515]
Reinforcement learning (RL) has shown great promise for developing dialogue management (DM) agents that are non-myopic. We develop a variety of RL algorithms, specialized to dialogue planning, that leverage recent Mixture-of-Expert Language Models (MoE-LMs). By exploiting MoE-LM structure, our methods significantly reduce the size of the action space and improve the efficacy of RL-based DM.
arXiv Detail & Related papers (2023-02-21T18:02:20Z)
Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization [73.74371798168642]
We introduce an open-source modular library, RL4LMs, for optimizing language generators with reinforcement learning. Next, we present the GRUE benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions. Finally, we introduce an easy-to-use, performant RL algorithm, NLPO, that learns to effectively reduce the action space in language generation.
arXiv Detail & Related papers (2022-10-03T21:38:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.