Related papers: EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning

URL: http://arxiv.org/abs/2502.12486v1
Date: Tue, 18 Feb 2025 03:15:55 GMT
Title: EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
Authors: Xiaoqian Liu, Ke Wang, Yongbin Li, Yuchuan Wu, Wentao Ma, Aobo Kong, Fei Huang, Jianbin Jiao, Junge Zhang
Abstract summary: We propose explicit policy optimization (EPO) for strategic reasoning. EPO provides strategies in open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. Experiments across social and physical domains demonstrate EPO's ability to achieve long-term goal alignment.
Score: 69.55982246413046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have shown impressive reasoning capabilities in well-defined problems with clear solutions, such as mathematics and coding. However, they still struggle with complex real-world scenarios like business negotiations, which require strategic reasoning, an ability to navigate dynamic environments and align long-term goals amidst uncertainty. Existing methods for strategic reasoning face challenges in adaptability, scalability, and transferring strategies to new contexts. To address these issues, we propose explicit policy optimization (EPO) for strategic reasoning, featuring an LLM that provides strategies in open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. To improve adaptability and policy transferability, we train the strategic reasoning model via multi-turn reinforcement learning (RL) using process rewards and iterative self-play, without supervised fine-tuning (SFT) as a preliminary step. Experiments across social and physical domains demonstrate EPO's ability to achieve long-term goal alignment through enhanced strategic reasoning, achieving state-of-the-art performance on social dialogue and web navigation tasks. Our findings reveal various collaborative reasoning mechanisms emergent in EPO and its effectiveness in generating novel strategies, underscoring its potential for strategic reasoning in real-world applications.

Related papers

Expanding LLM Agent Boundaries with Strategy-Guided Exploration [51.98616048282804]
Reinforcement learning (RL) has demonstrated notable success in post-training large language models (LLMs) as agents for tasks such as computer use, tool calling, and coding. We propose Strategy-Guided Exploration (SGE) to shift exploration from low-level actions to higher-level language strategies.
arXiv Detail & Related papers (2026-03-02T16:28:39Z)
Reinforced Strategy Optimization for Conversational Recommender Systems via Network-of-Experts [63.412646471177645]
We propose a novel Reinforced Strategy Optimization (RSO) method for Conversational Recommender Systems (CRSs). RSO decomposes the process of generating strategy-driven response decisions into macro-level strategy planning and micro-level strategy adaptation. Experiments show that RSO significantly improves interaction performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-30T11:12:01Z)
WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models [28.28739884703072]
This paper introduces WGSR-Bench, the first strategy reasoning benchmark for Large Language Models (LLMs) using wargame as its evaluation environment. We design test samples around three core tasks, i.e., environmental situation awareness, opponent risk modeling, and policy generation, to systematically assess the main abilities of strategic reasoning.
arXiv Detail & Related papers (2025-06-12T01:16:34Z)
Strategy-Augmented Planning for Large Language Models via Opponent Exploitation [11.840105106884543]
We introduce a two-stage Strategy-Augmented Planning (SAP) framework that significantly enhances the opponent exploitation capabilities of LLM-based agents. In the offline stage, we construct an explicit strategy space and subsequently collect strategy-outcome pair data for training the Strategy Evaluation Network (SEN). During the online phase, SAP dynamically recognizes the opponent's strategies and greedily exploits them by searching for the best-response strategy on the well-trained SEN.
arXiv Detail & Related papers (2025-05-13T11:41:10Z)
Reinforcement Learning Environment with LLM-Controlled Adversary in D&D 5th Edition Combat [0.0]
This research employs Deep Q-Networks (DQN) for the smaller agents, creating a testbed for strategic AI development. We successfully integrated sophisticated language models into the RL framework, enhancing strategic decision-making processes.
arXiv Detail & Related papers (2025-03-19T22:48:20Z)
Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs). MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task. We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z)
STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making [43.734386326024016]
Large Language Models (LLMs) have revolutionized natural language processing, showing remarkable linguistic proficiency and reasoning capabilities. This paper presents a novel framework equipped with memory and specialized tools to enhance their strategic decision-making capabilities.
arXiv Detail & Related papers (2024-05-25T23:25:10Z)
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly. We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with Large Language Models. It underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation [69.5677514160986]
We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users. This poses two main challenges for existing dialogue agents. We propose Trip to enhance the capability in tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm.
arXiv Detail & Related papers (2024-03-11T14:38:16Z)
K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning [76.3114831562989]
It requires Large Language Model (LLM) agents to adapt their strategies dynamically in multi-agent environments. We propose a novel framework: "K-Level Reasoning with Large Language Models (K-R)".
arXiv Detail & Related papers (2024-02-02T16:07:05Z)
Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP. Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data. PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z)
Strategic Reasoning with Language Models [35.63300060111918]
Strategic reasoning enables agents to cooperate, communicate, and compete with other agents in diverse situations. Existing approaches to solving strategic games rely on extensive training, yielding strategies that do not generalize to new scenarios or games without retraining. This paper introduces an approach that uses pretrained Large Language Models with few-shot chain-of-thought examples to enable strategic reasoning for AI agents.
arXiv Detail & Related papers (2023-05-30T16:09:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.