Convert Language Model into a Value-based Strategic Planner
- URL: http://arxiv.org/abs/2505.06987v4
- Date: Tue, 17 Jun 2025 15:43:40 GMT
- Title: Convert Language Model into a Value-based Strategic Planner
- Authors: Xiaoyu Wang, Yue Zhao, Qingqing Gu, Zhonglin Jiang, Xiaokai Chen, Yong Chen, Luo Ji,
- Abstract summary: Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations.<n>We propose a framework called straQ* to define the diagram from the state model perspective.<n>Our framework allows a plug-and-play LLM to bootstrap the planning during ESC, determine the optimal strategy based on long-term returns, and finally guide the LLM to response.
- Score: 11.070654717643816
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have obtained remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, therefore providing a suboptimal solution for long-term satisfaction. To address such an issue, we leverage the Q-learning on LLMs, and propose a framework called straQ*. Our framework allows a plug-and-play LLM to bootstrap the planning during ESC, determine the optimal strategy based on long-term returns, and finally guide the LLM to response. Substantial experiments on ESC datasets suggest that straQ* outperforms many baselines, including direct inference, self-refine, chain of thought, finetuning, and finite state machines.
Related papers
- ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection [4.101501114944147]
ReflecSched is a framework that empowers the LLM beyond a direct scheduler.<n>It distills simulations across multiple planning horizons into a concise, natural-language summary.<n>This summary is then integrated into the prompt of a final decision-making module, guiding it to produce non-myopic actions.
arXiv Detail & Related papers (2025-08-03T11:26:35Z) - Feedback-Induced Performance Decline in LLM-Based Decision-Making [6.5990946334144756]
Large Language Models (LLMs) can extract context from natural language problem descriptions.<n>This paper studies the behaviour of these models within a Markov Decision Process (MDPs)
arXiv Detail & Related papers (2025-07-20T10:38:56Z) - Enhancing Decision-Making of Large Language Models via Actor-Critic [28.870961806283425]
Large Language Models (LLMs) have achieved remarkable advancements in natural language processing tasks.<n>Existing methods either rely on short-term auto-regressive action generation or face limitations in accurately simulating rollouts and assessing outcomes.<n>This paper introduces a novel LLM-based Actor-Critic framework, termed LAC, that effectively improves LLM policies with long-term action evaluations.
arXiv Detail & Related papers (2025-06-04T14:58:27Z) - Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL [62.984693936073974]
Large language models (LLMs) excel in tasks like question answering and dialogue.<n>Complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning.<n>We propose a novel approach that uses goal-conditioned value functions to guide the reasoning of LLM agents.
arXiv Detail & Related papers (2025-05-23T16:51:54Z) - FiSMiness: A Finite State Machine Based Paradigm for Emotional Support Conversations [11.718316719735832]
Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations.<n>We leverage the Finite State Machine (FSM) on large language models and propose a framework called FiSMiness.<n>Our framework allows a single LLM to bootstrap the planning during ESC, and self-reason the seeker's emotion, support strategy and the final response upon each conversational turn.
arXiv Detail & Related papers (2025-04-16T07:52:06Z) - Guiding Reasoning in Small Language Models with LLM Assistance [23.3038074903744]
Small Language Models cast doubt suitability for tasks demanding deep, multi-step logical deduction.<n>This paper introduces a framework called Small Reasons, Large Hints, which selectively augments SLM reasoning with targeted guidance from large language models.<n>Our experiments on mathematical reasoning datasets demonstrate that targeted external scaffolding significantly improves performance.
arXiv Detail & Related papers (2025-04-14T06:32:45Z) - Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning [16.654435148168172]
Large Language Models (LLMs) have shown remarkable promise in reasoning and decision-making.<n>We propose an LLM-guided hierarchical RL framework, termed LDSC, to enhance sample efficiency, generalization, and multi-task adaptability.
arXiv Detail & Related papers (2025-03-24T15:49:56Z) - Embodied CoT Distillation From LLM To Off-the-shelf Agents [6.318203525449058]
DeDer is a framework for decomposing and distilling the embodied reasoning capabilities from large language models (LLMs)<n>Our experiments with the ALFRED benchmark demonstrate that DeDer surpasses leading language planning and distillation approaches.
arXiv Detail & Related papers (2024-12-16T07:18:02Z) - ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models [55.301188787490545]
Emotion Support Conversation (ESC) aims to reduce human stress, offer emotional guidance, and enhance human mental and physical well-being.
We propose an ESC Evaluation framework (ESC-Eval), which uses a role-playing agent to interact with ESC models.
We conduct comprehensive human annotations on interactive multi-turn dialogues of different ESC models.
arXiv Detail & Related papers (2024-06-21T08:03:33Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs)
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - AutoPlan: Automatic Planning of Interactive Decision-Making Tasks With
Large Language Models [11.895111124804503]
AutoPlan is an approach to guide LLM-based agents to accomplish interactive decision-making tasks.
Our experiments show that AutoPlan achieves success rates on par with the baselines.
arXiv Detail & Related papers (2023-05-24T11:52:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.