Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter
- URL: http://arxiv.org/abs/2503.05362v1
- Date: Fri, 07 Mar 2025 12:07:59 GMT
- Title: Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter
- Authors: Weixiang Zhao, Xingyu Sui, Xinyang Han, Yang Deng, Yulin Hu, Jiahe Guo, Libo Qin, Qianyun Du, Shijin Wang, Yanyan Zhao, Bing Qin, Ting Liu
- Abstract summary: We propose Chain-of-Strategy Optimization (CSO), a novel approach that optimizes strategy selection preferences at each dialogue turn. We first leverage Monte Carlo Tree Search to construct ESC-Pro, a high-quality preference dataset with turn-level strategy-response pairs. Training on ESC-Pro with CSO improves both strategy accuracy and bias mitigation, enabling LLMs to generate more empathetic and contextually appropriate responses.
- Score: 44.17098675825127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing emotional stress in modern society has increased the demand for Emotional Support Conversations (ESC). While Large Language Models (LLMs) show promise for ESC, they face two key challenges: (1) low strategy selection accuracy, and (2) preference bias, limiting their adaptability to emotional needs of users. Existing supervised fine-tuning (SFT) struggles to address these issues, as it rigidly trains models on single gold-standard responses without modeling nuanced strategy trade-offs. To overcome these limitations, we propose Chain-of-Strategy Optimization (CSO), a novel approach that optimizes strategy selection preferences at each dialogue turn. We first leverage Monte Carlo Tree Search to construct ESC-Pro, a high-quality preference dataset with turn-level strategy-response pairs. Training on ESC-Pro with CSO improves both strategy accuracy and bias mitigation, enabling LLMs to generate more empathetic and contextually appropriate responses. Experiments on LLaMA-3.1-8B, Gemma-2-9B, and Qwen2.5-7B demonstrate that CSO outperforms standard SFT, highlighting the efficacy of fine-grained, turn-level preference modeling in ESC.
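To make the idea of turn-level strategy-response preference pairs more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each dialogue turn already has several candidate strategy-response pairs with a quality score (for example, an estimate produced by a tree search). The `Candidate` class, the `score` field, the `margin` threshold, and both function names are hypothetical and introduced only for illustration; the strategy labels in the usage note are likewise illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    strategy: str   # support strategy label (illustrative)
    response: str   # response text generated under that strategy
    score: float    # hypothetical quality estimate, e.g. from a tree search


def build_turn_pairs(
    candidates: List[Candidate], margin: float = 0.2
) -> List[Tuple[Candidate, Candidate]]:
    """Pair the best-scoring candidate with every candidate it beats by `margin`.

    The margin is an assumption of this sketch: it skips near-ties so that
    only clearly worse candidates become "rejected" examples.
    """
    if not candidates:
        return []
    best = max(candidates, key=lambda c: c.score)
    return [(best, c) for c in candidates if best.score - c.score >= margin]


def build_dialogue_pairs(
    turns: List[List[Candidate]], margin: float = 0.2
) -> List[Tuple[int, Candidate, Candidate]]:
    """Collect (turn_index, chosen, rejected) triples across a whole dialogue."""
    triples = []
    for turn_index, candidates in enumerate(turns):
        for chosen, rejected in build_turn_pairs(candidates, margin):
            triples.append((turn_index, chosen, rejected))
    return triples
```

For example, a turn with candidates scored 0.9 ("Question"), 0.8 ("Reflection of feelings"), and 0.4 ("Providing Suggestions") yields a single pair under the default margin: the 0.9 candidate is chosen over the 0.4 candidate, while the 0.8 candidate is treated as a near-tie and skipped. Triples of this shape could then feed any pairwise preference objective; which objective the paper uses is not stated in the abstract.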
Related papers
- Emotional Support with LLM-based Empathetic Dialogue Generation [5.289702620838033]
This paper presents our solution for the NLPCC 2025 Task 8 ESC evaluation. We leverage large-scale language models enhanced by prompt engineering and finetuning techniques.
arXiv Detail & Related papers (2025-07-17T06:24:20Z) - The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs [66.17068546293487]
Large vision-language models (VLMs) increasingly adopt post-training techniques such as long chain-of-thought (CoT) supervised fine-tuning (SFT) and reinforcement learning (RL) to elicit sophisticated reasoning. We present a systematic investigation into the distinct roles and interplay of long-CoT SFT and RL across multiple multimodal reasoning benchmarks. We find that SFT improves performance on difficult questions by in-depth, structured reasoning, but introduces verbosity and degrades performance on simpler ones.
arXiv Detail & Related papers (2025-07-10T09:05:49Z) - GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark with complex real-world videos requiring balanced perception and reasoning. Using SEED-Bench-R1, we find that standard GRPO, while improving answer accuracy, often reduces logical coherence between reasoning steps and answers, with only a 57.9% consistency rate. We propose GRPO-CARE, a consistency-aware RL framework optimizing both answer correctness and reasoning coherence without explicit supervision.
arXiv Detail & Related papers (2025-06-19T08:49:13Z) - Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning [20.515599491717442]
We introduce Metis-RISE (RL Incentivizes and SFT Enhances) for multimodal reasoning model learning.
arXiv Detail & Related papers (2025-06-16T02:56:13Z) - DecoupledESC: Enhancing Emotional Support Generation via Strategy-Response Decoupled Preference Optimization [35.50223358356217]
We propose a Decoupled ESC framework inspired by Gross's Extended Process Model of Emotion Regulation. Our framework outperforms joint optimization baselines, reducing preference bias and improving response quality.
arXiv Detail & Related papers (2025-05-22T17:56:21Z) - Convert Language Model into a Value-based Strategic Planner [11.070654717643816]
Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. We propose a framework called straQ* to define the diagram from the state model perspective. Our framework allows a plug-and-play LLM to bootstrap the planning during ESC, determine the optimal strategy based on long-term returns, and finally guide the LLM to respond.
arXiv Detail & Related papers (2025-05-11T14:13:58Z) - MetaScale: Test-Time Scaling with Evolving Meta-Thoughts [51.35594569020857]
Experimental results demonstrate that MetaScale consistently outperforms standard inference approaches.
MetaScale scales more effectively with increasing sampling budgets and produces more structured, expert-level responses.
arXiv Detail & Related papers (2025-03-17T17:59:54Z) - EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. EPO provides strategies in open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. Experiments across social and physical domains demonstrate EPO's ability to achieve long-term goal alignment.
arXiv Detail & Related papers (2025-02-18T03:15:55Z) - A Unified Approach to Routing and Cascading for LLMs [5.653106385738822]
Large language models (LLMs) embedded in various agentic systems have increased the potential of model selection strategies to improve the cost-performance tradeoff.
Existing strategies involve either routing, where a single model is chosen per query, or cascading, which sequentially runs increasingly larger models until a satisfactory answer is found.
We derive a novel optimal strategy for cascading and prove the optimality of an existing routing strategy.
We propose cascade routing, a unified framework that integrates routing and cascading into a theoretically optimal strategy.
arXiv Detail & Related papers (2024-10-14T10:00:49Z) - Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation [16.350747493026432]
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs).
We propose the Strategic Chain-of-Thought (SCoT) to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps.
SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers.
arXiv Detail & Related papers (2024-09-05T06:28:05Z) - EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics [12.105216351739422]
EmoDynamiX models the discourse dynamics between user fine-grained emotions and system strategies using a heterogeneous graph for better performance and transparency. Experimental results on two ESC datasets show EmoDynamiX outperforms previous state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2024-08-16T14:54:41Z) - ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models [55.301188787490545]
Emotion Support Conversation (ESC) aims to reduce human stress, offer emotional guidance, and enhance human mental and physical well-being.
We propose an ESC Evaluation framework (ESC-Eval), which uses a role-playing agent to interact with ESC models.
We conduct comprehensive human annotations on interactive multi-turn dialogues of different ESC models.
arXiv Detail & Related papers (2024-06-21T08:03:33Z) - iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers [11.819814280565142]
This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense.
The BRAINTEASER task comprises multiple-choice Question Answering designed to evaluate the models' lateral thinking capabilities.
We propose a unique strategy to improve the performance of pre-trained language models in both subtasks.
arXiv Detail & Related papers (2024-05-25T08:50:51Z) - RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning [8.389454219309837]
Solving multimodal optimization problems (MMOP) requires finding all optimal solutions, which is challenging under limited function evaluations.
We propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent.
With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm.
arXiv Detail & Related papers (2024-04-12T05:02:49Z) - K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning [76.3114831562989]
Strategic reasoning requires Large Language Model (LLM) agents to adapt their strategies dynamically in multi-agent environments.
We propose a novel framework: "K-Level Reasoning with Large Language Models (K-R)".
arXiv Detail & Related papers (2024-02-02T16:07:05Z) - Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning [81.79431311952656]
We propose a novel system MultiESC to provide Emotional Support.
For strategy planning, we propose lookaheads to estimate the future user feedback after using particular strategies.
For user state modeling, MultiESC focuses on capturing users' subtle emotional expressions and understanding their emotion causes.
arXiv Detail & Related papers (2022-10-09T12:23:47Z)