Mitigating Strategy Preference Bias in Emotional Support Conversation via Uncertainty Estimations
- URL: http://arxiv.org/abs/2509.12661v1
- Date: Tue, 16 Sep 2025 04:39:18 GMT
- Title: Mitigating Strategy Preference Bias in Emotional Support Conversation via Uncertainty Estimations
- Authors: Yougen Zhou, Qin Chen, Ningning Zhou, Jie Zhou, Xingjiao Wu, Liang He
- Abstract summary: Emotional support conversation (ESC) aims to alleviate distress through empathetic dialogue. LLMs face persistent challenges in delivering effective ESC due to low accuracy in strategy planning. We propose an approach that mitigates strategy preference bias via reinforcement learning with a dual reward function.
- Score: 21.035567919734934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotional support conversation (ESC) aims to alleviate distress through empathetic dialogue, yet large language models (LLMs) face persistent challenges in delivering effective ESC due to low accuracy in strategy planning. Moreover, there is a considerable preference bias towards specific strategies. Prior methods using fine-tuned strategy planners have shown potential in reducing such bias, but the underlying causes of the preference bias in LLMs have not been well studied. To address these issues, we first reveal the fundamental causes of the bias by identifying the knowledge boundaries of LLMs in strategy planning. Then, we propose an approach to mitigate the bias through reinforcement learning with a dual reward function, which optimizes strategy planning via both accuracy and entropy-based confidence for each region defined by the knowledge boundaries. Experiments on the ESCov and ExTES datasets with multiple LLM backbones show that our approach outperforms the baselines, confirming its effectiveness.
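To make the dual reward concrete, here is a minimal Python sketch of how an accuracy term and an entropy-based confidence term could be combined per instance, assuming the knowledge-boundary region of each instance is already known. The function names, the `alpha` weight, and the region handling are illustrative assumptions, not the paper's implementation.

```python
import math

# Hypothetical sketch of the dual reward idea: inside the model's knowledge
# boundary, reward confident correct predictions; outside it, reward higher
# entropy to discourage overconfident default strategies. All names and the
# weighting scheme are assumptions.

def entropy(probs):
    """Shannon entropy of a strategy distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def dual_reward(pred, gold, probs, in_boundary, alpha=0.5):
    """Combine an accuracy term with an entropy-based confidence term.

    pred / gold  -- predicted and reference support-strategy labels
    probs        -- model's probability distribution over the strategy set
    in_boundary  -- whether this instance falls in the 'known' region
    alpha        -- trade-off weight (an assumed hyperparameter)
    """
    acc = 1.0 if pred == gold else 0.0
    max_h = math.log(len(probs))          # entropy of the uniform distribution
    conf = 1.0 - entropy(probs) / max_h   # 1 = fully confident, 0 = uniform
    return acc + alpha * (conf if in_boundary else (1.0 - conf))

# Example: a confident, correct prediction inside the knowledge boundary.
print(dual_reward("Question", "Question", [0.8, 0.1, 0.1], in_boundary=True))
```

The intuition, as the abstract suggests, is that rewarding calibrated uncertainty outside the knowledge boundary counteracts the overconfident defaults that drive preference bias.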
Related papers
- Verified Critical Step Optimization for LLM Agents [67.05296684575445]
Critical Step Optimization focuses preference learning on verified critical steps. The method starts from failed policy trajectories rather than expert demonstrations. Experiments on GAIA-Text-103 and XBench-DeepSearch show that CSO achieves 37% and 26% relative improvements over the SFT baseline.
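A hedged sketch of the critical-step idea in Critical Step Optimization: starting from a failed trajectory, locate the earliest step whose verified replacement flips the outcome, and emit a preference pair for that step alone. The trajectory format and the `verify`/`alternatives` callables are illustrative assumptions.

```python
# Minimal sketch: mine a turn-level preference pair from one failed
# trajectory. `alternatives(prefix)` proposes candidate replacement steps;
# `verify(trajectory)` returns True when a run succeeds. Both are assumed.

def find_critical_step(trajectory, alternatives, verify):
    """Return (index, better_step) for the first step whose verified
    replacement turns a failed trajectory into a successful one."""
    for i, step in enumerate(trajectory):
        for alt in alternatives(trajectory[:i]):
            patched = trajectory[:i] + [alt] + trajectory[i + 1:]
            if verify(patched):  # verifier confirms the patched run succeeds
                return i, alt
    return None

def make_preference_pair(trajectory, alternatives, verify):
    """Build a (context, chosen, rejected) triple for preference learning."""
    hit = find_critical_step(trajectory, alternatives, verify)
    if hit is None:
        return None
    i, better = hit
    return {"context": trajectory[:i], "chosen": better, "rejected": trajectory[i]}
```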
arXiv Detail & Related papers (2026-02-03T11:41:02Z)
- Reinforced Strategy Optimization for Conversational Recommender Systems via Network-of-Experts [63.412646471177645]
We propose a novel Reinforced Strategy Optimization (RSO) method for Conversational Recommender Systems (CRSs). RSO decomposes the process of generating strategy-driven response decisions into macro-level strategy planning and micro-level strategy adaptation. Experiments show that RSO significantly improves interaction performance compared to state-of-the-art baselines.
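The macro/micro split can be pictured as a planner routing to a small network of experts. The sketch below is a toy illustration under assumed names (`EXPERTS`, `plan_macro_strategy`); in the paper the planner and experts are learned models, not hand-written rules.

```python
from typing import Callable, Dict

# Toy network-of-experts: each high-level strategy maps to an expert that
# adapts it to the current dialogue state. All entries are assumptions.
EXPERTS: Dict[str, Callable[[str], str]] = {
    "recommend": lambda state: f"Suggest an item fitting: {state}",
    "ask":       lambda state: f"Ask a clarifying question about: {state}",
    "chitchat":  lambda state: f"Respond socially to: {state}",
}

def plan_macro_strategy(dialogue_state: str) -> str:
    """Macro level: choose a strategy label (stub for a learned planner)."""
    return "ask" if "unknown preference" in dialogue_state else "recommend"

def respond(dialogue_state: str) -> str:
    """Micro level: route the state to the expert for the planned strategy."""
    strategy = plan_macro_strategy(dialogue_state)
    return EXPERTS[strategy](dialogue_state)

print(respond("unknown preference for movie genre"))
```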
arXiv Detail & Related papers (2025-09-30T11:12:01Z)
- Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs [49.995906301946]
Existing methods usually leverage a fixed strategy to guide Large Language Models (LLMs) in mathematical reasoning. Our analysis reveals that a single strategy cannot adapt to problem-specific requirements and thus overlooks the trade-off between effectiveness and efficiency. We propose Planning and Routing through Instance-Specific Modeling (PRISM), a novel framework that decouples mathematical reasoning into two stages: strategy planning and targeted execution.
arXiv Detail & Related papers (2025-09-29T07:22:41Z)
- Mitigating Strategy-Selection Bias in Reasoning for More Effective Test-Time Scaling [27.616118519120366]
Test-time scaling (TTS) has been shown to improve the performance of large language models (LLMs) by sampling and aggregating diverse reasoning paths. We present a theoretical analysis that reveals when strategy-selection bias undermines the effectiveness of test-time scaling. Motivated by this theoretical insight, we introduce TTS-Uniform, a framework designed to mitigate the selection bias of reasoning strategies.
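A small sketch of the uniform-allocation idea: split the test-time sampling budget evenly across reasoning strategies instead of letting the model's (possibly biased) strategy prior decide, then aggregate by majority vote. `sample_answer` and the budget split are illustrative assumptions, not the paper's exact procedure.

```python
import random
from collections import Counter

def tts_uniform(question, strategies, sample_answer, budget=12):
    """Spend the sampling budget uniformly over strategies, then vote."""
    per_strategy = budget // len(strategies)
    answers = []
    for s in strategies:
        answers += [sample_answer(question, s) for _ in range(per_strategy)]
    return Counter(answers).most_common(1)[0][0]  # majority-vote aggregation

# Toy usage with a mock sampler that is right 70% of the time.
mock = lambda q, s: "42" if random.random() < 0.7 else "41"
print(tts_uniform("6 * 7?", ["direct", "chain-of-thought", "verify-then-answer"], mock))
```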
arXiv Detail & Related papers (2025-09-22T15:30:56Z)
- Feedback-Induced Performance Decline in LLM-Based Decision-Making [6.5990946334144756]
Large Language Models (LLMs) can extract context from natural language problem descriptions. This paper studies the behaviour of these models within Markov Decision Processes (MDPs).
arXiv Detail & Related papers (2025-07-20T10:38:56Z)
- Strategy-Augmented Planning for Large Language Models via Opponent Exploitation [11.840105106884543]
We introduce a two-stage Strategy-Augmented Planning (SAP) framework that significantly enhances the opponent-exploitation capabilities of LLM-based agents. In the offline stage, we construct an explicit strategy space and collect strategy-outcome pair data for training the Strategy Evaluation Network (SEN). During the online phase, SAP dynamically recognizes the opponent's strategy and greedily exploits it by searching for the best response strategy over the trained SEN.
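The online phase can be read as a greedy argmax over an explicit strategy space, scored by the trained SEN. Below is a toy sketch; `sen_score` and the lookup table stand in for the trained network and are assumptions.

```python
# Greedy best-response selection over an explicit strategy space, scored by
# a stand-in for the trained Strategy Evaluation Network (SEN).

def best_response(opponent_strategy, own_strategies, sen_score):
    """Pick our strategy with the highest predicted outcome against the
    recognized opponent strategy."""
    return max(own_strategies, key=lambda s: sen_score(s, opponent_strategy))

# Toy SEN: a lookup over strategy-outcome pairs collected offline.
scores = {("aggressive", "defensive"): 0.7, ("economic", "defensive"): 0.4}
sen = lambda s, o: scores.get((s, o), 0.0)
print(best_response("defensive", ["aggressive", "economic"], sen))
```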
arXiv Detail & Related papers (2025-05-13T11:41:10Z)
- Cognitive Debiasing Large Language Models for Decision-Making [71.2409973056137]
Large language models (LLMs) have shown potential in supporting decision-making applications. We propose a cognitive debiasing approach, self-adaptive cognitive debiasing (SACD). Our method follows three sequential steps -- bias determination, bias analysis, and cognitive debiasing -- to iteratively mitigate potential cognitive biases in prompts.
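Since SACD is described as three sequential prompt-level steps, it can be sketched as a small loop over LLM calls. The `llm` callable and the prompt wording below are assumptions for illustration; the paper's actual prompts will differ.

```python
# Hypothetical three-step debiasing loop: determine whether a bias exists,
# analyze it, then rewrite the prompt, repeating until clean.

def sacd(prompt, llm, max_rounds=3):
    for _ in range(max_rounds):
        # Step 1: bias determination -- is any cognitive bias present?
        verdict = llm(f"Does this prompt contain a cognitive bias? Answer yes/no.\n{prompt}")
        if verdict.strip().lower().startswith("no"):
            break
        # Step 2: bias analysis -- name the bias and where it appears.
        analysis = llm(f"Identify the cognitive bias in this prompt and explain it.\n{prompt}")
        # Step 3: cognitive debiasing -- rewrite the prompt to remove it.
        prompt = llm(f"Rewrite the prompt to remove this bias.\nBias: {analysis}\nPrompt: {prompt}")
    return prompt
```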
arXiv Detail & Related papers (2025-04-05T11:23:05Z)
- Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter [71.17800993332175]
We propose Chain-of-Strategy Optimization (CSO), a novel approach that optimizes strategy-selection preferences at each dialogue turn. We first leverage Monte Carlo Tree Search to construct ESC-Pro, a high-quality preference dataset with turn-level strategy-response pairs. Training on ESC-Pro with CSO improves both strategy accuracy and bias mitigation, enabling LLMs to generate more empathetic and contextually appropriate responses.
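One way to picture the ESC-Pro construction is extracting turn-level preference pairs from an MCTS dialogue tree: at each turn, pair the best-valued child strategy against its lower-valued siblings. The node layout and value estimates below are assumptions; in Chain-of-Strategy Optimization these pairs would then drive a preference-optimization step.

```python
# Assumed node structure: {"context": ..., "strategy": ..., "value": float,
# "children": [...]}. Walk the tree and emit (context, chosen, rejected)
# triples at every turn where siblings differ in estimated value.

def turn_level_pairs(node, pairs=None):
    if pairs is None:
        pairs = []
    if node["children"]:
        best = max(node["children"], key=lambda c: c["value"])
        for child in node["children"]:
            if child is not best and child["value"] < best["value"]:
                pairs.append({"context": node["context"],
                              "chosen": best["strategy"],
                              "rejected": child["strategy"]})
        for child in node["children"]:
            turn_level_pairs(child, pairs)  # recurse into deeper turns
    return pairs
```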
arXiv Detail & Related papers (2025-03-07T12:07:59Z)
- EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. We train the strategic reasoning model via multi-turn reinforcement learning (RL), utilizing process rewards and iterative self-play. Our findings reveal various collaborative reasoning mechanisms that emerge with EPO, as well as its effectiveness in generating novel strategies.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
- KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models [14.057527352653787]
We propose a Knowledge-aware Preference Optimization strategy, dubbed KnowPO, aimed at achieving adaptive knowledge selection.
We show that KnowPO outperforms previous methods for handling knowledge conflicts by over 37%.
arXiv Detail & Related papers (2024-08-06T16:55:54Z)
- LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with Large Language Models.
The survey underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Prompting Fairness: Integrating Causality to Debias Large Language Models [19.76215433424235]
Large language models (LLMs) are susceptible to generating biased and discriminatory responses. We propose a causality-guided debiasing framework to tackle social biases.
arXiv Detail & Related papers (2024-03-13T17:46:28Z)