Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance
- URL: http://arxiv.org/abs/2505.12334v1
- Date: Sun, 18 May 2025 09:59:22 GMT
- Title: Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance
- Authors: Yufeng Wang, Jinwu Hu, Ziteng Huang, Kunyang Lin, Zitian Zhang, Peihao Chen, Yu Hu, Qianyue Wang, Zhuliang Yu, Bin Sun, Xiaofen Xing, Qingfang Zheng, Mingkui Tan
- Abstract summary: Open-domain dialogue systems aim to generate natural and engaging conversations. Existing large language models (LLMs) fall short in proactively understanding the user's chatting preferences. We propose a User-oriented Proactive Chatbot (UPC) to enhance user-oriented proactivity.
- Score: 35.15965694815852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-domain dialogue systems aim to generate natural and engaging conversations, providing significant practical value in real applications such as social robotics and personal assistants. The advent of large language models (LLMs) has greatly advanced this field by improving context understanding and conversational fluency. However, existing LLM-based dialogue systems often fall short in proactively understanding the user's chatting preferences and guiding conversations toward user-centered topics. This lack of user-oriented proactivity can leave users feeling unappreciated, reducing their satisfaction and willingness to continue the conversation in human-computer interactions. To address this issue, we propose a User-oriented Proactive Chatbot (UPC) to enhance user-oriented proactivity. Specifically, we first construct a critic, inspired by the LLM-as-a-judge strategy, to evaluate this proactivity. Given the scarcity of high-quality training data, we then employ the critic to guide dialogues between the chatbot and user agents, generating a corpus with enhanced user-oriented proactivity. To ensure the diversity of user backgrounds, we introduce ISCO-800, a diverse user background dataset for constructing user agents. Moreover, considering that communication difficulty varies among users, we propose an iterative curriculum learning method that trains the chatbot on easy-to-communicate users before progressing to more challenging ones, thereby gradually enhancing its performance. Experiments demonstrate that our proposed training method is applicable to different LLMs, improving user-oriented proactivity and attractiveness in open-domain dialogues.
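The abstract describes a three-part pipeline: an LLM-as-a-judge critic that scores user-oriented proactivity, critic-guided self-chat between the chatbot and simulated user agents to generate training data, and an iterative curriculum that moves from easy-to-communicate users to harder ones. A minimal sketch of how these pieces might fit together appears below; all interfaces (`UserAgent`, `chatbot.converse`, `chatbot.finetune`, the judge prompt, the cohort-widening schedule, and the keep-threshold) are hypothetical illustrations inferred from the abstract, not the authors' implementation.

```python
# Hypothetical sketch of the UPC-style training loop: an LLM-as-a-judge
# critic scores user-oriented proactivity, critic-approved self-chat
# dialogues become training data, and a curriculum moves from easy to
# hard users. Interfaces below are assumptions, not the paper's code.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UserAgent:
    """Simulated user built from an ISCO-800-style occupation background."""
    background: str      # occupation/background description (assumed field)
    difficulty: float    # assumed scalar: 0.0 = easy to communicate, 1.0 = hard


def critic_score(dialogue: List[str], judge: Callable[[str], float]) -> float:
    """LLM-as-a-judge critic: rate how proactively the chatbot explores
    and follows the user's chatting preferences (prompt wording assumed)."""
    prompt = (
        "On a scale from 0 to 1, rate how well the chatbot proactively "
        "identifies the user's chatting preferences and steers the "
        "conversation toward user-centered topics:\n" + "\n".join(dialogue)
    )
    return judge(prompt)


def train_upc(chatbot, users: List[UserAgent],
              judge: Callable[[str], float],
              rounds: int = 3, keep_threshold: float = 0.7):
    """Iterative curriculum learning: start with easy-to-communicate users,
    widen the cohort each round, and keep only critic-approved dialogues."""
    users = sorted(users, key=lambda u: u.difficulty)        # easy -> hard
    for r in range(1, rounds + 1):
        cohort = users[: max(1, len(users) * r // rounds)]   # growing window
        corpus = [
            d for d in (chatbot.converse(u) for u in cohort)
            if critic_score(d, judge) >= keep_threshold      # critic guidance
        ]
        chatbot.finetune(corpus)   # assumed fine-tuning API on kept dialogues
    return chatbot
```

The growing cohort window and the 0.7 keep-threshold are arbitrary choices for illustration; the abstract does not specify these hyperparameters or the critic's scoring scale.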
Related papers
- User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal [58.43749783815486]
We study implicit user feedback in two user-LM interaction datasets. We find that the contents of user feedback can improve model performance on short human-designed questions. We also find that the usefulness of user feedback is largely tied to the quality of the user's initial prompt.
arXiv Detail & Related papers (2025-07-30T23:33:29Z) - Enhancing User Engagement in Socially-Driven Dialogue through Interactive LLM Alignments [36.632855175566746]
Enhancing user engagement through interactions plays an essential role in socially-driven dialogues. We enable interactive LLMs to learn user engagement by leveraging signals from the future development of conversations. Experiments conducted on two socially-driven dialogue scenarios demonstrate that our method effectively enhances user engagement in interactive LLMs.
arXiv Detail & Related papers (2025-06-26T17:26:17Z) - Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward [11.495697919066341]
Policy agents must be able to personalize their behavior to suit a user's preferences, personality, and attributes. Current training methods like Reinforcement Learning from Human Feedback (RLHF) prioritize helpfulness and safety but fall short in fostering truly empathetic, adaptive, and personalized interactions. We propose to incorporate an intrinsic motivation to improve the conversational agent's model of the user as an additional reward alongside multi-turn RLHF (a hedged sketch of this reward shaping appears after this list).
arXiv Detail & Related papers (2025-04-04T06:35:02Z) - Conversational User-AI Intervention: A Study on Prompt Rewriting for Improved LLM Response Generation [16.8514748768591]
This paper investigates aspects in which user queries fall short of expressing information needs, and the potential of using LLMs to rewrite suboptimal user prompts. Our findings demonstrate that rephrasing ineffective prompts can elicit better responses from a conversational system, while preserving the user's original intent.
arXiv Detail & Related papers (2025-03-21T02:01:02Z) - UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering [39.79275025010785]
UQABench is a benchmark designed to evaluate the effectiveness of user embeddings in prompting large language models for personalization. We conduct extensive experiments on various state-of-the-art methods for modeling user embeddings.
arXiv Detail & Related papers (2025-02-26T14:34:00Z) - Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z) - Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs [49.18567856499736]
We investigate whether large language models (LLMs) can be supportive of open-ended dialogue tutoring. We apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues.
arXiv Detail & Related papers (2024-09-24T22:31:39Z) - Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models [49.74265453289855]
Large language models (LLMs) are now accessible to anyone with a computer, a web browser, and an internet connection via browser-based interfaces.
This paper examines the affordances of interactive feedback features in ChatGPT's interface, analysing how they shape user input and participation in iteration.
arXiv Detail & Related papers (2024-08-27T13:50:37Z) - LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy [83.61404191470126]
We propose a new solution named I-Pro that can learn Proactive policy in the Interactive setting.
Specifically, we learn the trade-off via a learned goal weight, which consists of four factors.
The experimental results demonstrate I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
arXiv Detail & Related papers (2022-04-07T14:11:31Z)
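The curiosity-reward idea summarized above (Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward) can be pictured as simple reward shaping: a turn earns an intrinsic bonus when it improves the agent's internal model of the user, on top of the usual extrinsic RLHF reward. The sketch below is a hypothetical illustration; the prediction-error proxy for curiosity, the function name, and the weighting coefficient `beta` are all assumptions, not the paper's formulation.

```python
# Hypothetical reward shaping for curiosity-driven personalization:
# the agent earns an intrinsic bonus when a turn reduces the error of
# its internal model of the user, added to the usual RLHF reward.
def shaped_reward(rlhf_reward: float,
                  user_model_error_before: float,
                  user_model_error_after: float,
                  beta: float = 0.1) -> float:
    """Combine the extrinsic RLHF reward with an intrinsic curiosity bonus.

    The bonus is the reduction in the agent's user-model prediction error
    caused by the latest turn (an assumed proxy for 'learning about the
    user'); beta is an assumed trade-off coefficient.
    """
    curiosity_bonus = max(0.0, user_model_error_before - user_model_error_after)
    return rlhf_reward + beta * curiosity_bonus
```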
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.