LLM-Driven Preference Data Synthesis for Proactive Prediction of the Next User Utterance in Human-Machine Dialogue
- URL: http://arxiv.org/abs/2601.09713v1
- Date: Wed, 24 Dec 2025 12:23:52 GMT
- Title: LLM-Driven Preference Data Synthesis for Proactive Prediction of the Next User Utterance in Human-Machine Dialogue
- Authors: Jinqiang Wang, Huansheng Ning, Jianguo Ding, Tao Zhu, Liming Chen, Chris Nugent
- Abstract summary: ProUtt is an LLM-driven preference data synthesis method for proactive next utterance prediction. It converts dialogue history into an intent tree and explicitly models intent reasoning trajectories. It then constructs preference and non-preference reasoning processes by perturbing or revising intent tree paths at different future turns.
- Score: 10.08256631711306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proactively predicting a user's next utterance in human-machine dialogue can streamline interaction and improve user experience. Existing commercial API-based solutions are subject to privacy concerns, while deploying general-purpose LLMs locally remains computationally expensive. As such, training a compact, task-specific LLM provides a practical alternative. Although user simulator methods can predict a user's next utterance, they mainly imitate the user's speaking style rather than advancing the dialogue. Preference data synthesis has been investigated to generate data for proactive next utterance prediction and help align LLMs with user preferences. Yet existing methods lack the ability to explicitly model the intent reasoning that leads to the user's next utterance and to define and synthesize preference and non-preference reasoning processes for predicting the user's next utterance. To address these challenges, we propose ProUtt, an LLM-driven preference data synthesis method for proactive next utterance prediction. ProUtt converts dialogue history into an intent tree and explicitly models intent reasoning trajectories by predicting the next plausible path from both exploitation and exploration perspectives. It then constructs preference and non-preference reasoning processes by perturbing or revising intent tree paths at different future turns. Extensive evaluations using LLM-as-a-judge and human judgments demonstrate that ProUtt consistently outperforms existing data synthesis methods, user simulators, and commercial LLM APIs across four benchmark datasets. We release both the code and the synthesized datasets to facilitate future research.
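The abstract describes building preference and non-preference reasoning processes by perturbing intent tree paths. The following is only a minimal illustrative sketch of that general idea, not the authors' implementation: the abstract does not specify the tree representation or the perturbation rule, so `IntentNode`, `find_path`, `perturb_path`, `make_preference_pair`, the sibling-swap rule, and the travel-booking example tree are all hypothetical. The exploitation and exploration perspectives mentioned in the abstract are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IntentNode:
    """One intent in a toy intent tree; children are more specific intents."""
    label: str
    children: List["IntentNode"] = field(default_factory=list)


def find_path(root: IntentNode, target: str) -> Optional[List[str]]:
    """Return the labels from the root down to `target`, or None if absent."""
    if root.label == target:
        return [root.label]
    for child in root.children:
        sub = find_path(child, target)
        if sub is not None:
            return [root.label] + sub
    return None


def perturb_path(root: IntentNode, path: List[str], depth: int) -> List[str]:
    """Replace the node at `depth` with its first sibling and cut the path there.

    Assumption: a sibling swap stands in for "perturbing" a path; the paper
    may use a different rule.
    """
    if not 1 <= depth < len(path):
        raise ValueError("depth must index a non-root node on the path")
    # Walk down to the parent of the node that will be swapped out.
    parent = root
    for label in path[1:depth]:
        parent = next(c for c in parent.children if c.label == label)
    siblings = [c.label for c in parent.children if c.label != path[depth]]
    if not siblings:
        raise ValueError("no sibling available to perturb with")
    return path[:depth] + [siblings[0]]


def make_preference_pair(
    root: IntentNode, target: str, depth: int
) -> Dict[str, List[str]]:
    """Pair the true intent path (chosen) with a perturbed one (rejected)."""
    chosen = find_path(root, target)
    if chosen is None:
        raise ValueError(f"intent {target!r} is not in the tree")
    return {"chosen": chosen, "rejected": perturb_path(root, chosen, depth)}


def build_example_tree() -> IntentNode:
    """Hypothetical travel-booking intent tree used only for illustration."""
    return IntentNode(
        "book trip",
        [
            IntentNode(
                "choose flight",
                [IntentNode("compare prices"), IntentNode("pick seat")],
            ),
            IntentNode("choose hotel", [IntentNode("check reviews")]),
        ],
    )


if __name__ == "__main__":
    pair = make_preference_pair(build_example_tree(), "compare prices", depth=2)
    print(pair)
```

In this sketch, perturbing at `depth=2` keeps the path correct down to "choose flight" and diverges only at the last step, whereas `depth=1` diverges immediately after the root. That loosely mirrors the idea of perturbing paths at different future turns, under the assumptions stated above.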
Related papers
- Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions [50.70965714314064]
Large Language Models (LLMs) are increasingly serving as personal assistants, where users share complex and diverse preferences over extended interactions. This work proposes RealPref, a benchmark for evaluating realistic preference-following in personalized user-LLM interactions.
arXiv Detail & Related papers (2026-03-04T15:42:43Z) - Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process. We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of high-fidelity multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z) - Investigating Thematic Patterns and User Preferences in LLM Interactions using BERTopic [4.087884819027264]
This study applies BERTopic to the lmsys-chat-1m dataset, a multilingual conversational corpus built from head-to-head evaluations of large language models (LLMs). The main objective is uncovering thematic patterns in these conversations and examining their relation to user preferences. We analysed relationships between topics and model preferences to identify trends in model-topic alignment.
arXiv Detail & Related papers (2025-10-08T21:13:44Z) - A Data Synthesis Method Driven by Large Language Models for Proactive Mining of Implicit User Intentions in Tourism [6.387945824899046]
In the tourism domain, Large Language Models (LLMs) often struggle to mine implicit user intentions from tourists' ambiguous inquiries. We propose SynPT, which constructs an LLM-driven user agent and assistant agent to simulate dialogues based on seed data collected from Chinese tourism websites.
arXiv Detail & Related papers (2025-05-14T02:36:17Z) - Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback [50.84142264245052]
This work introduces the Align-SLM framework to enhance the semantic understanding of textless Spoken Language Models (SLMs). Our approach generates multiple speech continuations from a given prompt and uses semantic metrics to create preference data for Direct Preference Optimization (DPO). We evaluate the framework using ZeroSpeech 2021 benchmarks for lexical and syntactic modeling, the spoken version of the StoryCloze dataset for semantic coherence, and other speech generation metrics, including the GPT4-o score and human evaluation.
arXiv Detail & Related papers (2024-11-04T06:07:53Z) - ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation [38.64175351885443]
Large language models have been flourishing in the natural language processing (NLP) domain.
Despite the intelligence shown by the recommendation-oriented finetuned models, LLMs struggle to fully understand the user behavior patterns.
Existing works only fine-tune a sole LLM on given text data without introducing that important information to it.
arXiv Detail & Related papers (2024-06-27T01:37:57Z) - On Overcoming Miscalibrated Conversational Priors in LLM-based Chatbots [19.423566424346166]
We study the use of Large Language Model (LLM)-based chatbots to power recommender systems.
We observe that the chatbots respond poorly when they encounter under-specified requests.
We conjecture that such miscalibrated response tendencies can be attributed to LLM fine-tuning using annotators.
arXiv Detail & Related papers (2024-06-01T15:54:45Z) - Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems [0.0]
Reinforcement learning (RL) recommender systems often rely on static datasets that fail to capture the fluid, ever-changing nature of user preferences in real-world scenarios. We introduce Lusifer, an LLM-based simulation environment designed to generate dynamic, realistic user feedback for RL-based recommender training.
arXiv Detail & Related papers (2024-05-22T05:43:15Z) - Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis [51.04181562775778]
We present a novel approach to automatically synthesize "wayfinding instructions" for an embodied robot agent.
Our algorithm uses in-context learning to condition an LLM to generate instructions using just a few references.
We implement our approach on multiple simulation platforms including Matterport3D, AI Habitat and ThreeDWorld.
arXiv Detail & Related papers (2024-03-18T05:38:07Z) - Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by generative large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z) - On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z) - Unlocking the Potential of User Feedback: Leveraging Large Language Model as User Simulator to Enhance Dialogue System [65.93577256431125]
We propose an alternative approach called User-Guided Response Optimization (UGRO) to combine an LLM with a smaller task-oriented dialogue (TOD) model.
This approach uses the LLM as an annotation-free user simulator to assess dialogue responses, combining them with smaller fine-tuned end-to-end TOD models.
Our approach outperforms previous state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-06-16T13:04:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.