Related papers: Enhancing the Preference Extractor in Multi-turn Dialogues: From Annotating Disasters to Accurate Preference Extraction

URL: http://arxiv.org/abs/2508.01739v1
Date: Sun, 03 Aug 2025 12:44:03 GMT
Title: Enhancing the Preference Extractor in Multi-turn Dialogues: From Annotating Disasters to Accurate Preference Extraction
Authors: Cheng Wang, Ziru Liu, Pengcheng Tang, Mingyu Zhang, Quanyu Dai, Yue Zhu,
Abstract summary: We propose a novel dialogue data generation framework named IterChat. First, we construct a new data format that categorizes the dialogue data into attributed historical preferences and one-turn dialogues. This reduces the probability of annotation errors and improves annotation efficiency.
Score: 11.102491100383254
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Identifying user preferences in dialogue systems is a pivotal aspect of providing satisfying services. Current research shows that using large language models (LLMs) to fine-tune a task-specific preference extractor yields excellent results in terms of accuracy and generalization. However, the primary challenge stems from the inherent difficulty in obtaining high-quality labeled multi-turn dialogue data. Accurately tracking user preference transitions across turns not only demands intensive domain expertise and contextual consistency maintenance from annotators (termed "Annotating Disaster") but also complicates model training due to error propagation in sequential dependency learning. Inspired by the observation that multi-turn preference extraction can be decomposed into iterative executions of one-turn extraction processes, we propose a novel dialogue data generation framework named IterChat. First, we construct a new data format that categorizes the dialogue data into attributed historical preferences and one-turn dialogues. This reduces the probability of annotation errors and improves annotation efficiency. Then, to generate a high-quality and diverse dialogue dataset, we adopt GPT4 to pre-define the preference slots in the target preference extractor task and then randomly sample a subset of the slots and their corresponding schema values to create the dialogue datasets. Experimental results indicate that fine-tuning or only few-shot prompting with the new dialogue format yields superior performance compared to the original multi-turn dialogues. Additionally, the new data format improves annotator efficiency, with a win rate 28.4% higher than the original multi-turn dialogues.
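
The decomposition described in the abstract (multi-turn extraction as repeated one-turn extraction over accumulated historical preferences, plus random sampling of slots and schema values) can be illustrated with a small sketch. This is not the authors' implementation: the slot schema, the function names, and the keyword-matching extractor are all hypothetical stand-ins, and a real system would presumably use a fine-tuned or few-shot prompted LLM where the toy extractor appears.

```python
import random

# Hypothetical slot schema. The paper pre-defines slots with GPT4;
# these names and values are invented purely for illustration.
SCHEMA = {
    "cuisine": ["italian", "thai", "mexican"],
    "price": ["cheap", "moderate", "expensive"],
    "area": ["north", "south", "centre"],
}


def sample_slots(schema, k, rng):
    """Randomly pick k slots and one schema value for each (assumed sampling step)."""
    slots = rng.sample(sorted(schema), k)
    return {slot: rng.choice(schema[slot]) for slot in slots}


def extract_one_turn(history, utterance, schema):
    """Toy one-turn extractor: historical preferences + one utterance -> updated preferences.

    Keyword matching stands in for the LLM-based extractor.
    """
    updated = dict(history)  # start from the historical preferences
    text = utterance.lower()
    for slot, values in schema.items():
        for value in values:
            if value in text:
                updated[slot] = value  # a newly stated value overrides the old one
    return updated


def extract_multi_turn(utterances, schema):
    """Multi-turn extraction as iterated one-turn extraction."""
    prefs = {}
    for utterance in utterances:
        prefs = extract_one_turn(prefs, utterance, schema)
    return prefs


dialogue = [
    "I'd like a cheap Thai place.",
    "Actually, make it Italian, somewhere in the north.",
]
final_prefs = extract_multi_turn(dialogue, SCHEMA)
print(final_prefs)
```

Each call to extract_one_turn sees only the accumulated preferences and a single utterance, which mirrors the "attributed historical preferences and one-turn dialogues" format in spirit; how the paper actually represents and annotates that format is not specified here.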

Related papers

Attribute Controlled Dialogue Prompting [31.09791656949115]
We present a novel, instance-specific prompt-tuning algorithm for dialogue generation. Our method is superior to prompting baselines and comparable to fine-tuning with only 5%-6% of total parameters.
arXiv Detail & Related papers (2023-07-11T12:48:55Z)
Multi-grained Hypergraph Interest Modeling for Conversational Recommendation [75.65483522949857]
We propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data. In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations. We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS.
arXiv Detail & Related papers (2023-05-04T13:13:44Z)
DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization [127.714919036388]
DIONYSUS is a pre-trained encoder-decoder model for summarizing dialogues in any new domain. Our experiments show that DIONYSUS outperforms existing methods on six datasets.
arXiv Detail & Related papers (2022-12-20T06:21:21Z)
Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues [34.78482218571574]
We propose a data augmentation method to automatically augment high-quality responses with different semantics by counterfactual inference. Experimental results show that our data augmentation method can augment high-quality responses with different semantics for a given dialogue history, and can outperform competitive baselines on multiple downstream tasks.
arXiv Detail & Related papers (2022-10-30T13:26:49Z)
Controllable Dialogue Simulation with In-Context Learning [39.04491297557292]
Dialogic is a dialogue simulation method based on large language model in-context learning. Our method can rapidly expand a small set of dialogue data with minimum or zero human involvement. Our simulated dialogues have near-human fluency and annotation accuracy.
arXiv Detail & Related papers (2022-10-09T06:32:58Z)
A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation [107.82729587882397]
It is expensive to scale up current persona-based dialogue datasets. Each data sample in this task is more complex to learn with than conventional dialogue data. We propose a data manipulation method, which is model-agnostic and can be packed with any persona-based dialogue generation model.
arXiv Detail & Related papers (2022-04-21T03:49:54Z)
Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching. Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information. We evaluate our model on two large datasets with user identification, i.e., personalized dialogue Corpus Ubuntu (P-Ubuntu) and personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension [49.92173751203827]
In multi-turn dialog, utterances do not always take the full form of sentences. We propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question.
arXiv Detail & Related papers (2020-12-14T10:58:01Z)
Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data [61.71319905364992]
We propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data. A data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data. A ranking module is employed to filter out low-quality dialogues. A model-level distillation process is employed to distill a teacher model trained on high-quality paired data to augmented dialogue pairs.
arXiv Detail & Related papers (2020-09-20T13:06:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
