An Unsupervised Dialogue Topic Segmentation Model Based on Utterance Rewriting
- URL: http://arxiv.org/abs/2409.07672v1
- Date: Thu, 12 Sep 2024 00:27:31 GMT
- Title: An Unsupervised Dialogue Topic Segmentation Model Based on Utterance Rewriting
- Authors: Xia Hou, Qifeng Li, Tongliang Li
- Abstract summary: This study proposes a novel unsupervised dialog topic segmentation method that combines the Utterance Rewriting (UR) technique with an unsupervised learning algorithm.
The proposed Discourse Rewriting Topic Segmentation Model (UR-DTS) significantly improves the accuracy of topic segmentation.
- Score: 3.5399864027190366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue topic segmentation (DTS) plays a crucial role in various dialogue modeling tasks. State-of-the-art unsupervised DTS methods learn topic-aware discourse representations from conversation data through adjacent discourse matching and pseudo-segmentation, mining useful clues from unlabeled conversational relations. However, in multi-turn dialogues, discourses often contain co-references or omissions, so using these discourses directly for representation learning can degrade the semantic similarity computation in the adjacent discourse matching task. To fully exploit the useful cues in conversational relations, this study proposes a novel unsupervised dialogue topic segmentation method that combines the Utterance Rewriting (UR) technique with an unsupervised learning algorithm, rewriting the dialogues to recover co-referents and omitted words. Compared with existing unsupervised models, the proposed Discourse Rewriting Topic Segmentation Model (UR-DTS) significantly improves the accuracy of topic segmentation. The main finding is that on DialSeg711 the absolute error score and WD improve by about 6%, reaching 11.42% and 12.97%, respectively. On Doc2Dial, the absolute error score and WD improve by about 3% and 2%, respectively, reaching a new SOTA of 35.17% absolute error score and 38.49% WD. This shows that the model is very effective at capturing the nuances of conversational topics, and it highlights both the usefulness and the challenges of exploiting unlabeled conversations.
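To make the pipeline in the abstract concrete, the sketch below illustrates the general idea: rewrite each utterance to restore co-referents and omitted words, then place a topic boundary wherever the semantic similarity between adjacent (rewritten) utterances falls below a threshold. This is a minimal sketch under stated assumptions, not the authors' released implementation: the encoder choice, the `rewrite_utterances` placeholder, and the threshold value are illustrative.

```python
# Minimal sketch of similarity-based dialogue topic segmentation after
# utterance rewriting. The rewriting step is a placeholder; in the paper it
# recovers co-referents and omitted words with a dedicated UR model.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in sentence encoder


def rewrite_utterances(utterances):
    """Placeholder for utterance rewriting (coreference/ellipsis recovery).

    Here the input is returned unchanged; a real UR model would rewrite
    each utterance into a self-contained sentence.
    """
    return utterances


def segment_dialogue(utterances, threshold=0.5):
    """Insert a topic boundary wherever the cosine similarity between
    adjacent (rewritten) utterances drops below `threshold`."""
    rewritten = rewrite_utterances(utterances)
    embeddings = encoder.encode(rewritten, convert_to_tensor=True)
    boundaries = []
    for i in range(len(rewritten) - 1):
        sim = cos_sim(embeddings[i], embeddings[i + 1]).item()
        if sim < threshold:
            boundaries.append(i + 1)  # boundary before utterance i+1
    return boundaries


dialogue = [
    "I'd like to book a flight to Boston next Friday.",
    "Sure, do you prefer a morning or evening departure?",
    "Morning, please.",
    "By the way, can you recommend a hotel near the airport?",
]
print(segment_dialogue(dialogue))  # e.g. [3] if the topic shifts at utterance 3
```

Segmentation quality on DialSeg711 and Doc2Dial is typically reported with the Pk error score and WindowDiff (WD); reference implementations of both metrics are available, e.g., in `nltk.metrics.segmentation`.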
Related papers
- Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models [51.75805497456226]
This work focuses on the factual consistency issue with the help of the dialogue summarization task.
Our evaluation shows that, on average, 26.8% of the summaries generated by LLMs contain factual inconsistency.
To stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data.
arXiv Detail & Related papers (2023-11-13T09:32:12Z)
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation [55.82577086422923]
We provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues.
We release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues.
We also provide a benchmark including 18 models across five categories for the dialogue segmentation task.
arXiv Detail & Related papers (2023-05-15T06:08:01Z)
- Unsupervised Dialogue Topic Segmentation with Topic-aware Utterance Representation [51.22712675266523]
Dialogue Topic Segmentation (DTS) plays an essential role in a variety of dialogue modeling tasks.
We propose a novel unsupervised DTS framework, which learns topic-aware utterance representations from unlabeled dialogue data.
arXiv Detail & Related papers (2023-05-04T11:35:23Z)
- Dial2vec: Self-Guided Contrastive Learning of Unsupervised Dialogue Embeddings [41.79937481022846]
We introduce the task of learning unsupervised dialogue embeddings.
Trivial approaches such as combining pre-trained word or sentence embeddings and encoding through pre-trained language models have been shown to be feasible.
We propose a self-guided contrastive learning approach named dial2vec.
arXiv Detail & Related papers (2022-10-27T11:14:06Z)
- TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations [24.245354500835465]
We propose a novel model-agnostic data augmentation paradigm to boost the robustness of task-oriented dialogue modeling on spoken conversations.
Our approach ranked first in both tasks of DSTC10 Track2, a benchmark for task-oriented dialogue modeling on spoken conversations.
arXiv Detail & Related papers (2021-12-23T10:04:25Z)
- "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations [87.95711406978157]
This work presents a new benchmark on spoken task-oriented conversations.
We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling.
Our data set enables speech-based benchmarking of task-oriented dialogue systems.
arXiv Detail & Related papers (2021-09-28T04:51:04Z)
- What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition [41.1669799542627]
We apply two pre-trained transformer models to structure a conversational transcript as a sequence of dialog acts.
We find that the inclusion of a broader conversational context helps disambiguate many dialog act classes.
A detailed analysis reveals specific segmentation patterns observed in its absence.
arXiv Detail & Related papers (2021-07-05T21:56:00Z)
- Contextual Dialogue Act Classification for Open-Domain Conversational Agents [10.576497782941697]
Classifying the general intent of the user utterance in a conversation, also known as Dialogue Act (DA), is a key step in Natural Language Understanding (NLU) for conversational agents.
We propose CDAC (Contextual Dialogue Act), a simple yet effective deep learning approach for contextual dialogue act classification.
We use transfer learning to adapt models trained on human-human conversations to predict dialogue acts in human-machine dialogues.
arXiv Detail & Related papers (2020-05-28T06:48:10Z)
- Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually with reasoning over dialogue turns with the help of the back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z)