Topic Detection from Conversational Dialogue Corpus with Parallel
Dirichlet Allocation Model and Elbow Method
- URL: http://arxiv.org/abs/2006.03353v1
- Date: Fri, 5 Jun 2020 10:24:43 GMT
- Title: Topic Detection from Conversational Dialogue Corpus with Parallel
Dirichlet Allocation Model and Elbow Method
- Authors: Haider Khalid, Vincent Wade
- Abstract summary: We propose a topic detection approach based on the Parallel Latent Dirichlet Allocation (PLDA) model.
We use K-means clustering with the Elbow Method to interpret and validate within-cluster consistency.
The experimental results show that combining PLDA with the Elbow Method selects the optimal number of clusters and refines the topics for the conversation.
- Score: 1.599072005190786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A conversational system needs to know how to switch between topics to
continue a conversation for an extended period. Topic detection from dialogue
corpora has therefore become an important task, and accurate prediction of
conversation topics is important for creating coherent and engaging dialogue
systems. In this paper, we propose a topic detection approach with the Parallel
Latent Dirichlet Allocation (PLDA) model by clustering a vocabulary of known
similar words based on TF-IDF scores and the Bag of Words (BOW) technique. In
the experiment, we use K-means clustering with the Elbow Method for
interpretation and validation of within-cluster consistency to select the
optimal number of clusters. We evaluate our approach by comparing it with
traditional LDA and clustering techniques. The experimental results show that
combining PLDA with the Elbow Method selects the optimal number of clusters and
refines the topics for the conversation.
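The abstract describes choosing the number of clusters by running K-means over TF-IDF features and reading off the elbow of the within-cluster error curve. The listing gives no implementation, so the following is only a minimal sketch of that selection step under stated assumptions: the toy utterances, the helper names (`tfidf_vectors`, `kmeans_inertia`, `elbow_k`), the smoothed IDF formula, and the "largest second difference" rule for locating the elbow are all illustrative choices, not the authors' method. PLDA itself is not implemented here.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumption): terms in every document keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm once and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow_k(inertias):
    """Pick k at the largest second difference of the inertia curve.

    inertias[i] is the inertia for k = i + 1. This heuristic is an assumption;
    the elbow is often read off a plot by eye instead.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


if __name__ == "__main__":
    # Hypothetical toy utterances, tokenised by whitespace.
    utterances = [
        "the match ended with a late goal",
        "our team scored a goal in the match",
        "the recipe needs flour and sugar",
        "bake the cake with flour and butter",
        "the flight to paris was delayed",
        "we booked a flight and a hotel in paris",
    ]
    _, vectors = tfidf_vectors([u.split() for u in utterances])
    inertias = [kmeans_inertia(vectors, k) for k in range(1, 6)]
    print("inertia per k:", [round(v, 3) for v in inertias])
    print("elbow at k =", elbow_k(inertias))
```

A single random initialisation can land in a poor local minimum, so in practice one would typically repeat each K-means run with several seeds and keep the lowest inertia before looking for the elbow.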
Related papers
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z) - Multi-Granularity Prompts for Topic Shift Detection in Dialogue [13.739991183173494]
The goal of dialogue topic shift detection is to identify whether the current topic in a conversation has changed or needs to change.
Previous work focused on detecting topic shifts using pre-trained models to encode the utterance.
We take a prompt-based approach to fully extract topic information from dialogues at multiple-granularity, i.e., label, turn, and topic.
arXiv Detail & Related papers (2023-05-23T12:35:49Z) - Unsupervised Dialogue Topic Segmentation with Topic-aware Utterance
Representation [51.22712675266523]
Dialogue Topic Segmentation (DTS) plays an essential role in a variety of dialogue modeling tasks.
We propose a novel unsupervised DTS framework, which learns topic-aware utterance representations from unlabeled dialogue data.
arXiv Detail & Related papers (2023-05-04T11:35:23Z) - Improve Retrieval-based Dialogue System via Syntax-Informed Attention [46.79601705850277]
We propose SIA, Syntax-Informed Attention, considering both intra- and inter-sentence syntax information.
We evaluate our method on three widely used benchmarks and experimental results demonstrate the general superiority of our method on dialogue response selection.
arXiv Detail & Related papers (2023-03-12T08:14:16Z) - CluCDD: Contrastive Dialogue Disentanglement via Clustering [18.06976502939079]
A huge number of multi-participant dialogues happen online every day.
Dialogue disentanglement aims at separating an entangled dialogue into detached sessions.
We propose a model named CluCDD, which aggregates utterances by contrastive learning.
arXiv Detail & Related papers (2023-02-16T08:47:51Z) - Findings on Conversation Disentanglement [28.874162427052905]
We build a learning model that learns utterance-to-utterance and utterance-to-thread classification.
Experiments on the Ubuntu IRC dataset show that this approach has the potential to outperform the conventional greedy approach.
arXiv Detail & Related papers (2021-12-10T05:54:48Z) - Response Selection for Multi-Party Conversations with Dynamic Topic
Tracking [63.15158355071206]
We frame response selection as a dynamic topic tracking task to match the topic between the response and relevant conversation context.
We propose a novel multi-task learning framework that supports efficient encoding through large pretrained models.
Experimental results on the DSTC-8 Ubuntu IRC dataset show state-of-the-art results in response selection and topic disentanglement tasks.
arXiv Detail & Related papers (2020-10-15T14:21:38Z) - Multi-View Sequence-to-Sequence Models with Conversational Structure for
Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP.
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations.
Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
arXiv Detail & Related papers (2020-10-04T20:12:44Z) - Modeling Topical Relevance for Multi-Turn Dialogue Generation [61.87165077442267]
We propose a new model, named STAR-BTM, to tackle the problem of topic drift in multi-turn dialogue.
The Biterm Topic Model is pre-trained on the whole training dataset. Then, the topic-level attention weights are computed based on the topic representation of each context.
Experimental results on both Chinese customer services data and English Ubuntu dialogue data show that STAR-BTM significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-09-27T03:33:22Z) - Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z) - A Hybrid Framework for Topic Structure using Laughter Occurrences [0.3680403821470856]
In this work we combine both paralinguistic and linguistic knowledge into a hybrid framework through a multi-level hierarchy.
Laughter occurrences from the multiparty meeting transcripts of the ICSI database are used as paralinguistic information.
This training-free topic structuring approach can be applicable to online understanding of spoken dialogs.
arXiv Detail & Related papers (2019-12-31T23:31:42Z)