Dialog Intent Induction via Density-based Deep Clustering Ensemble
- URL: http://arxiv.org/abs/2201.06731v1
- Date: Tue, 18 Jan 2022 04:13:26 GMT
- Title: Dialog Intent Induction via Density-based Deep Clustering Ensemble
- Authors: Jiashu Pu, Guandan Chen, Yongzhu Chang, Xiaoxi Mao
- Abstract summary: In real-life applications, it is crucial to occasionally induce novel dialog intents from the conversation logs to improve the user experience.
We propose the Density-based Deep Clustering Ensemble (DDCE) method for dialog intent induction.
Our proposed method is more effective in dealing with real-life scenarios where a large number of outliers exist.
- Score: 12.05997006407326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing task-oriented chatbots heavily rely on spoken language understanding (SLU) systems to determine the intent of a user's utterance and other key information for fulfilling specific tasks. In real-life applications, it is crucial to occasionally induce novel dialog intents from the conversation logs to improve the user experience. In this paper, we propose the Density-based Deep Clustering Ensemble (DDCE) method for dialog intent induction. Compared to existing K-means based methods, our proposed method is more effective in dealing with real-life scenarios where a large number of outliers exist. To maximize data utilization, we jointly optimize text representations and the hyperparameters of the clustering algorithm. In addition, we design an outlier-aware clustering ensemble framework to handle the overfitting issue. Experimental results over seven datasets show that our proposed method significantly outperforms other state-of-the-art baselines.
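The abstract does not include code, but the general recipe it describes (density-based clustering over learned utterance embeddings, with outliers kept out of clusters rather than forced into them as K-means would, and with clustering hyperparameters tuned rather than fixed) can be sketched with off-the-shelf components. The sketch below is illustrative only: the MiniLM encoder, DBSCAN, and the eps grid are assumptions, not the authors' DDCE implementation.

```python
# Illustrative sketch only: density-based intent induction over utterance
# embeddings, with explicit outlier handling (label -1). This is NOT the
# authors' DDCE implementation; model choice and hyperparameters are assumed.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

utterances = [
    "I want to change my delivery address",
    "Can you update the shipping address on my order?",
    "Please cancel my last order",
    "I'd like to cancel the order I placed yesterday",
    "Where is my parcel right now?",
    "asdfgh lol",  # noisy log line that should end up as an outlier
]

# Encode conversation-log utterances into a shared embedding space.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(utterances, normalize_embeddings=True)

# Crude stand-in for tuning the clustering hyperparameters: pick the eps
# that maximizes the silhouette score over the non-outlier points.
best_eps, best_score, best_labels = None, -1.0, None
for eps in (0.3, 0.5, 0.7):
    labels = DBSCAN(eps=eps, min_samples=2, metric="cosine").fit_predict(embeddings)
    mask = labels != -1  # -1 marks density outliers
    if len(set(labels[mask])) >= 2:
        score = silhouette_score(embeddings[mask], labels[mask], metric="cosine")
        if score > best_score:
            best_eps, best_score, best_labels = eps, score, labels

if best_labels is None:  # fall back if no eps yields at least two clusters
    best_eps = 0.5
    best_labels = DBSCAN(eps=best_eps, min_samples=2, metric="cosine").fit_predict(embeddings)

print("chosen eps:", best_eps)
print("induced intent clusters:", best_labels)  # outliers keep the label -1
```

The key contrast with K-means-style induction is that points labeled -1 are reported as outliers instead of being absorbed into the nearest cluster, which matches the paper's motivation of noisy real-life conversation logs.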
Related papers
- Generative Context Distillation [48.91617280112579]
Generative Context Distillation (GCD) is a lightweight prompt internalization method that employs a joint training approach.
We demonstrate that our approach effectively internalizes complex prompts across various agent-based application scenarios.
arXiv Detail & Related papers (2024-11-24T17:32:20Z)
- Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification [6.459396785817196]
Chain-of-Intent generates intent-driven conversations through self-play.
MINT-CL is a framework for multi-turn intent classification using multi-task contrastive learning.
We release MINT-E, a multilingual, intent-aware multi-turn e-commerce dialogue corpus.
arXiv Detail & Related papers (2024-11-21T15:59:29Z)
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL)
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to SOTA while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
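For intuition about the k-center part of that objective only, the classic greedy farthest-first traversal (a 2-approximation for plain metric k-center) is sketched below; the paper's actual contribution, a factor-3 approximation for the weighted combination of k-center and uncertainty sampling, is not reproduced here.

```python
# Toy farthest-first (Gonzalez) greedy for plain k-center subset selection.
# This is the textbook 2-approximation baseline, not the paper's weighted
# factor-3 algorithm that also folds in an uncertainty-sampling term.
import numpy as np

def greedy_k_center(points, k):
    """Pick k indices so every point is close to some chosen center."""
    centers = [0]  # arbitrary first center
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))  # farthest point from current centers
        centers.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return centers

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 16))  # stand-in for training-example features
print(greedy_k_center(data, k=5))
```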
- Free Lunch for Efficient Textual Commonsense Integration in Language Models [20.02647320786556]
We group training samples with similar commonsense descriptions into a single batch, thus reusing the encoded description across multiple samples.
Extensive experiments illustrate that the proposed batch partitioning approach effectively reduces the computational cost while preserving performance.
The efficiency improvement is more pronounced on larger datasets and on devices with more memory capacity, attesting to its practical utility for large-scale applications.
arXiv Detail & Related papers (2023-05-24T19:14:57Z)
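A rough sketch of that batching idea follows: samples sharing a commonsense description are bucketed together so the description only needs to be encoded once per batch. The data format and grouping key below are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch of description-aware batch partitioning: samples that
# share a commonsense description are grouped so the description is encoded
# once per batch instead of once per sample. The data format is assumed.
from collections import defaultdict

samples = [
    {"text": "He grabbed an umbrella.", "description": "rain makes people wet"},
    {"text": "She took her raincoat.",  "description": "rain makes people wet"},
    {"text": "He packed sunscreen.",    "description": "sunlight can burn skin"},
]

def partition_by_description(samples, batch_size=2):
    buckets = defaultdict(list)
    for s in samples:
        buckets[s["description"]].append(s)
    for desc, group in buckets.items():
        for i in range(0, len(group), batch_size):
            # every sample in this batch reuses one encoding of `desc`
            yield desc, group[i:i + batch_size]

for desc, batch in partition_by_description(samples):
    print(desc, "->", [s["text"] for s in batch])
```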
- Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
- Open-vocabulary Panoptic Segmentation with Embedding Modulation [71.15502078615587]
Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world.
Traditional closed-vocabulary segmentation methods are not able to characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results.
We propose OPSNet, an omnipotent and data-efficient framework for open-vocabulary panoptic segmentation.
arXiv Detail & Related papers (2023-03-20T17:58:48Z)
- Discovering Customer-Service Dialog System with Semi-Supervised Learning and Coarse-to-Fine Intent Detection [6.869753194843482]
Task-oriented dialog aims to assist users in achieving specific goals through multi-turn conversation.
We constructed a weakly supervised dataset based on a teacher/student paradigm.
We also built a modular dialogue system and integrated coarse-to-fine grained classification for user intent detection.
arXiv Detail & Related papers (2022-12-23T14:36:43Z)
- Analysis of Utterance Embeddings and Clustering Methods Related to Intent Induction for Task-Oriented Dialogue [8.07809100513473]
This work investigates unsupervised approaches to overcome challenges in designing task-oriented dialog schema.
We postulate there are two salient factors for automatic induction of intents: (1) clustering algorithm for intent labeling and (2) user utterance embedding space.
Pretrained MiniLM with Agglomerative clustering shows significant improvement in NMI, ARI, F1, accuracy and example coverage in intent induction tasks.
arXiv Detail & Related papers (2022-12-05T04:37:22Z)
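The winning combination reported there (pretrained MiniLM utterance embeddings plus agglomerative clustering, evaluated with metrics such as NMI and ARI against gold intents) is straightforward to reproduce in spirit; a minimal sketch is below. The exact checkpoint, distance threshold, and linkage are assumptions rather than the paper's configuration.

```python
# Minimal sketch of the reported recipe: pretrained MiniLM utterance
# embeddings + agglomerative clustering, scored with NMI/ARI against gold
# intents. The checkpoint and threshold are assumed, not the paper's setup.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

utterances = ["book a table for two", "reserve a table tonight",
              "what's the weather tomorrow", "will it rain today"]
gold_intents = [0, 0, 1, 1]

embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(
    utterances, normalize_embeddings=True)

# Let a distance threshold, rather than a fixed k, decide how many intents emerge.
pred = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.5,
    linkage="average", metric="cosine").fit_predict(embeddings)

print("NMI:", normalized_mutual_info_score(gold_intents, pred))
print("ARI:", adjusted_rand_score(gold_intents, pred))
```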
- Semi-Supervised Clustering with Contrastive Learning for Discovering New Intents [10.634249106899304]
We propose Deep Contrastive Semi-supervised Clustering (DCSC)
DCSC aims to cluster text samples in a semi-supervised way and provide grouped intents to operation staff.
We conduct experiments on two public datasets to compare our model with several popular methods.
arXiv Detail & Related papers (2022-01-07T09:58:43Z)
- A Proposition-Level Clustering Approach for Multi-Document Summarization [82.4616498914049]
We revisit the clustering approach, grouping together propositions for more precise information alignment.
Our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions.
Our summarization method improves over the previous state-of-the-art MDS method in the DUC 2004 and TAC 2011 datasets.
arXiv Detail & Related papers (2021-12-16T10:34:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.