Open Intent Discovery through Unsupervised Semantic Clustering and
Dependency Parsing
- URL: http://arxiv.org/abs/2104.12114v1
- Date: Sun, 25 Apr 2021 09:36:23 GMT
- Authors: Pengfei Liu, Youzhang Ning, King Keung Wu, Kun Li and Helen Meng
- Abstract summary: This paper proposes an unsupervised two-stage approach to discover intents and generate intent labels automatically from a collection of unlabeled utterances.
We empirically show that the proposed unsupervised approach can generate meaningful intent labels automatically and achieves high precision and recall in utterance clustering and intent discovery.
- Score: 44.99113692679489
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Intent understanding plays an important role in dialog systems, and is
typically formulated as a supervised classification problem. However, it is
challenging and time-consuming to design the intent labels manually to support
a new domain. This paper proposes an unsupervised two-stage approach to
discover intents and generate meaningful intent labels automatically from a
collection of unlabeled utterances. In the first stage, we aim to generate a
set of semantically coherent clusters where the utterances within each cluster
convey the same intent. We obtain utterance representations from various
pre-trained sentence embeddings and propose a balanced score metric to
determine the optimal number of clusters in K-means clustering. In the second
stage, the objective is to generate an intent label automatically for each
cluster. We extract the ACTION-OBJECT pair from each utterance using a
dependency parser and take the most frequent pair within each cluster, e.g.,
book-restaurant, as the generated cluster label. We empirically show that the
proposed unsupervised approach can generate meaningful intent labels
automatically and achieves high precision and recall in utterance clustering
and intent discovery.
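The second stage described in the abstract (extract an ACTION-OBJECT pair per utterance, then take the most frequent pair in each cluster as its label) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation: the `Token` structure, the helper names, the `dobj` dependency label (used by some English parsers; Universal Dependencies uses `obj`), and the hand-written toy parses that stand in for real parser output. The abstract does not spell out the exact extraction rules.

```python
from collections import Counter
from typing import NamedTuple

class Token(NamedTuple):
    """One token of an already-parsed utterance (hypothetical structure)."""
    text: str   # surface form or lemma
    head: int   # index of the syntactic head, -1 for the root
    dep: str    # dependency label, assumed here to use "dobj" for direct objects
    pos: str    # coarse part-of-speech tag

def action_object(tokens):
    """Return (verb, object) for the first direct object whose head is a verb, else None."""
    for tok in tokens:
        if tok.dep == "dobj" and tok.head >= 0 and tokens[tok.head].pos == "VERB":
            return (tokens[tok.head].text.lower(), tok.text.lower())
    return None

def label_clusters(parsed_utterances, cluster_ids):
    """Label each cluster with its most frequent ACTION-OBJECT pair."""
    counts = {}
    for tokens, cid in zip(parsed_utterances, cluster_ids):
        pair = action_object(tokens)
        if pair is not None:  # utterances without a verb-object pair are skipped
            counts.setdefault(cid, Counter())[pair] += 1
    return {cid: "-".join(c.most_common(1)[0][0]) for cid, c in counts.items()}

def toy_parse(verb, obj):
    """Hand-built parse of '<verb> a <obj>'; stands in for real parser output."""
    return [
        Token(verb, -1, "ROOT", "VERB"),
        Token("a", 2, "det", "DET"),
        Token(obj, 0, "dobj", "NOUN"),
    ]

# Toy data: cluster ids would come from the first (clustering) stage.
utterances = [
    toy_parse("book", "restaurant"),
    toy_parse("book", "restaurant"),
    toy_parse("book", "table"),
    toy_parse("play", "music"),
    toy_parse("play", "music"),
    toy_parse("play", "song"),
]
clusters = [0, 0, 0, 1, 1, 1]
labels = label_clusters(utterances, clusters)
print(labels)
```

In practice the token lists would come from a dependency parser and the cluster ids from the K-means stage; the sketch only shows the counting and labeling logic.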
Related papers
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
- Actively Supervised Clustering for Open Relation Extraction [42.114747195195655]
We present a novel setting, named actively supervised clustering for OpenRE.
The key to the setting is selecting which instances to label.
We propose a new strategy, which is applicable to dynamically discover clusters of unknown relations.
arXiv Detail & Related papers (2023-06-08T06:55:02Z)
- IDAS: Intent Discovery with Abstractive Summarization [16.731183915325584]
We show that recent competitive methods in intent discovery can be outperformed by clustering utterances based on abstractive summaries.
We contribute the IDAS approach, which collects a set of descriptive utterance labels by prompting a Large Language Model.
The utterances and their resulting noisy labels are then encoded by a frozen pre-trained encoder, and subsequently clustered to recover the latent intents.
arXiv Detail & Related papers (2023-05-31T12:19:40Z)
- Goal-Driven Explainable Clustering via Language Descriptions [50.980832345025334]
We propose a new task formulation, "Goal-Driven Clustering with Explanations" (GoalEx).
GoalEx represents both the goal and the explanations as free-form language descriptions.
Our method produces more accurate and goal-related explanations than prior methods.
arXiv Detail & Related papers (2023-05-23T07:05:50Z)
- A Clustering Framework for Unsupervised and Semi-supervised New Intent Discovery [25.900661912504397]
We propose a novel clustering framework, USNID, for unsupervised and semi-supervised new intent discovery.
First, it fully utilizes unsupervised or semi-supervised data to mine shallow semantic similarity relations.
Second, it designs a centroid-guided clustering mechanism to address the issue of cluster allocation inconsistency.
Third, it captures high-level semantics in unsupervised or semi-supervised data to discover fine-grained intent-wise clusters.
arXiv Detail & Related papers (2023-04-16T05:30:42Z)
- Analysis of Utterance Embeddings and Clustering Methods Related to Intent Induction for Task-Oriented Dialogue [8.07809100513473]
This work investigates unsupervised approaches to overcome challenges in designing task-oriented dialog schema.
We postulate there are two salient factors for automatic induction of intents: (1) clustering algorithm for intent labeling and (2) user utterance embedding space.
Pretrained MiniLM with Agglomerative clustering shows significant improvement in NMI, ARI, F1, accuracy and example coverage in intent induction tasks.
arXiv Detail & Related papers (2022-12-05T04:37:22Z)
- Out-of-Category Document Identification Using Target-Category Names as Weak Supervision [64.671654559798]
Out-of-category detection aims to distinguish documents according to their semantic relevance to the inlier (or target) categories.
We present an out-of-category detection framework, which effectively measures how confidently each document belongs to one of the target categories.
arXiv Detail & Related papers (2021-11-24T21:01:25Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Intent Mining from past conversations for conversational agent [1.9754522186574608]
Bots are increasingly being deployed to provide round-the-clock support and to increase customer engagement.
Many of the commercial bot building frameworks follow a standard approach that requires one to build and train an intent model to recognize a user input.
We have introduced a novel density-based clustering algorithm ITERDB-LabelSCAN for unbalanced data clustering.
arXiv Detail & Related papers (2020-05-22T05:29:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.