A Clustering Framework for Unsupervised and Semi-supervised New Intent
Discovery
- URL: http://arxiv.org/abs/2304.07699v3
- Date: Wed, 13 Dec 2023 01:39:20 GMT
- Title: A Clustering Framework for Unsupervised and Semi-supervised New Intent
Discovery
- Authors: Hanlei Zhang, Hua Xu, Xin Wang, Fei Long, Kai Gao
- Abstract summary: We propose a novel clustering framework, USNID, for unsupervised and semi-supervised new intent discovery.
First, it fully utilizes unsupervised or semi-supervised data to mine shallow semantic similarity relations.
Second, it designs a centroid-guided clustering mechanism to address the issue of cluster allocation inconsistency.
Third, it captures high-level semantics in unsupervised or semi-supervised data to discover fine-grained intent-wise clusters.
- Score: 25.900661912504397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: New intent discovery is of great value to natural language processing,
allowing for a better understanding of user needs and providing friendly
services. However, most existing methods struggle to capture the complicated
semantics of discrete text representations when limited or no prior knowledge
of labeled data is available. To tackle this problem, we propose a novel
clustering framework, USNID, for unsupervised and semi-supervised new intent
discovery, which has three key technologies. First, it fully utilizes
unsupervised or semi-supervised data to mine shallow semantic similarity
relations and provide well-initialized representations for clustering. Second,
it designs a centroid-guided clustering mechanism to address the issue of
cluster allocation inconsistency and provide high-quality self-supervised
targets for representation learning. Third, it captures high-level semantics in
unsupervised or semi-supervised data to discover fine-grained intent-wise
clusters by optimizing both cluster-level and instance-level objectives. We
also propose an effective method for estimating the cluster number in
open-world scenarios without knowing the number of new intents beforehand.
USNID performs exceptionally well on several benchmark intent datasets,
achieving new state-of-the-art results in unsupervised and semi-supervised new
intent discovery and demonstrating robust performance with different cluster
numbers.
Related papers
- A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm.
In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
arXiv Detail & Related papers (2024-07-14T13:37:03Z) - Stable Cluster Discrimination for Deep Clustering [7.175082696240088]
Deep clustering can optimize representations of instances (i.e., representation learning) and explore the inherent data distribution.
The coupled objective implies a trivial solution that all instances collapse to the uniform features.
In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering.
A novel stable cluster discrimination (SeCu) task is proposed and a new hardness-aware clustering criterion can be obtained accordingly.
arXiv Detail & Related papers (2023-11-24T06:43:26Z) - Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Learning the Precise Feature for Cluster Assignment [39.320210567860485]
We propose a framework which integrates representation learning and clustering into a single pipeline for the first time.
The proposed framework exploits the powerful ability of recently developed generative models for learning intrinsic features.
Experimental results show that the performance of the proposed method is superior, or at least comparable to, the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-11T04:08:54Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - CycleCluster: Modernising Clustering Regularisation for Deep
Semi-Supervised Classification [0.0]
We propose a novel framework, CycleCluster, for deep semi-supervised classification.
Our core optimisation is driven by a new clustering based regularisation along with a graph based pseudo-labels and a shared deep network.
arXiv Detail & Related papers (2020-01-15T13:34:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.