Related papers: Enhancement of Short Text Clustering by Iterative Classification

Enhancement of Short Text Clustering by Iterative Classification

URL: http://arxiv.org/abs/2001.11631v1
Date: Fri, 31 Jan 2020 02:12:05 GMT
Title: Enhancement of Short Text Clustering by Iterative Classification
Authors: Md Rashadul Hasan Rakib, Norbert Zeh, Magdalena Jankowska, Evangelos Milios
Abstract summary: iterative classification applies outlier removal to obtain outlier-free clusters. It trains a classification algorithm using the non-outliers based on their cluster distributions. By repeating this several times, we obtain a much improved clustering of texts.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Short text clustering is a challenging task due to the lack of signal contained in such short texts. In this work, we propose iterative classification as a method to b o ost the clustering quality (e.g., accuracy) of short texts. Given a clustering of short texts obtained using an arbitrary clustering algorithm, iterative classification applies outlier removal to obtain outlier-free clusters. Then it trains a classification algorithm using the non-outliers based on their cluster distributions. Using the trained classification model, iterative classification reclassifies the outliers to obtain a new set of clusters. By repeating this several times, we obtain a much improved clustering of texts. Our experimental results show that the proposed clustering enhancement method not only improves the clustering quality of different clustering methods (e.g., k-means, k-means--, and hierarchical clustering) but also outperforms the state-of-the-art short text clustering methods on several short text datasets by a statistically significant margin.

Related papers

k-LLMmeans: Summaries as Centroids for Interpretable and Scalable LLM-Based Text Clustering [0.0]
We introduce k-LLMmeans, a novel modification of the k-means clustering algorithm that utilizes LLMs to generate textual summaries as cluster centroids. This modification preserves the properties of k-means while offering greater interpretability. We present a case study showcasing the interpretability of evolving cluster centroids in sequential text streams.
arXiv Detail & Related papers (2025-02-12T19:50:22Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering [16.08402937918212]
We propose a novel Classification-Enhanced Iterative Learning framework for short text clustering. In each iteration, we first adopt a language model to retrieve the initial text representations. After strict data filtering and aggregation processes, samples with clean category labels are retrieved, which serve as supervision information. Finally, the updated language model with improved representation ability is used to enhance clustering in the next iteration.
arXiv Detail & Related papers (2023-04-20T14:04:31Z)
A Proposition-Level Clustering Approach for Multi-Document Summarization [82.4616498914049]
We revisit the clustering approach, grouping together propositions for more precise information alignment. Our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions. Our summarization method improves over the previous state-of-the-art MDS method in the DUC 2004 and TAC 2011 datasets.
arXiv Detail & Related papers (2021-12-16T10:34:22Z)
You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation. We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one. By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task and we come up with the Graph Constrastive Clustering(GCC) method. Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features. On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z)
CAC: A Clustering Based Framework for Classification [20.372627144885158]
We design a simple, efficient, and generic framework called Classification Aware Clustering (CAC) Our experiments on synthetic and real benchmark datasets demonstrate the efficacy of CAC over previous methods for combined clustering and classification.
arXiv Detail & Related papers (2021-02-23T18:59:39Z)
Predictive K-means with local models [0.028675177318965035]
Predictive clustering seeks to obtain the best of the two worlds. We present two new algorithms using this technique and show on a variety of data sets that they are competitive for prediction performance.
arXiv Detail & Related papers (2020-12-16T10:49:36Z)
Neural Text Classification by Jointly Learning to Cluster and Align [5.969960391685054]
We extend the neural text clustering approach to text classification tasks by inducing cluster centers via a latent variable model and interacting with distributional word embeddings. The proposed method jointly learns word clustering centroids and clustering-token alignments, achieving the state of the art results on multiple benchmark datasets.
arXiv Detail & Related papers (2020-11-24T16:07:18Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
Progressive Cluster Purification for Unsupervised Feature Learning [48.87365358296371]
In unsupervised feature learning, sample specificity based methods ignore the inter-class information. We propose a novel clustering based method, which excludes class inconsistent samples during progressive cluster formation. Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training.
arXiv Detail & Related papers (2020-07-06T08:11:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.