CAC: A Clustering Based Framework for Classification
- URL: http://arxiv.org/abs/2102.11872v1
- Date: Tue, 23 Feb 2021 18:59:39 GMT
- Title: CAC: A Clustering Based Framework for Classification
- Authors: Shivin Srivastava, Siddharth Bhatia, Lingxiao Huang, Lim Jun Heng,
Kenji Kawaguchi, Vaibhav Rajan
- Abstract summary: We design a simple, efficient, and generic framework called Classification Aware Clustering (CAC)
Our experiments on synthetic and real benchmark datasets demonstrate the efficacy of CAC over previous methods for combined clustering and classification.
- Score: 20.372627144885158
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In data containing heterogeneous subpopulations, classification performance
benefits from incorporating the knowledge of cluster structure in the
classifier. Previous methods for such combined clustering and classification
either are classifier-specific and not generic or independently perform
clustering and classifier training, which may not form clusters that can
potentially benefit classifier performance. The question of how to perform
clustering to improve the performance of classifiers trained on the clusters
has received scant attention in previous literature despite its importance in
several real-world applications. In this paper, we theoretically analyze when
and how clustering may help in obtaining accurate classifiers. We design a
simple, efficient, and generic framework called Classification Aware Clustering
(CAC), to find clusters that are well suited for being used as training
datasets by classifiers for each underlying subpopulation. Our experiments on
synthetic and real benchmark datasets demonstrate the efficacy of CAC over
previous methods for combined clustering and classification.
Related papers
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - CEIL: A General Classification-Enhanced Iterative Learning Framework for
Text Clustering [16.08402937918212]
We propose a novel Classification-Enhanced Iterative Learning framework for short text clustering.
In each iteration, we first adopt a language model to retrieve the initial text representations.
After strict data filtering and aggregation processes, samples with clean category labels are retrieved, which serve as supervision information.
Finally, the updated language model with improved representation ability is used to enhance clustering in the next iteration.
arXiv Detail & Related papers (2023-04-20T14:04:31Z) - Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for novel detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
arXiv Detail & Related papers (2022-12-23T00:50:41Z) - Self-Evolutionary Clustering [1.662966122370634]
Most existing deep clustering methods are based on simple distance comparison and highly dependent on the target distribution generated by a handcrafted nonlinear mapping.
A novel modular Self-Evolutionary Clustering (Self-EvoC) framework is constructed, which boosts the clustering performance by classification in a self-supervised manner.
The framework can efficiently discriminate sample outliers and generate better target distribution with the assistance of self-supervised.
arXiv Detail & Related papers (2022-02-21T19:38:18Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Structured Graph Learning for Clustering and Semi-supervised
Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z) - Progressive Cluster Purification for Unsupervised Feature Learning [48.87365358296371]
In unsupervised feature learning, sample specificity based methods ignore the inter-class information.
We propose a novel clustering based method, which excludes class inconsistent samples during progressive cluster formation.
Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training.
arXiv Detail & Related papers (2020-07-06T08:11:03Z) - Supervised Enhanced Soft Subspace Clustering (SESSC) for TSK Fuzzy
Classifiers [25.32478253796209]
Fuzzy c-means based clustering algorithms are frequently used for Takagi-Sugeno-Kang (TSK) fuzzy classifier parameter estimation.
This paper proposes a supervised enhanced soft subspace clustering (SESSC) algorithm, which considers simultaneously the within-cluster compactness, between-cluster separation, and label information in clustering.
arXiv Detail & Related papers (2020-02-27T19:39:19Z) - Enhancement of Short Text Clustering by Iterative Classification [0.0]
iterative classification applies outlier removal to obtain outlier-free clusters.
It trains a classification algorithm using the non-outliers based on their cluster distributions.
By repeating this several times, we obtain a much improved clustering of texts.
arXiv Detail & Related papers (2020-01-31T02:12:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.