Predictive K-means with local models
- URL: http://arxiv.org/abs/2012.09630v1
- Date: Wed, 16 Dec 2020 10:49:36 GMT
- Title: Predictive K-means with local models
- Authors: Vincent Lemaire, Oumaima Alaoui Ismaili, Antoine Cornuéjols, Dominique Gay
- Abstract summary: Predictive clustering seeks to obtain the best of the two worlds.
We present two new algorithms using this technique and show on a variety of data sets that they are competitive for prediction performance.
- Score: 0.028675177318965035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised classification can be effective for prediction but sometimes weak
on interpretability or explainability (XAI). Clustering, on the other hand,
tends to isolate categories or profiles that can be meaningful but there is no
guarantee that they are useful for labels prediction. Predictive clustering
seeks to obtain the best of the two worlds. Starting from labeled data, it
looks for clusters that are as pure as possible with regard to the class
labels. One technique consists of tweaking a clustering algorithm so that data
points sharing the same label tend to aggregate together. With distance-based
algorithms, such as k-means, a solution is to modify the distance used by the
algorithm so that it incorporates information about the labels of the data
points. In this paper, we propose another method which relies on a change of
representation guided by class densities and then carries out clustering in
this new representation space. We present two new algorithms using this
technique and show on a variety of data sets that they are competitive for
prediction performance with pure supervised classifiers while offering
interpretability of the clusters discovered.
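The representation-change idea in the abstract can be sketched in plain NumPy: each point is mapped to a profile of per-class densities, and ordinary k-means is then run in that new space. The Gaussian-kernel density estimator and farthest-point initialisation below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def class_density_representation(X, y, bandwidth=1.0):
    """Map each point to its profile of Gaussian-kernel class densities,
    one coordinate per class label (a hypothetical density estimator)."""
    classes = np.unique(y)
    reps = np.empty((len(X), len(classes)))
    for j, c in enumerate(classes):
        Xc = X[y == c]
        # Squared distances from every point to every point of class c.
        d2 = ((X[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
        reps[:, j] = np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=1)
    # Normalise rows: each point is described by its relative class densities.
    return reps / reps.sum(axis=1, keepdims=True)

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [Z[0]]
    for _ in range(1, k):
        d = ((Z[:, None] - np.array(centroids)[None]) ** 2).sum(-1).min(axis=1)
        centroids.append(Z[int(np.argmax(d))])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid, then recompute centroids.
        labels = ((Z[:, None] - centroids[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = Z[labels == j].mean(axis=0)
    return labels

# Two labeled Gaussian blobs; the clustering is carried out in density space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
Z = class_density_representation(X, y)
clusters = kmeans(Z, k=2)
```

Because points sharing a label get near-identical density profiles, clusters found in this space tend to be pure with respect to the class labels, which is the property predictive clustering is after.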
Related papers
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- A new approach for evaluating internal cluster validation indices [0.0]
Cluster validation is needed to select the best-performing algorithm.
Several indices were proposed for this purpose without using any additional (external) information.
Evaluation approaches differ in how they use the information on the ground-truth classification.
arXiv Detail & Related papers (2023-08-02T06:55:33Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- ck-means, a novel unsupervised learning method that combines fuzzy and crispy clustering methods to extract intersecting data [1.827510863075184]
This paper proposes a method to cluster data that share the same intersections between two features or more.
The main idea of this novel method is to generate fuzzy clusters of data using a Fuzzy C-Means (FCM) algorithm.
The algorithm can also find the optimal number of clusters for both the FCM and k-means algorithms, according to the consistency of the clusters as measured by the Silhouette Index (SI).
arXiv Detail & Related papers (2022-06-17T19:29:50Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters [0.12313056815753944]
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
Accuracy and speed are two main advantages of the proposed method.
arXiv Detail & Related papers (2021-10-09T23:02:57Z)
- Forest Fire Clustering: Cluster-oriented Label Propagation Clustering and Monte Carlo Verification Inspired by Forest Fire Dynamics [4.645676097881571]
We introduce a novel method that could not only find robust clusters but also provide a confidence score for the labels of each data point.
Specifically, we reformulated label-propagation clustering to model after forest fire dynamics.
arXiv Detail & Related papers (2021-03-22T13:02:37Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
- Enhancement of Short Text Clustering by Iterative Classification [0.0]
Iterative classification applies outlier removal to obtain outlier-free clusters.
It trains a classification algorithm using the non-outliers based on their cluster distributions.
By repeating this several times, we obtain a much improved clustering of texts.
arXiv Detail & Related papers (2020-01-31T02:12:05Z)
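The iterative-classification loop described in this last entry can be sketched as follows. The per-cluster distance-quantile outlier rule and the nearest-centroid classifier are illustrative stand-ins, not the paper's actual outlier detector or classifier:

```python
import numpy as np

def iterative_classification(X, init_labels, n_rounds=5, outlier_frac=0.2):
    """Refine an initial clustering: each round drops per-cluster outliers,
    fits a classifier on the inliers, and relabels every point with it."""
    labels = init_labels.copy()
    for _ in range(n_rounds):
        keep = np.zeros(len(X), dtype=bool)
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            centre = X[idx].mean(axis=0)
            dist = np.linalg.norm(X[idx] - centre, axis=1)
            # Treat the farthest outlier_frac of each cluster as outliers.
            cutoff = np.quantile(dist, 1 - outlier_frac)
            keep[idx[dist <= cutoff]] = True
        # "Train" a nearest-centroid classifier on the inliers only...
        cents = np.array([X[keep & (labels == c)].mean(axis=0)
                          for c in np.unique(labels)])
        # ...then reclassify every point, outliers included.
        labels = np.linalg.norm(X[:, None] - cents[None], axis=2).argmin(axis=1)
    return labels

# Two well-separated blobs with a few corrupted initial assignments.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (40, 2)), rng.normal(4, 0.5, (40, 2))])
noisy = np.concatenate([np.zeros(40, int), np.ones(40, int)])
noisy[:5] = 1  # mislabel five points of the first blob
refined = iterative_classification(X, noisy)
```

The mislabeled points sit far from their wrong cluster's centroid, so the outlier step excludes them from training, and the reclassification step then moves them back to the correct cluster, which is the mechanism the repeated rounds exploit.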
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.