Predictive K-means with local models
- URL: http://arxiv.org/abs/2012.09630v1
- Date: Wed, 16 Dec 2020 10:49:36 GMT
- Title: Predictive K-means with local models
- Authors: Vincent Lemaire, Oumaima Alaoui Ismaili, Antoine Cornuéjols, Dominique Gay
- Abstract summary: Predictive clustering seeks to obtain the best of the two worlds.
We present two new algorithms using this technique and show on a variety of data sets that they are competitive for prediction performance.
- Score: 0.028675177318965035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised classification can be effective for prediction but sometimes weak
on interpretability or explainability (XAI). Clustering, on the other hand,
tends to isolate categories or profiles that can be meaningful but there is no
guarantee that they are useful for labels prediction. Predictive clustering
seeks to obtain the best of the two worlds. Starting from labeled data, it
looks for clusters that are as pure as possible with regard to the class
labels. One technique consists of tweaking a clustering algorithm so that data
points sharing the same label tend to aggregate together. With distance-based
algorithms, such as k-means, a solution is to modify the distance used by the
algorithm so that it incorporates information about the labels of the data
points. In this paper, we propose another method which relies on a change of
representation guided by class densities and then carries out clustering in
this new representation space. We present two new algorithms using this
technique and show on a variety of data sets that they are competitive for
prediction performance with pure supervised classifiers while offering
interpretability of the clusters discovered.
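The representation-change idea in the abstract can be sketched in plain NumPy: each point is mapped to a profile of per-class densities, and ordinary k-means is then run in that new space. The Gaussian-kernel density estimator and farthest-point initialisation below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def class_density_representation(X, y, bandwidth=1.0):
    """Map each point to its profile of Gaussian-kernel class densities,
    one coordinate per class label (a hypothetical density estimator)."""
    classes = np.unique(y)
    reps = np.empty((len(X), len(classes)))
    for j, c in enumerate(classes):
        Xc = X[y == c]
        # Squared distances from every point to every point of class c.
        d2 = ((X[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
        reps[:, j] = np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=1)
    # Normalise rows: each point is described by its relative class densities.
    return reps / reps.sum(axis=1, keepdims=True)

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [Z[0]]
    for _ in range(1, k):
        d = ((Z[:, None] - np.array(centroids)[None]) ** 2).sum(-1).min(axis=1)
        centroids.append(Z[int(np.argmax(d))])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid, then recompute centroids.
        labels = ((Z[:, None] - centroids[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = Z[labels == j].mean(axis=0)
    return labels

# Two labeled Gaussian blobs; the clustering is carried out in density space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
Z = class_density_representation(X, y)
clusters = kmeans(Z, k=2)
```

Because points sharing a label get near-identical density profiles, clusters found in this space tend to be pure with respect to the class labels, which is the property predictive clustering is after.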
Related papers
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- A new approach for evaluating internal cluster validation indices [0.0]
Cluster validation is needed to select the best-performing algorithm.
Several indices were proposed for this purpose without using any additional (external) information.
Evaluation approaches differ in how they use the information on the ground-truth classification.
arXiv Detail & Related papers (2023-08-02T06:55:33Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
- ck-means, a novel unsupervised learning method that combines fuzzy and crispy clustering methods to extract intersecting data [1.827510863075184]
This paper proposes a method to cluster data that share the same intersections between two features or more.
The main idea of this novel method is to generate fuzzy clusters of data using a Fuzzy C-Means (FCM) algorithm.
The algorithm can also find the optimal number of clusters for both the FCM and k-means algorithms, according to the consistency of the clusters as measured by the Silhouette Index (SI).
arXiv Detail & Related papers (2022-06-17T19:29:50Z)
- Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC).
In SCAGC, a self-supervised contrastive loss that leverages inaccurate clustering labels is designed for node representation learning.
For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
- K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters [0.12313056815753944]
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters.
Accuracy and speed are two main advantages of the proposed method.
arXiv Detail & Related papers (2021-10-09T23:02:57Z)
- Forest Fire Clustering: Cluster-oriented Label Propagation Clustering and Monte Carlo Verification Inspired by Forest Fire Dynamics [4.645676097881571]
We introduce a novel method that could not only find robust clusters but also provide a confidence score for the labels of each data point.
Specifically, we reformulated label-propagation clustering to model after forest fire dynamics.
arXiv Detail & Related papers (2021-03-22T13:02:37Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
- Enhancement of Short Text Clustering by Iterative Classification [0.0]
Iterative classification applies outlier removal to obtain outlier-free clusters.
It trains a classification algorithm using the non-outliers based on their cluster distributions.
By repeating this several times, we obtain a much improved clustering of texts.
arXiv Detail & Related papers (2020-01-31T02:12:05Z)
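The iterative-classification loop described in this last entry can be sketched as follows. The per-cluster distance-quantile outlier rule and the nearest-centroid classifier are illustrative stand-ins, not the paper's actual outlier detector or classifier:

```python
import numpy as np

def iterative_classification(X, init_labels, n_rounds=5, outlier_frac=0.2):
    """Refine an initial clustering: each round drops per-cluster outliers,
    fits a classifier on the inliers, and relabels every point with it."""
    labels = init_labels.copy()
    for _ in range(n_rounds):
        keep = np.zeros(len(X), dtype=bool)
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            centre = X[idx].mean(axis=0)
            dist = np.linalg.norm(X[idx] - centre, axis=1)
            # Treat the farthest outlier_frac of each cluster as outliers.
            cutoff = np.quantile(dist, 1 - outlier_frac)
            keep[idx[dist <= cutoff]] = True
        # "Train" a nearest-centroid classifier on the inliers only...
        cents = np.array([X[keep & (labels == c)].mean(axis=0)
                          for c in np.unique(labels)])
        # ...then reclassify every point, outliers included.
        labels = np.linalg.norm(X[:, None] - cents[None], axis=2).argmin(axis=1)
    return labels

# Two well-separated blobs with a few corrupted initial assignments.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (40, 2)), rng.normal(4, 0.5, (40, 2))])
noisy = np.concatenate([np.zeros(40, int), np.ones(40, int)])
noisy[:5] = 1  # mislabel five points of the first blob
refined = iterative_classification(X, noisy)
```

The mislabeled points sit far from their wrong cluster's centroid, so the outlier step excludes them from training, and the reclassification step then moves them back to the correct cluster, which is the mechanism the repeated rounds exploit.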
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.