UCSL : A Machine Learning Expectation-Maximization framework for
Unsupervised Clustering driven by Supervised Learning
- URL: http://arxiv.org/abs/2107.01988v1
- Date: Mon, 5 Jul 2021 12:55:13 GMT
- Title: UCSL : A Machine Learning Expectation-Maximization framework for
Unsupervised Clustering driven by Supervised Learning
- Authors: Robin Louiset and Pietro Gori and Benoit Dufumier and Josselin Houenou
and Antoine Grigis and Edouard Duchesnay
- Abstract summary: Subtype Discovery consists of finding interpretable and consistent sub-parts of a dataset, which are also relevant to a certain supervised task.
We propose a general Expectation-Maximization ensemble framework entitled UCSL (Unsupervised Clustering driven by Supervised Learning).
Our method is generic: it can integrate any clustering method and can be driven by both binary classification and regression.
- Score: 2.133032470368051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Subtype Discovery consists of finding interpretable and consistent sub-parts
of a dataset, which are also relevant to a certain supervised task. From a
mathematical point of view, this can be defined as a clustering task driven by
supervised learning in order to uncover subgroups in line with the supervised
prediction. In this paper, we propose a general Expectation-Maximization
ensemble framework entitled UCSL (Unsupervised Clustering driven by Supervised
Learning). Our method is generic: it can integrate any clustering method and
can be driven by both binary classification and regression. We propose to
construct a non-linear model by merging multiple linear estimators, one per
cluster. Each hyperplane is estimated so that it correctly discriminates, or
predicts, only one cluster. We use SVC or Logistic Regression for
classification and SVR for regression. Furthermore, to perform cluster analysis
within a more suitable space, we also propose a dimension-reduction algorithm
that projects the data onto an orthonormal space relevant to the supervised
task. We analyze the robustness and generalization capability of our algorithm
using synthetic and experimental datasets. In particular, we validate its
ability to identify suitable and consistent sub-types by conducting a
psychiatric-disease cluster analysis with known ground-truth labels. The gain
of the proposed method over previous state-of-the-art techniques is about +1.9
points in terms of balanced accuracy. Finally, we make the code and examples
available in a scikit-learn-compatible Python package at
https://github.com/neurospin-projects/2021_rlouiset_ucsl
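As a rough illustration of the alternation the abstract describes, the sketch below rebuilds it from off-the-shelf scikit-learn pieces: an M-step fitting one linear estimator per cluster, a QR-based projection onto the span of the resulting hyperplane normals, and an E-step that re-clusters the positive samples in that projected space. The function names, the choice of logistic regression plus k-means, and the fixed iteration count are all assumptions made for this example; the authors' tested implementation is the package linked above.

```python
# Minimal sketch of a UCSL-style EM alternation for binary classification.
# Illustrative only; not the authors' implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression


def _m_step(X, y, pos, neg, labels, n_clusters):
    """Fit one linear estimator per cluster, each trained to discriminate
    'its' positive cluster from all negative samples."""
    clfs = []
    for k in range(n_clusters):
        idx = np.concatenate([pos[labels == k], neg])
        clfs.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))
    return clfs


def ucsl_sketch(X, y, n_clusters=2, n_iter=10, seed=0):
    pos = np.flatnonzero(y == 1)  # samples whose hidden subtypes we seek
    neg = np.flatnonzero(y == 0)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X[pos])  # initialization: k-means on positives
    for _ in range(n_iter):
        clfs = _m_step(X, y, pos, neg, labels, n_clusters)
        # Supervised dimension reduction: orthonormalize the hyperplane
        # normals with QR and project the data onto the space they span.
        W = np.stack([c.coef_.ravel() for c in clfs], axis=1)  # (d, K)
        Q, _ = np.linalg.qr(W)
        # E-step: re-cluster the positives in the projected space, so the
        # clustering is driven by the supervised directions.
        # NOTE: a robust version must handle clusters that become empty.
        labels = km.fit_predict(X[pos] @ Q)
    clfs = _m_step(X, y, pos, neg, labels, n_clusters)  # match final labels
    # The non-linear ensemble merges the linear estimators: a sample is
    # scored by its best cluster-specific hyperplane.
    scores = np.max(np.stack([c.decision_function(X) for c in clfs]), axis=0)
    return labels, (scores > 0).astype(int), clfs
```

Swapping KMeans for a Gaussian mixture, or the logistic regressions for SVC/SVR, yields the classification and regression variants the abstract mentions.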
Related papers
- Can an unsupervised clustering algorithm reproduce a categorization system? [1.0485739694839669]
We investigate whether unsupervised clustering can reproduce ground truth classes in a labeled dataset.
We show that success depends on feature selection and the chosen distance metric; the toy sketch below illustrates both effects.
arXiv Detail & Related papers (2024-08-19T18:27:14Z)
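The claim in the entry above is easy to probe on a toy dataset. In the hedged sketch below, the dataset and preprocessing variants are my choices, not the paper's experiments: k-means recovery of the iris labels, scored by adjusted Rand index, shifts noticeably when features are rescaled (which, under Euclidean k-means, amounts to changing the distance metric) or subsetted.

```python
# Toy check: how well k-means recovers ground-truth classes depends on
# feature preparation. Dataset and variants are illustrative only.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
km = KMeans(n_clusters=3, n_init=10, random_state=0)

variants = {
    "raw features": X,
    "standardized": StandardScaler().fit_transform(X),
    "petal features only": X[:, 2:],  # petal length/width only
}
for name, Xv in variants.items():
    ari = adjusted_rand_score(y, km.fit_predict(Xv))
    print(f"{name:>20}: ARI = {ari:.2f}")
```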
- Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification [49.09505771145326]
We propose a Hierarchical Dynamic Labeling (HDL) algorithm that does not depend on model predictions and utilizes image embeddings to generate sample labels.
Our approach has the potential to change the paradigm of pseudo-label generation in semi-supervised learning.
arXiv Detail & Related papers (2024-04-26T06:00:27Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Interpretable Deep Clustering for Tabular Data [7.972599673048582]
Clustering is a fundamental learning task widely used in data analysis.
We propose a new deep-learning framework that predicts interpretable cluster assignments at the instance and cluster levels.
We show that the proposed method can reliably predict cluster assignments in biological, text, image, and physics datasets.
arXiv Detail & Related papers (2023-06-07T21:08:09Z)
- A Generalized Framework for Predictive Clustering and Optimization [18.06697544912383]
Clustering is a powerful and extensively used data science tool.
In this article, we define a generalized optimization framework for predictive clustering.
We also present a joint optimization strategy that exploits mixed-integer linear programming (MILP) for global optimization; a generic toy MILP of this flavour is sketched below.
arXiv Detail & Related papers (2023-05-07T19:56:51Z)
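To make the MILP angle concrete, here is a generic toy formulation, using PuLP with its bundled CBC solver; the variables, costs, and constraints are illustrative assumptions, not the paper's model. It uses binary assignment variables, a linear assignment cost for fixed centers, and the kind of side constraint (a minimum cluster size) that MILP handles naturally.

```python
# Generic sketch of a clustering assignment subproblem written as an MILP.
# Illustrative formulation only, not the paper's actual model.
import numpy as np
import pulp

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))   # toy data
centers = X[:3]                # fixed candidate centers
cost = np.linalg.norm(X[:, None] - centers[None], axis=2)  # (n, K)

n, K = cost.shape
prob = pulp.LpProblem("cluster_assignment", pulp.LpMinimize)
z = pulp.LpVariable.dicts("z", (range(n), range(K)), cat="Binary")

# Objective: total assignment cost, linear in the binary variables.
prob += pulp.lpSum(float(cost[i, k]) * z[i][k]
                   for i in range(n) for k in range(K))
# Each point belongs to exactly one cluster.
for i in range(n):
    prob += pulp.lpSum(z[i][k] for k in range(K)) == 1
# Side constraint MILPs make easy: every cluster keeps at least 5 points.
for k in range(K):
    prob += pulp.lpSum(z[i][k] for i in range(n)) >= 5

prob.solve(pulp.PULP_CBC_CMD(msg=0))
labels = [max(range(K), key=lambda k: z[i][k].value()) for i in range(n)]
```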
- Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework that is sensitive to data labels.
When updated on new class data, they suffer from catastrophic forgetting: the model cannot clearly distinguish old-class data from new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps; a generic sketch of one such reparametrization follows below.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
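Reparametrizing categorical assignment variables usually means a continuous relaxation such as Gumbel-softmax; the sketch below shows that standard construction in plain NumPy. Whether TCC uses exactly this relaxation is not stated in the summary above, so treat it as a generic illustration.

```python
# Gumbel-softmax relaxation: a standard way to reparametrize categorical
# assignment variables so they admit end-to-end gradients. Whether TCC uses
# exactly this construction should be checked against the paper itself.
import numpy as np

def gumbel_softmax(logits, tau=0.5, seed=0):
    """Draw a differentiable (soft one-hot) sample from Categorical(logits)."""
    rng = np.random.default_rng(seed)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau                                # temperature tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))         # stable softmax
    return y / y.sum(axis=-1, keepdims=True)

# Soft assignments of 4 samples over 3 clusters; argmax gives the hard view.
logits = np.log(np.tile(np.array([0.7, 0.2, 0.1]), (4, 1)))
soft_assign = gumbel_softmax(logits)      # rows sum to 1, nearly one-hot
hard_assign = soft_assign.argmax(axis=1)  # discrete labels for evaluation
```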
- Learning Self-Expression Metrics for Scalable and Inductive Subspace Clustering [5.587290026368626]
Subspace clustering has established itself as a state-of-the-art approach to clustering high-dimensional data.
We propose a novel metric learning approach to learn instead a subspace affinity function using a siamese neural network architecture.
Our model benefits from a constant number of parameters and a constant-size memory footprint, allowing it to scale to considerably larger datasets.
arXiv Detail & Related papers (2020-09-27T15:40:12Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices; the toy sketch below makes this concrete.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
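The positive semi-definiteness observation above is easy to verify in code: a PSM is an average of co-clustering indicator matrices Z Z^T built from one-hot partition matrices Z, and each such term is PSD. A toy sketch with simulated MCMC draws (the sizes and random partitions are placeholders):

```python
# Toy posterior similarity matrix (PSM): entry (i, j) is the fraction of
# MCMC partitions in which samples i and j share a cluster. Written as an
# average of Z @ Z.T with one-hot Z, it is positive semi-definite by design.
import numpy as np

rng = np.random.default_rng(0)
n, K, S = 8, 3, 200                      # samples, clusters, MCMC draws
draws = rng.integers(0, K, size=(S, n))  # stand-in for MCMC cluster labels

psm = np.zeros((n, n))
for labels in draws:
    Z = np.eye(K)[labels]                # one-hot partition matrix (n, K)
    psm += Z @ Z.T                       # co-clustering indicator, PSD
psm /= S

# PSD check: eigenvalues are non-negative (up to numerical noise)...
assert np.linalg.eigvalsh(psm).min() > -1e-10
# ...so the PSM can serve directly as a precomputed kernel Gram matrix,
# e.g. in sklearn's SVC(kernel="precomputed") or kernel PCA.
```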
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive-neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- Robust Self-Supervised Convolutional Neural Network for Subspace Clustering and Classification [0.10152838128195464]
This paper proposes a robust formulation of the self-supervised convolutional subspace clustering network ($S^2$ConvSCN).
In a truly unsupervised training environment, Robust $S^2$ConvSCN outperforms its baseline version by a significant margin for both seen and unseen data on four well-known datasets.
arXiv Detail & Related papers (2020-04-03T16:07:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.