Model Based Co-clustering of Mixed Numerical and Binary Data
- URL: http://arxiv.org/abs/2212.11725v1
- Date: Thu, 22 Dec 2022 14:16:08 GMT
- Title: Model Based Co-clustering of Mixed Numerical and Binary Data
- Authors: Aichetou Bouchareb (SAMM), Marc Boull\'e, Fabrice Cl\'erot, Fabrice
Rossi (CEREMADE)
- Abstract summary: Co-clustering is a data mining technique used to extract the underlying block structure between the rows and columns of a data matrix.
In this article, we extend the latent block models based co-clustering to the case of mixed data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Co-clustering is a data mining technique used to extract the underlying block
structure between the rows and columns of a data matrix. Many approaches have
been studied and have shown their capacity to extract such structures in
continuous, binary or contingency tables. However, very little work has been
done to perform co-clustering on mixed type data. In this article, we extend
the latent block models based co-clustering to the case of mixed data
(continuous and binary variables). We then evaluate the effectiveness of the
proposed approach on simulated data and we discuss its advantages and potential
limits.
Related papers
- HBIC: A Biclustering Algorithm for Heterogeneous Datasets [0.0]
Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix.
We introduce a biclustering approach called HBIC, capable of discovering meaningful biclusters in complex heterogeneous data.
arXiv Detail & Related papers (2024-08-23T16:48:10Z) - Mixture of multilayer stochastic block models for multiview clustering [0.0]
We propose an original method for aggregating multiple clustering coming from different sources of information.
The identifiability of the model parameters is established and a variational Bayesian EM algorithm is proposed for the estimation of these parameters.
The method is utilized to analyze global food trading networks, leading to structures of interest.
arXiv Detail & Related papers (2024-01-09T17:15:47Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z) - Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z) - Mixed data Deep Gaussian Mixture Model: A clustering model for mixed
datasets [0.0]
We introduce a model-based clustering method called Mixed Deep Gaussian Mixture Model (MDGMM)
This architecture is flexible and can be adapted to mixed as well as to continuous or non-continuous data.
Our model provides continuous low-dimensional representations of the data which can be a useful tool to visualize mixed datasets.
arXiv Detail & Related papers (2020-10-13T19:52:46Z) - Kernel learning approaches for summarising and combining posterior
similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z) - Biclustering with Alternating K-Means [5.089110111757978]
We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk.
We propose a simple and novel algorithm that finds a local minimum by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows.
The results demonstrate that our algorithm is able to detect meaningful structures in the data and outperform other competing biclustering methods in various settings and situations.
arXiv Detail & Related papers (2020-09-09T20:15:24Z) - New advances in enumerative biclustering algorithms with online
partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm is called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z) - Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.