Conjoined Dirichlet Process
- URL: http://arxiv.org/abs/2002.03223v1
- Date: Sat, 8 Feb 2020 19:41:23 GMT
- Title: Conjoined Dirichlet Process
- Authors: Michelle N. Ngo, Dustin S. Pluta, Alexander N. Ngo, Babak Shahbaba
- Abstract summary: We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
- Score: 63.89763375457853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biclustering is a class of techniques that simultaneously clusters the rows
and columns of a matrix to sort heterogeneous data into homogeneous blocks.
Although many algorithms have been proposed to find biclusters, existing
methods either require the number of biclusters to be pre-specified or place
constraints on the model structure. To address these issues, we develop a
novel, non-parametric probabilistic biclustering method based on Dirichlet
processes to identify biclusters with strong co-occurrence in both rows and
columns. The proposed method utilizes dual Dirichlet process mixture models to
learn row and column clusters, with the number of resulting clusters determined
by the data rather than pre-specified. Probabilistic biclusters are identified
by modeling the mutual dependence between the row and column clusters. We apply
our method to two different applications, text mining and gene expression
analysis, and demonstrate that our method improves bicluster extraction in many
settings compared to existing approaches.
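To illustrate how the number of clusters can be "determined by the data rather than pre-specified", the following is a minimal sketch of the Chinese restaurant process, the prior over partitions induced by a Dirichlet process. It is not the authors' inference algorithm: it only samples cluster labels from the prior, once for rows and once for columns, and ignores the data likelihood and the row-column dependence the paper models. The function name `crp_assignments`, the concentration value `alpha=1.0`, and the matrix size (50 rows, 30 columns) are assumptions chosen for illustration.

```python
import random


def crp_assignments(n_items, alpha, rng):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    alpha is the concentration parameter (must be positive); larger values
    tend to produce more clusters. Hypothetical helper, not from the paper.
    """
    labels = []
    counts = []  # counts[k] = number of items already assigned to cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Two independent partitions, one for rows and one for columns
# (assumed sizes: a 50 x 30 data matrix).
row_labels = crp_assignments(50, alpha=1.0, rng=rng)
col_labels = crp_assignments(30, alpha=1.0, rng=rng)

n_row_clusters = len(set(row_labels))
n_col_clusters = len(set(col_labels))
print("row clusters:", n_row_clusters, "column clusters:", n_col_clusters)
```

Neither call is told how many clusters to produce; the count emerges from the sampling process. In a full Dirichlet process mixture model, each assignment weight would additionally be multiplied by the likelihood of the item under that cluster, which is how the data influence the number of clusters.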
Related papers
- HBIC: A Biclustering Algorithm for Heterogeneous Datasets [0.0]
Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix.
We introduce a biclustering approach called HBIC, capable of discovering meaningful biclusters in complex heterogeneous data.
arXiv Detail & Related papers (2024-08-23T16:48:10Z)
- Goodness-of-fit Test on the Number of Biclusters in Relational Data Matrix [41.60125423028092]
Biclustering is a problem to detect homogeneous submatrices in a given observed matrix.
We propose a new statistical test on the number of biclusters that does not require the regular-grid assumption.
arXiv Detail & Related papers (2021-02-23T12:25:58Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the clustering ensemble problem, which aims to combine multiple base clusterings to achieve better performance than any individual base clustering.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z)
- Biclustering with Alternating K-Means [5.089110111757978]
We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk.
We propose a simple and novel algorithm that finds a local minimum by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows.
The results demonstrate that our algorithm is able to detect meaningful structures in the data and outperform other competing biclustering methods in various settings and situations.
arXiv Detail & Related papers (2020-09-09T20:15:24Z)
- Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base.
arXiv Detail & Related papers (2020-08-31T12:28:40Z)
- Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z)
- A Novel Granular-Based Bi-Clustering Method of Deep Mining the Co-Expressed Genes [76.84066556597342]
Bi-clustering methods are used to mine bi-clusters whose subsets of samples (genes) are co-regulated under their test conditions.
Unfortunately, traditional bi-clustering methods are not fully effective in discovering such bi-clusters.
We propose a novel bi-clustering method based on the theory of Granular Computing.
arXiv Detail & Related papers (2020-05-12T02:04:40Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM.
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.