Goodness-of-fit Test on the Number of Biclusters in Relational Data Matrix
- URL: http://arxiv.org/abs/2102.11658v1
- Date: Tue, 23 Feb 2021 12:25:58 GMT
- Title: Goodness-of-fit Test on the Number of Biclusters in Relational Data Matrix
- Authors: Chihiro Watanabe, Taiji Suzuki
- Abstract summary: Biclustering is a problem to detect homogeneous submatrices in a given observed matrix.
We propose a new statistical test on the number of biclusters that does not require the regular-grid assumption.
- Score: 41.60125423028092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biclustering is a problem to detect homogeneous submatrices in a given
observed matrix, and it has been shown to be an effective tool for relational
data analysis. Although there have been many studies for estimating the
underlying bicluster structure of a matrix, few have enabled us to determine
the appropriate number of biclusters in an observed matrix. Recently, a
statistical test on the number of biclusters has been proposed for a
regular-grid bicluster structure, where we assume that the latent bicluster
structure can be represented by row-column clustering. However, when the latent
bicluster structure does not satisfy such a regular-grid assumption, the previous
test requires too many biclusters (i.e., finer bicluster structure) for the
null hypothesis to be accepted, which is not desirable in terms of interpreting
the accepted bicluster structure. In this paper, we propose a new statistical
test on the number of biclusters that does not require the regular-grid
assumption, and derive the asymptotic behavior of the proposed test statistic
in both null and alternative cases. To develop the proposed test, we construct
a consistent submatrix localization algorithm, that is, the probability that it
outputs the correct bicluster structure converges to one. We show the
effectiveness of the proposed method by applying it to both synthetic and
practical relational data matrices.
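To make the regular-grid issue described in the abstract concrete, the following is a minimal sketch and not the authors' method. It plants two homogeneous submatrices in a noisy matrix and counts how many row and column groups a row-column (regular-grid) clustering would need to represent them. The function names `planted_matrix` and `regular_grid_size`, the Gaussian noise model, and the specific sizes and means are all illustrative assumptions.

```python
import numpy as np


def planted_matrix(n_rows, n_cols, biclusters, noise_sd=1.0, seed=0):
    """Return (observed, mean) for a zero-mean background with planted blocks.

    biclusters: list of (row_indices, col_indices, block_mean) tuples.
    Assumption for illustration: additive Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((n_rows, n_cols))
    for rows, cols, mu in biclusters:
        mean[np.ix_(rows, cols)] = mu
    observed = mean + rng.normal(scale=noise_sd, size=mean.shape)
    return observed, mean


def regular_grid_size(n_rows, n_cols, biclusters):
    """Count row and column groups when rows (columns) are grouped by the
    set of biclusters they belong to, i.e. the grid a row-column
    clustering needs in order to represent the planted structure."""
    row_sig = np.zeros((n_rows, len(biclusters)), dtype=int)
    col_sig = np.zeros((n_cols, len(biclusters)), dtype=int)
    for k, (rows, cols, _) in enumerate(biclusters):
        row_sig[rows, k] = 1
        col_sig[cols, k] = 1
    n_row_groups = len(np.unique(row_sig, axis=0))
    n_col_groups = len(np.unique(col_sig, axis=0))
    return n_row_groups, n_col_groups


# Two biclusters whose row ranges overlap but whose column ranges do not,
# so the submatrices themselves are disjoint and not grid-aligned.
biclusters = [
    (np.arange(0, 4), np.arange(0, 5), 2.0),
    (np.arange(2, 7), np.arange(6, 10), -2.0),
]
observed, mean = planted_matrix(8, 12, biclusters)
n_row_groups, n_col_groups = regular_grid_size(8, 12, biclusters)
print(len(biclusters), "biclusters vs", n_row_groups * n_col_groups, "grid blocks")
```

In this toy setting, two biclusters plus background already force a grid with several times as many blocks, which is the kind of finer structure the abstract describes as undesirable for interpretation.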
Related papers
- HBIC: A Biclustering Algorithm for Heterogeneous Datasets [0.0]
Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix.
We introduce a biclustering approach called HBIC, capable of discovering meaningful biclusters in complex heterogeneous data.
arXiv Detail & Related papers (2024-08-23T16:48:10Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z)
- Biclustering with Alternating K-Means [5.089110111757978]
We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk.
We propose a simple and novel algorithm that finds a local minimum by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows.
The results demonstrate that our algorithm is able to detect meaningful structures in the data and outperform other competing biclustering methods in various settings and situations.
arXiv Detail & Related papers (2020-09-09T20:15:24Z)
- Selective Inference for Latent Block Models [50.83356836818667]
This study provides a selective inference method for latent block models.
We construct a statistical test on a set of row and column cluster memberships of a latent block model.
The proposed exact and approximated tests work effectively, compared to the naive test that did not take the selective bias into account.
arXiv Detail & Related papers (2020-05-27T10:44:19Z)
- Bi-objective Optimization of Biclustering with Binary Data [0.0]
Clustering consists of partitioning data objects into subsets called clusters according to some similarity criteria.
This paper addresses a quasi-clustering that allows overlapping of clusters, and which we link to biclustering.
Biclustering simultaneously groups the objects and features so that a specific group of objects has a special group of features.
arXiv Detail & Related papers (2020-02-09T21:49:26Z)
- Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.