Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis
- URL: http://arxiv.org/abs/2004.11530v1
- Date: Fri, 24 Apr 2020 04:39:14 GMT
- Title: Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis
- Authors: Joyce Jiyoung Whang and Inderjit S. Dhillon
- Abstract summary: The goal of co-clustering is to simultaneously identify a clustering of rows as well as columns of a two-dimensional data matrix.
We develop an efficient iterative algorithm which we call the NEO-CC algorithm.
Experimental results show that the NEO-CC algorithm is able to effectively capture the underlying co-clustering structure of real-world data.
- Score: 32.15852903039789
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of co-clustering is to simultaneously identify a clustering of rows
as well as columns of a two-dimensional data matrix. A number of co-clustering
techniques have been proposed including information-theoretic co-clustering and
the minimum sum-squared residue co-clustering method. However, most existing
co-clustering algorithms are designed to find pairwise disjoint and exhaustive
co-clusters while many real-world datasets contain not only a large overlap
between co-clusters but also outliers which should not belong to any
co-cluster. In this paper, we formulate the problem of Non-Exhaustive,
Overlapping Co-Clustering, where both the row and column clusters are allowed
to overlap with each other and outliers for each dimension of the data matrix
are not assigned to any cluster. To solve this problem, we propose intuitive
objective functions, and develop an efficient iterative algorithm which we
call the NEO-CC algorithm. We theoretically show that the NEO-CC algorithm
monotonically decreases the proposed objective functions. Experimental results
show that the NEO-CC algorithm is able to effectively capture the underlying
co-clustering structure of real-world data, and thus outperforms
state-of-the-art clustering and co-clustering methods. This manuscript includes
an extended analysis of [21].
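As an illustration of the problem setting only (not the NEO-CC objective or algorithm), a minimal sketch of non-exhaustive, overlapping assignment follows; the distance-threshold rule, the function name, and the toy data are assumptions made for the example:

```python
import numpy as np

def overlapping_assign(X, centroids, radius):
    """Assign each row of X to every cluster whose centroid lies within
    `radius`; rows matching no cluster are left as outliers.
    Illustrative only -- NEO-CC optimizes a different objective."""
    # pairwise Euclidean distances: (rows x centroids)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    member = d <= radius            # boolean membership matrix (overlap allowed)
    outliers = ~member.any(axis=1)  # non-exhaustive: some rows stay unassigned
    return member, outliers

# toy data: two nearby groups, one point between them, one far outlier
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0],
              [1.1, 0.9], [0.6, 0.5], [9.0, 9.0]])
C = np.array([[0.1, 0.05], [1.05, 0.95]])
member, outliers = overlapping_assign(X, C, radius=0.8)
# row 4 falls within both radii (overlap); row 5 matches neither (outlier)
```

Unlike k-means, the membership matrix here need not have exactly one `True` per row: zero (outlier) and two or more (overlap) are both permitted.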
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Hybrid Fuzzy-Crisp Clustering Algorithm: Theory and Experiments [0.0]
We propose a hybrid fuzzy-crisp clustering algorithm based on a target function combining linear and quadratic terms of the membership function.
In this algorithm, the membership of a data point in a cluster is automatically set to exactly zero if the data point is "sufficiently" far from the cluster center.
The proposed algorithm is demonstrated to outperform the conventional methods on imbalanced datasets and can be competitive on more balanced datasets.
arXiv Detail & Related papers (2023-03-25T05:27:26Z)
- Convex Clustering through MM: An Efficient Algorithm to Perform Hierarchical Clustering [1.0589208420411012]
We propose convex clustering through majorization-minimization (CCMM) -- an iterative algorithm that uses cluster fusions and a highly efficient updating scheme.
With a current desktop computer, CCMM efficiently solves convex clustering problems featuring over one million objects in seven-dimensional space.
arXiv Detail & Related papers (2022-11-03T15:07:51Z)
- Adaptively-weighted Integral Space for Fast Multiview Clustering [54.177846260063966]
We propose an Adaptively-weighted Integral Space for Fast Multiview Clustering (AIMC) with nearly linear complexity.
Specifically, view generation models are designed to reconstruct the view observations from the latent integral space.
Experiments conducted on several real-world datasets confirm the superiority of the proposed AIMC method.
arXiv Detail & Related papers (2022-08-25T05:47:39Z)
- Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than any individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Biclustering with Alternating K-Means [5.089110111757978]
We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk.
We propose a simple and novel algorithm that finds a local minimum by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows.
The results demonstrate that our algorithm is able to detect meaningful structures in the data and outperform other competing biclustering methods in various settings and situations.
arXiv Detail & Related papers (2020-09-09T20:15:24Z)
- Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z)
- Point-Set Kernel Clustering [11.093960688450602]
This paper introduces a new similarity measure called point-set kernel which computes the similarity between an object and a set of objects.
We show that the new clustering procedure is both effective and efficient, enabling it to deal with large-scale datasets.
arXiv Detail & Related papers (2020-02-14T00:00:03Z)
- Bi-objective Optimization of Biclustering with Binary Data [0.0]
Clustering consists of partitioning data objects into subsets called clusters according to some similarity criteria.
This paper addresses a quasi-clustering that allows overlapping of clusters, and which we link to biclustering.
Biclustering simultaneously groups the objects and features so that a specific group of objects has a special group of features.
arXiv Detail & Related papers (2020-02-09T21:49:26Z)
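Several of the entries above describe biclustering, which groups objects (rows) and features (columns) simultaneously, in one case by alternating a k-means-style step between the two dimensions. A hypothetical sketch of that alternating idea is given below; it is not the exact method of any paper listed, and all function names and the toy data are assumptions:

```python
import numpy as np

def one_kmeans_step(Z, labels, k):
    """One Lloyd iteration on the rows of Z given current labels."""
    cent = np.stack([Z[labels == i].mean(axis=0) if (labels == i).any()
                     else Z.mean(axis=0) for i in range(k)])
    d = np.linalg.norm(Z[:, None, :] - cent[None, :, :], axis=2)
    return d.argmin(axis=1)

def alternating_bicluster(A, k_rows, k_cols, iters=20, seed=0):
    """Sketch of alternating biclustering: compress columns by their
    current column-cluster means and re-cluster rows, then do the
    symmetric step for columns. Illustrative, not any paper's method."""
    rng = np.random.default_rng(seed)
    r = rng.integers(k_rows, size=A.shape[0])  # row labels
    c = rng.integers(k_cols, size=A.shape[1])  # column labels
    for _ in range(iters):
        # rows seen through column-cluster means: shape (m, k_cols)
        col_means = np.stack(
            [A[:, c == j].mean(axis=1) if (c == j).any()
             else np.zeros(A.shape[0]) for j in range(k_cols)], axis=1)
        r = one_kmeans_step(col_means, r, k_rows)
        # columns seen through row-cluster means: shape (n, k_rows)
        row_means = np.stack(
            [A[r == i].mean(axis=0) if (r == i).any()
             else np.zeros(A.shape[1]) for i in range(k_rows)], axis=0).T
        c = one_kmeans_step(row_means, c, k_cols)
    return r, c

# toy block matrix: rows 0-2 align with columns 0-2, rows 3-5 with columns 3-5
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
r, c = alternating_bicluster(A, k_rows=2, k_cols=2)
```

Because identical rows (or columns) always receive the same label under a deterministic nearest-centroid step, each block of the toy matrix ends up in a single row cluster and a single column cluster.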
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.