Related papers: Fast Kernel k-means Clustering Using Incomplete Cholesky Factorization

Fast Kernel k-means Clustering Using Incomplete Cholesky Factorization

URL: http://arxiv.org/abs/2002.02846v1
Date: Fri, 7 Feb 2020 15:32:14 GMT
Title: Fast Kernel k-means Clustering Using Incomplete Cholesky Factorization
Authors: Li Chen, Shuisheng Zhou, Jiajun Ma
Abstract summary: Kernel-based clustering algorithm can identify and capture the non-linear structure in datasets. It can achieve better performance than linear clustering. computing and storing the entire kernel matrix occupy so large memory that it is difficult for kernel-based clustering to deal with large-scale datasets.
Score: 11.631064399465089
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Kernel-based clustering algorithm can identify and capture the non-linear structure in datasets, and thereby it can achieve better performance than linear clustering. However, computing and storing the entire kernel matrix occupy so large memory that it is difficult for kernel-based clustering to deal with large-scale datasets. In this paper, we employ incomplete Cholesky factorization to accelerate kernel clustering and save memory space. The key idea of the proposed kernel $k$-means clustering using incomplete Cholesky factorization is that we approximate the entire kernel matrix by the product of a low-rank matrix and its transposition. Then linear $k$-means clustering is applied to columns of the transpose of the low-rank matrix. We show both analytically and empirically that the performance of the proposed algorithm is similar to that of the kernel $k$-means clustering algorithm, but our method can deal with large-scale datasets.

Related papers

Radius-Guided Post-Clustering for Shape-Aware, Scalable Refinement of k-Means Results [1.9580473532948401]
After standard k-means, each cluster center is assigned a radius (the distance to its assigned point), and clusters whose radii overlap are merged. This post-processing step loosens the requirement for exact k long as k is. The method can often reconstruct non-estimated shapes over meaningful merges.
arXiv Detail & Related papers (2025-04-28T22:30:53Z)
Multiple kernel concept factorization algorithm based on global fusion [9.931283387968856]
In unsupervised environment, to design or select proper kernel function for specific dataset, a new algorithm called Globalized Multiple Kernel(GMKCF)was proposed. The proposed algorithm outperforms comparison algorithms in data clustering, such as Kernel K-Means(KKM), Spectral Clustering(SC), CF Kernel(KCF), Co-regularized multi-view spectral clustering(Coreg), and Robust Multiple KKM(RMKKM)
arXiv Detail & Related papers (2024-10-27T09:13:57Z)
Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks. We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct distance matrix between data points by Butterworth filter. To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels. This paper proposes a novel local sample-weighted multiple kernel clustering model. Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algo-rithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z)
k-Factorization Subspace Clustering [12.18340575383456]
Subspace clustering aims to cluster data lying in a union of low-dimensional subspaces. This paper presents a method called k-Factorization Subspace Clustering (k-FSC) for large-scale subspace clustering.
arXiv Detail & Related papers (2020-12-08T10:34:21Z)
Adaptive Explicit Kernel Minkowski Weighted K-means [1.3535770763481905]
The kernel K-means, which extends K-means into the kernel space, is able to capture nonlinear structures and identify arbitrarily shaped clusters. This paper proposes a method to combine the advantages of the linear and nonlinear approaches by using driven corresponding approximate finite-dimensional feature maps.
arXiv Detail & Related papers (2020-12-04T19:14:09Z)
Kernel k-Means, By All Means: Algorithms and Strong Consistency [21.013169939337583]
Kernel $k$ clustering is a powerful tool for unsupervised learning of non-linear data. In this paper, we generalize results leveraging a general family of means to combat sub-optimal local solutions. Our algorithm makes use of majorization-minimization (MM) to better solve this non-linear separation problem.
arXiv Detail & Related papers (2020-11-12T16:07:18Z)
Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models. A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data. This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix. Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base.
arXiv Detail & Related papers (2020-08-31T12:28:40Z)
SimpleMKKM: Simple Multiple Kernel K-means [49.500663154085586]
We propose a simple yet effective multiple kernel clustering algorithm, termed simple multiple kernel k-means (SimpleMKKM) Our criterion is given by an intractable minimization-maximization problem in the kernel coefficient and clustering partition matrix. We theoretically analyze the performance of SimpleMKKM in terms of its clustering generalization error.
arXiv Detail & Related papers (2020-05-11T10:06:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.