Accelerated sparse Kernel Spectral Clustering for large scale data
clustering problems
- URL: http://arxiv.org/abs/2310.13381v1
- Date: Fri, 20 Oct 2023 09:51:42 GMT
- Title: Accelerated sparse Kernel Spectral Clustering for large scale data
clustering problems
- Authors: Mihaly Novak, Rocco Langone, Carlos Alzate, Johan Suykens
- Abstract summary: An improved version of the sparse multiway kernel spectral clustering (KSC) is presented in this brief.
The original algorithm is derived from weighted kernel principal component analysis formulated within the primal-dual least-squares support vector machine (LS-SVM) framework.
Sparsity is then achieved by combining the incomplete Cholesky decomposition (ICD) based low-rank approximation of the kernel matrix with the so-called reduced set method.
- Score: 0.27257174044950283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An improved version of the sparse multiway kernel spectral clustering (KSC)
algorithm is presented in this brief. The original algorithm is derived from
weighted kernel principal component analysis (KPCA) formulated within the
primal-dual least-squares support vector machine (LS-SVM) framework. Sparsity
is then achieved by combining the incomplete Cholesky decomposition (ICD) based
low-rank approximation of the kernel matrix with the so-called reduced set
method. The original ICD-based sparse KSC algorithm was reported to be
computationally far too demanding, especially when applied to the large-scale
data clustering problems it was actually designed for, which has so far
prevented it from gaining more than theoretical relevance. The modifications
reported in this brief change this by drastically improving the computational
characteristics. Solving the alternative, symmetrized version of the
computationally most demanding core eigenvalue problem eliminates the need to
form, and compute the SVD of, large matrices during model construction. As a
result, clustering problems reported to require hours can now be solved within
seconds, without altering the results. Furthermore, sparsity is also improved
significantly, leading to a more compact model representation and further
increasing not only the computational efficiency but also the descriptive
power. These modifications turn the original, only theoretically relevant
ICD-based sparse KSC algorithm into one applicable to large-scale practical
clustering problems. The theoretical results and improvements are demonstrated
by computational experiments on carefully selected synthetic data as well as on
real-life problems such as image segmentation.
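The symmetrization trick mentioned in the abstract, replacing a nonsymmetric eigenvalue problem of the form $D^{-1}A\alpha = \lambda\alpha$ with an equivalent symmetric one that a cheap, stable symmetric eigensolver can handle, can be sketched in a few lines. The following is a minimal NumPy illustration of the general idea only, not the authors' actual KSC formulation; the matrix names `A` and `d` are illustrative.

```python
import numpy as np

def symmetrized_eig(A, d):
    """Solve diag(1/d) @ A @ alpha = lam * alpha for symmetric A and d > 0.

    Instead of calling a nonsymmetric solver on diag(1/d) @ A, solve the
    similarity-transformed symmetric problem
        D^{-1/2} A D^{-1/2} beta = lam * beta,  with  alpha = D^{-1/2} beta,
    which has the same eigenvalues and is handled by np.linalg.eigh.
    """
    s = 1.0 / np.sqrt(d)
    S = A * np.outer(s, s)        # S = D^{-1/2} A D^{-1/2}, still symmetric
    lam, B = np.linalg.eigh(S)    # symmetric eigensolver: cheaper and stable
    alpha = B * s[:, None]        # back-transform eigenvectors
    return lam, alpha
```

The back-transformed vectors `alpha` satisfy the original nonsymmetric problem exactly, so the clustering result is unchanged; only the numerical route differs.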
Related papers
- A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization [83.12938977698988]
Generalized Category Discovery (GCD) aims to classify both base and novel images using labeled base data.
Current approaches inadequately address the intrinsic optimization of the co-occurrence matrix $\bar{A}$ based on cosine similarity.
We propose a Non-Negative Generalized Category Discovery (NN-GCD) framework to address these deficiencies.
arXiv Detail & Related papers (2024-10-29T07:24:11Z) - Multiple kernel concept factorization algorithm based on global fusion [9.931283387968856]
In an unsupervised environment, to design or select a proper kernel function for a specific dataset, a new algorithm called Globalized Multiple Kernel Concept Factorization (GMKCF) was proposed.
In data clustering, the proposed algorithm outperforms comparison algorithms such as Kernel K-Means (KKM), Spectral Clustering (SC), Kernel Concept Factorization (KCF), Co-regularized multi-view spectral clustering (Coreg), and Robust Multiple KKM (RMKKM).
arXiv Detail & Related papers (2024-10-27T09:13:57Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - ESSAformer: Efficient Transformer for Hyperspectral Image
Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image
Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Regularization and Optimization in Model-Based Clustering [4.096453902709292]
k-means algorithm variants essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution.
We develop more effective optimization algorithms for general GMMs, and we combine these algorithms with regularization strategies that avoid overfitting.
These results shed new light on the current status quo between GMM and k-means methods and suggest the more frequent use of general GMMs for data exploration.
arXiv Detail & Related papers (2023-02-05T18:22:29Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - k-MS: A novel clustering algorithm based on morphological reconstruction [0.0]
k-MS is faster than the CPU-parallel k-Means in worst case scenarios.
It is also faster than similar clustering methods that are sensitive to density and shapes, such as Mitosis and TRICLUST.
arXiv Detail & Related papers (2022-08-30T16:55:21Z) - Fast and Provable Tensor Robust Principal Component Analysis via Scaled
Gradient Descent [30.299284742925852]
This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from observations contaminated by sparse corruptions.
We show that the proposed algorithm achieves better and more scalable performance than state-of-the-art matrix and tensor RPCA algorithms.
arXiv Detail & Related papers (2022-06-18T04:01:32Z) - SimpleMKKM: Simple Multiple Kernel K-means [49.500663154085586]
We propose a simple yet effective multiple kernel clustering algorithm, termed simple multiple kernel k-means (SimpleMKKM).
Our criterion is given by an intractable minimization-maximization problem in the kernel coefficient and clustering partition matrix.
We theoretically analyze the performance of SimpleMKKM in terms of its clustering generalization error.
arXiv Detail & Related papers (2020-05-11T10:06:40Z) - Fast Kernel k-means Clustering Using Incomplete Cholesky Factorization [11.631064399465089]
Kernel-based clustering algorithms can identify and capture non-linear structure in datasets, and can thus achieve better performance than linear clustering.
However, computing and storing the entire kernel matrix requires so much memory that it is difficult for kernel-based clustering to deal with large-scale datasets.
arXiv Detail & Related papers (2020-02-07T15:32:14Z)
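The incomplete Cholesky decomposition that recurs above, in the main abstract and in the last related paper, replaces the full n-by-n kernel matrix K with a low-rank factor G such that K is approximately G @ G.T, evaluating only a handful of pivot columns. Below is a minimal pivoted incomplete Cholesky sketch in NumPy, for illustration only; the function names and the tolerance-based stopping rule are assumptions, not taken from either paper.

```python
import numpy as np

def incomplete_cholesky(X, kernel, tol=1e-6, max_rank=None):
    """Pivoted incomplete Cholesky: K ~= G @ G.T with G of shape (n, r).

    Only r columns of the kernel matrix are ever evaluated, so memory
    drops from O(n^2) to O(n * r); r is chosen adaptively by stopping
    once the largest remaining diagonal residual falls below tol.
    """
    n = X.shape[0]
    max_rank = max_rank or n
    d = np.array([kernel(X[i], X[i]) for i in range(n)])  # diagonal residuals
    G = np.zeros((n, max_rank))
    r = 0
    while r < max_rank and d.max() > tol:
        p = int(np.argmax(d))                             # pivot: largest residual
        col = np.array([kernel(X[j], X[p]) for j in range(n)])
        G[:, r] = (col - G[:, :r] @ G[p, :r]) / np.sqrt(d[p])
        d = np.maximum(d - G[:, r] ** 2, 0.0)             # clamp rounding noise
        r += 1
    return G[:, :r]
```

With a rapidly decaying kernel spectrum (e.g. an RBF kernel on clustered data), r is typically far smaller than n, which is what makes the sparse KSC model construction tractable at scale.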
This list is automatically generated from the titles and abstracts of the papers in this site.