Universal NP-Hardness of Clustering under General Utilities
- URL: http://arxiv.org/abs/2603.00210v1
- Date: Fri, 27 Feb 2026 13:08:15 GMT
- Title: Universal NP-Hardness of Clustering under General Utilities
- Authors: Angshul Majumdar,
- Abstract summary: We formalise the common optimisation core underlying diverse clustering paradigms: the maximisation of a polynomial-time computable partition utility over a finite metric space. By mapping ten major paradigms -- including k-means, GMMs, DBSCAN, spectral clustering, and affinity propagation -- to the UCP framework, we demonstrate that each inherits this fundamental intractability.
- Score: 11.62669179647184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering is a central primitive in unsupervised learning, yet practice is dominated by heuristics whose outputs can be unstable and highly sensitive to representations, hyperparameters, and initialisation. Existing theoretical results are largely objective-specific and do not explain these behaviours at a unifying level. We formalise the common optimisation core underlying diverse clustering paradigms by defining the Universal Clustering Problem (UCP): the maximisation of a polynomial-time computable partition utility over a finite metric space. We prove the NP-hardness of UCP via two independent polynomial-time reductions from graph colouring and from exact cover by 3-sets (X3C). By mapping ten major paradigms -- including k-means, GMMs, DBSCAN, spectral clustering, and affinity propagation -- to the UCP framework, we demonstrate that each inherits this fundamental intractability. Our results provide a unified explanation for characteristic failure modes, such as local optima in alternating methods and greedy merge-order traps in hierarchical clustering. Finally, we show that clustering limitations reflect interacting computational and epistemic constraints, motivating a shift toward stability-aware objectives and interaction-driven formulations with explicit guarantees.
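For intuition on the UCP formulation, the sketch below casts the k-means objective (negative within-cluster sum of squares) as one polynomial-time computable partition utility and maximises it by exhaustive search over all assignments. The function names and the brute-force search are illustrative assumptions for this listing, not the paper's reductions or algorithms; they only show the shared shape "maximise a utility over partitions" and why exact optimisation blows up combinatorially.

```python
# A minimal sketch, not from the paper: the k-means objective viewed as one
# polynomial-time computable partition utility, maximised exhaustively.
from itertools import product

import numpy as np


def kmeans_utility(points: np.ndarray, labels) -> float:
    """Negative within-cluster sum of squares; higher is better."""
    total = 0.0
    for c in set(labels):
        members = points[[i for i, lab in enumerate(labels) if lab == c]]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return -total


def brute_force_ucp(points: np.ndarray, k: int, utility):
    """Maximise a partition utility over all k**n assignments of n points.

    Exhaustive search is only feasible for tiny n; the paper's claim is that
    no polynomial-time shortcut exists in general (UCP is NP-hard).
    """
    best_labels, best_value = None, float("-inf")
    for labels in product(range(k), repeat=len(points)):
        value = utility(points, labels)
        if value > best_value:
            best_labels, best_value = labels, value
    return best_labels, best_value


# Tiny usage example: two well-separated pairs of points, k = 2.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
labels, value = brute_force_ucp(X, k=2, utility=kmeans_utility)
print(labels, value)  # e.g. (0, 0, 1, 1) with utility -0.065
```

Swapping in a different utility (for example a density- or modularity-style score) leaves the outer search unchanged, which is the sense in which the abstract's ten paradigms share a common optimisation core.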
Related papers
- Deep Incomplete Multi-View Clustering via Hierarchical Imputation and Alignment [15.396375506151102]
We propose a novel deep IMVC framework that integrates hierarchical imputation and alignment with four key components. Experiments on benchmarks demonstrate that our framework achieves superior performance under varying levels of missingness.
arXiv Detail & Related papers (2026-01-14T00:46:00Z) - Optimal Graph Clustering without Edge Density Signals [11.198418635553976]
This paper establishes the theoretical limits of graph clustering under the Popularity-Adjusted Block Model (PABM). Our main contribution is the characterization of the optimal error rate for clustering under PABM. Our numerical experiments on both synthetic and real datasets confirm that spectral clustering algorithms incorporating $k^2$ eigenvectors outperform traditional spectral approaches.
arXiv Detail & Related papers (2025-10-24T17:24:26Z) - Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective [52.662463893268225]
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios. Existing SHGL methods encounter two significant limitations. We introduce a novel framework enhanced by rank and dual consistency constraints.
arXiv Detail & Related papers (2024-12-01T09:33:20Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Memetic Differential Evolution Methods for Semi-Supervised Clustering [0.8681835475119588]
We propose an extension of MDEClust to semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems.
Our new framework, called S-MDEClust, represents the first memetic methodology designed to generate an optimal feasible solution.
arXiv Detail & Related papers (2024-03-07T08:37:36Z) - Anchor-based Multi-view Subspace Clustering with Hierarchical Feature Descent [46.86939432189035]
We propose Anchor-based Multi-view Subspace Clustering with Hierarchical Feature Descent.
Our proposed model consistently outperforms the state-of-the-art techniques.
arXiv Detail & Related papers (2023-10-11T03:29:13Z) - Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering method based on semi-non-negative tensor factorization (Semi-NTF).
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z) - Simple and Scalable Algorithms for Cluster-Aware Precision Medicine [0.0]
We propose a simple and scalable approach to joint clustering and embedding.
This novel, cluster-aware embedding approach overcomes the complexity and limitations of current joint embedding and clustering methods.
Our approach does not require the user to choose the desired number of clusters, but instead yields interpretable dendrograms of hierarchically clustered embeddings.
arXiv Detail & Related papers (2022-11-29T19:27:26Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)