Delving into Spectral Clustering with Vision-Language Representations
- URL: http://arxiv.org/abs/2602.09586v1
- Date: Tue, 10 Feb 2026 09:36:24 GMT
- Title: Delving into Spectral Clustering with Vision-Language Representations
- Authors: Bo Peng, Yuanwei Hu, Bo Liu, Ling Chen, Jie Lu, Zhen Fang,
- Abstract summary: We propose Neural Tangent Kernel Spectral Clustering that leverages cross-modal alignment in pre-trained vision-language models.<n>We show that this formulation amplifies within-cluster connections while suppressing spurious ones across clusters.<n>Our method consistently outperforms the state-of-the-art by a large margin.
- Score: 27.433418706301477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spectral clustering is known as a powerful technique in unsupervised data analysis. The vast majority of approaches to spectral clustering are driven by a single modality, leaving the rich information in multi-modal representations untapped. Inspired by the recent success of vision-language pre-training, this paper enriches the landscape of spectral clustering from a single-modal to a multi-modal regime. Particularly, we propose Neural Tangent Kernel Spectral Clustering that leverages cross-modal alignment in pre-trained vision-language models. By anchoring the neural tangent kernel with positive nouns, i.e., those semantically close to the images of interest, we arrive at formulating the affinity between images as a coupling of their visual proximity and semantic overlap. We show that this formulation amplifies within-cluster connections while suppressing spurious ones across clusters, hence encouraging block-diagonal structures. In addition, we present a regularized affinity diffusion mechanism that adaptively ensembles affinity matrices induced by different prompts. Extensive experiments on \textbf{16} benchmarks -- including classical, large-scale, fine-grained and domain-shifted datasets -- manifest that our method consistently outperforms the state-of-the-art by a large margin.
Related papers
- Phase-Consistent Magnetic Spectral Learning for Multi-View Clustering [18.462238432927915]
Unsupervised multi-view clustering aims to partition data into meaningful groups by leveraging complementary information from multiple views without labels.<n>Existing approaches often rely on magnitude-only affinities or early pseudo targets, which can be unstable when different views induce relations with comparable strengths but contradictory directional tendencies.<n>We propose emphPhase-Consistent Magnetic Spectral Learning for MVC: we explicitly model cross-view directional agreement as a phase term and combine it with a nonnegative magnitude backbone to form a complex-valued magnetic affinity.
arXiv Detail & Related papers (2026-02-21T06:04:57Z) - Wasserstein-Aligned Hyperbolic Multi-View Clustering [58.29261653100388]
This paper proposes a novel Wasserstein-Aligned Hyperbolic (WAH) framework for multi-view clustering.<n>Our method exploits a view-specific hyperbolic encoder for each view to embed features into the Lorentz manifold for hierarchical semantic modeling.
arXiv Detail & Related papers (2025-12-10T07:56:19Z) - Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds [49.95082206008502]
Alignment across Trees is a method that constructs and aligns tree-like hierarchical features for both image and text modalities.<n>We introduce a semantic-aware visual feature extraction framework that applies a cross-attention mechanism to visual class tokens from intermediate Transformer layers.
arXiv Detail & Related papers (2025-10-31T11:32:15Z) - Self-Enhanced Image Clustering with Cross-Modal Semantic Consistency [57.961869351897384]
We propose a framework based on cross-modal semantic consistency for efficient image clustering.<n>Our framework first builds a strong foundation via Cross-Modal Semantic Consistency.<n>In the first stage, we train lightweight clustering heads to align with the rich semantics of the pre-trained model.<n>In the second stage, we introduce a Self-Enhanced fine-tuning strategy.
arXiv Detail & Related papers (2025-08-02T08:12:57Z) - Generative Kernel Spectral Clustering [12.485601356990998]
We present Generative Kernel Spectral Clustering (GenKSC), a novel model combining kernel spectral clustering with generative modeling to produce both well-defined clusters and interpretable representations.<n>Results on MNIST and FashionMNIST datasets demonstrate the model's ability to learn meaningful cluster representations.
arXiv Detail & Related papers (2025-02-04T09:59:45Z) - Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective [52.662463893268225]
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios.<n>Existing SHGL methods encounter two significant limitations.<n>We introduce a novel framework enhanced by rank and dual consistency constraints.
arXiv Detail & Related papers (2024-12-01T09:33:20Z) - Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disdentangle Slim Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z) - Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering based on semi-non-negative tensor factorization (Semi-NTF)
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep
Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z) - Large-Scale Hyperspectral Image Clustering Using Contrastive Learning [18.473767002905433]
We present a scalable deep online clustering model, named Spectral-Spatial Contrastive Clustering (SSCC)
We exploit a symmetric twin neural network comprised of a projection head with a dimensionality of the cluster number to conduct dual contrastive learning from a spectral-spatial augmentation pool.
The resulting approach is trained in an end-to-end fashion by batch-wise optimization, making it robust in large-scale data and resulting in good generalization ability for unseen data.
arXiv Detail & Related papers (2021-11-15T17:50:06Z) - Doubly Stochastic Subspace Clustering [9.815735805354905]
Many state-of-the-art subspace clustering methods follow a two-step process by first constructing an affinity matrix between data points and then applying spectral clustering to this affinity.
In this work, we learn both a self-expressive representation of the data and an affinity matrix that is well-normalized for spectral clustering.
Experiments show that our method achieves state-of-the-art subspace clustering performance on many common datasets in computer vision.
arXiv Detail & Related papers (2020-11-30T14:56:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.