ClusterFormer: Clustering As A Universal Visual Learner
- URL: http://arxiv.org/abs/2309.13196v3
- Date: Fri, 6 Oct 2023 00:38:16 GMT
- Title: ClusterFormer: Clustering As A Universal Visual Learner
- Authors: James C. Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, Dongfang Liu
- Abstract summary: CLUSTERFORMER is a universal vision model based on the CLUSTERing paradigm with TransFORMER.
It is capable of tackling heterogeneous vision tasks with varying levels of clustering granularity.
Given its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.
- Score: 80.79669078819562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents CLUSTERFORMER, a universal vision model that is based on
the CLUSTERing paradigm with TransFORMER. It comprises two novel designs: 1.
recurrent cross-attention clustering, which reformulates the cross-attention
mechanism in Transformer and enables recursive updates of cluster centers to
facilitate strong representation learning; and 2. feature dispatching, which
uses the updated cluster centers to redistribute image features through
similarity-based metrics, resulting in a transparent pipeline. This elegant
design streamlines an explainable and transferable workflow, capable of
tackling heterogeneous vision tasks (i.e., image classification, object
detection, and image segmentation) with varying levels of clustering
granularity (i.e., image-, box-, and pixel-level). Empirical results
demonstrate that CLUSTERFORMER outperforms various well-known specialized
architectures, achieving 83.41% top-1 acc. over ImageNet-1K for image
classification, 54.2% and 47.0% mAP over MS COCO for object detection and
instance segmentation, 52.4% mIoU over ADE20K for semantic segmentation, and
55.8% PQ over COCO Panoptic for panoptic segmentation. Given its efficacy, we
hope our work can catalyze a paradigm shift in universal models in computer
vision.
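The two designs named in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes dot-product similarity, a softmax taken across cluster centers, centers updated as assignment-weighted means, and a plain residual update for dispatching, with no learned projections or MLPs. All function names (`assign`, `update_centers`, `recurrent_clustering`, `dispatch`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def assign(features, centers):
    """Soft-assign every feature to the centers (softmax across centers; assumed)."""
    return [softmax([dot(f, c) for c in centers]) for f in features]


def update_centers(features, weights, num_centers):
    """Recompute each center as the assignment-weighted mean of the features (assumed)."""
    dim = len(features[0])
    centers = []
    for k in range(num_centers):
        mass = sum(w[k] for w in weights) + 1e-8  # avoid division by zero
        centers.append([
            sum(w[k] * f[d] for w, f in zip(weights, features)) / mass
            for d in range(dim)
        ])
    return centers


def recurrent_clustering(features, centers, steps=3):
    """Alternate assignment and center update for a fixed number of steps."""
    for _ in range(steps):
        weights = assign(features, centers)
        centers = update_centers(features, weights, len(centers))
    return centers


def dispatch(features, centers):
    """Add to each feature a similarity-weighted mix of the updated centers."""
    weights = assign(features, centers)
    return [
        [f[d] + sum(w[k] * centers[k][d] for k in range(len(centers)))
         for d in range(len(f))]
        for f, w in zip(features, weights)
    ]


# Toy usage: four 2-D "image features" forming two loose groups, two centers.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers = recurrent_clustering(features, [[0.6, 0.4], [0.4, 0.6]])
updated = dispatch(features, centers)
```

In this toy setting the number of centers plays the role of the clustering granularity; how the paper maps image-, box-, and pixel-level tasks onto that granularity is not reproduced here.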
Related papers
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Superpixel Graph Contrastive Clustering with Semantic-Invariant Augmentations for Hyperspectral Images [64.72242126879503]
Hyperspectral image (HSI) clustering is an important but challenging task.
We first use 3-D and 2-D hybrid convolutional neural networks to extract the high-order spatial and spectral features of HSI.
We then design a superpixel graph contrastive clustering model to learn discriminative superpixel representations.
arXiv Detail & Related papers (2024-03-04T07:40:55Z)
- Rethinking cluster-conditioned diffusion models [1.597617022056624]
We elucidate how individual components regarding image clustering impact image synthesis across three datasets.
We show that, given the optimal cluster granularity with respect to image synthesis (visual groups), cluster-conditioning can achieve state-of-the-art FID.
We propose a novel method to derive an upper cluster bound that reduces the search space of the visual groups using solely feature-based clustering.
arXiv Detail & Related papers (2024-03-01T14:47:46Z)
- Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering [37.15595383168132]
We propose a jigsaw-based strategy for image clustering called Grid Jigsaw Representation (GJR), with a systematic exposition from pixel to feature that addresses the discrepancy between human and computer.
GJR modules are appended to a variety of deep convolutional networks and yield significant improvements on a wide range of benchmark datasets.
Experimental results show its effectiveness on the clustering task with respect to three metrics (ACC, NMI, and ARI), along with very fast convergence.
arXiv Detail & Related papers (2023-10-27T03:07:05Z)
- CVFC: Attention-Based Cross-View Feature Consistency for Weakly Supervised Semantic Segmentation of Pathology Images [3.2128744424771725]
Histopathology image segmentation is the gold standard for diagnosing cancer.
Many studies now use image-level labels to achieve pixel-level segmentation to reduce the need for fine-grained annotation.
We propose an attention-based cross-view feature consistency end-to-end pseudo-mask generation framework named CVFC.
arXiv Detail & Related papers (2023-08-21T03:50:09Z)
- CLUSTSEG: Clustering for Universal Segmentation [56.58677563046506]
CLUSTSEG is a general, transformer-based framework for image segmentation.
It tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme.
arXiv Detail & Related papers (2023-05-03T15:31:16Z)
- Image as Set of Points [60.30495338399321]
Context clusters (CoCs) view an image as a set of unorganized points and extract features via a simplified clustering algorithm.
Our CoCs are convolution- and attention-free, and rely only on the clustering algorithm for spatial interaction.
arXiv Detail & Related papers (2023-03-02T18:56:39Z)
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target zero-shot semantic segmentation by building on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-12-29T18:56:18Z)
- Deep Transformation-Invariant Clustering [24.23117820167443]
We present an approach that does not rely on abstract features but instead learns to predict image transformations.
This learning process naturally fits into the gradient-based training of K-means and Gaussian mixture models.
We demonstrate that our novel approach yields competitive and highly promising results on standard image clustering benchmarks.
arXiv Detail & Related papers (2020-06-19T13:43:08Z)