SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning
- URL: http://arxiv.org/abs/2403.13684v2
- Date: Mon, 20 May 2024 05:12:08 GMT
- Title: SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning
- Authors: Hongjun Wang, Sagar Vaze, Kai Han,
- Abstract summary: Generalized Category Discovery (GCD) aims to classify unlabelled images from both seen' and unseen' classes.
We introduce a two-stage adaptation approach termed SPTNet, which iteratively optimize model parameters and data parameters.
We demonstrate that our method outperforms existing GCD methods on standard benchmarks.
- Score: 17.520137576423593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalized Category Discovery (GCD) aims to classify unlabelled images from both `seen' and `unseen' classes by transferring knowledge from a set of labelled `seen' class images. A key theme in existing GCD approaches is adapting large-scale pre-trained models for the GCD task. An alternate perspective, however, is to adapt the data representation itself for better alignment with the pre-trained model. As such, in this paper, we introduce a two-stage adaptation approach termed SPTNet, which iteratively optimizes model parameters (i.e., model-finetuning) and data parameters (i.e., prompt learning). Furthermore, we propose a novel spatial prompt tuning method (SPT) which considers the spatial property of image data, enabling the method to better focus on object parts, which can transfer between seen and unseen classes. We thoroughly evaluate our SPTNet on standard benchmarks and demonstrate that our method outperforms existing GCD methods. Notably, we find our method achieves an average accuracy of 61.4% on the SSB, surpassing prior state-of-the-art methods by approximately 10%. The improvement is particularly remarkable as our method yields extra parameters amounting to only 0.117% of those in the backbone architecture. Project page: https://visual-ai.github.io/sptnet.
Related papers
- Utilization of Neighbor Information for Image Classification with Different Levels of Supervision [38.350123625613804]
We propose a flexible method that performs well for both generalized category discovery (GCD) and image clustering.
Our method yields state-of-the-art results for both clustering (+3% ImageNet-100, Imagenet200) and GCD (+0.8% ImageNet-100, +5% CUB, +2% SCars, +4% Aircraft)
arXiv Detail & Related papers (2025-03-18T17:59:41Z) - Cross-Modal Mapping: Mitigating the Modality Gap for Few-Shot Image Classification [13.238769012534922]
We propose a novel Cross-Modal Mapping (CMM) method for few-shot image classification.
CMM aligns image features with the text feature space through linear transformation.
It improves the average Top-1 accuracy by 1.06% on 11 benchmark datasets.
arXiv Detail & Related papers (2024-12-28T10:40:21Z) - Hyperspherical Classification with Dynamic Label-to-Prototype Assignment [5.978350039412277]
We present a simple yet effective method to optimize the category assigned to each prototype during the training.
We solve this optimization using a sequential combination of gradient descent and Bipartide matching.
Our method outperforms its competitors by 1.22% accuracy on CIFAR-100, and 2.15% on ImageNet-200 using a metric space dimension half of the size of its competitors.
arXiv Detail & Related papers (2024-03-25T17:01:34Z) - Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS)
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z) - PDiscoNet: Semantically consistent part discovery for fine-grained
recognition [62.12602920807109]
We propose PDiscoNet to discover object parts by using only image-level class labels along with priors encouraging the parts to be.
Our results on CUB, CelebA, and PartImageNet show that the proposed method provides substantially better part discovery performance than previous methods.
arXiv Detail & Related papers (2023-09-06T17:19:29Z) - Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models [37.574691902971296]
We propose a novel image clustering pipeline that leverages the powerful feature representation of large pre-trained models.
We show that our pipeline works well on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet-1k.
arXiv Detail & Related papers (2023-06-08T15:20:27Z) - Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z) - Transferability Estimation using Bhattacharyya Class Separability [37.52588126267552]
Transfer learning is a popular method for leveraging pre-trained models in computer vision.
It is difficult to quantify which pre-trained source models are suitable for a specific target task.
We propose a novel method for quantifying transferability between a source model and a target dataset.
arXiv Detail & Related papers (2021-11-24T20:22:28Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language
Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z) - Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
Theses frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z) - Bayesian Few-Shot Classification with One-vs-Each P\'olya-Gamma
Augmented Gaussian Processes [7.6146285961466]
Few-shot classification (FSC) is an important step on the path toward human-like machine learning.
We propose a novel combination of P'olya-Gamma augmentation and the one-vs-each softmax approximation that allows us to efficiently marginalize over functions rather than model parameters.
We demonstrate improved accuracy and uncertainty quantification on both standard few-shot classification benchmarks and few-shot domain transfer tasks.
arXiv Detail & Related papers (2020-07-20T19:10:41Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.