Mind the Gap Between Prototypes and Images in Cross-domain Finetuning
- URL: http://arxiv.org/abs/2410.12474v2
- Date: Sun, 20 Oct 2024 08:25:18 GMT
- Title: Mind the Gap Between Prototypes and Images in Cross-domain Finetuning
- Authors: Hongduan Tian, Feng Liu, Zhanke Zhou, Tongliang Liu, Chengqi Zhang, Bo Han
- Abstract summary: We propose contrastive prototype-image adaptation (CoPA), which adapts separate transformations for prototypes and for images.
Experiments on Meta-Dataset demonstrate that CoPA achieves state-of-the-art performance more efficiently.
- Score: 64.97317635355124
- Abstract: In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data to project embeddings into a task-specific metric space where classification can be performed by measuring similarities between image instance and prototype representations. Technically, an assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and simply applying the same transformation during the adaptation phase constrains exploring the optimal representations and shrinks the gap between prototype and image representations. To solve this problem, we propose a simple yet effective method, contrastive prototype-image adaptation (CoPA), to adapt different transformations respectively for prototypes and images similarly to CLIP by treating prototypes as text prompts. Extensive experiments on Meta-Dataset demonstrate that CoPA achieves the state-of-the-art performance more efficiently. Meanwhile, further analyses also indicate that CoPA can learn better representation clusters, enlarge the gap, and achieve minimal validation loss at the enlarged gap.
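The idea in the abstract, separate transformation heads for prototypes and for image instances, scored against each other by similarity, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the linear heads `w_img` and `w_proto`, the temperature value, the one-directional cross-entropy loss, and the centroid-distance definition of the "gap" are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def class_prototypes(embeddings, labels, num_classes):
    """Prototype of each class = mean of its support embeddings."""
    return np.stack(
        [embeddings[labels == c].mean(axis=0) for c in range(num_classes)]
    )


def prototype_image_logits(embeddings, prototypes, w_img, w_proto, temperature=0.1):
    """Cosine-similarity logits with SEPARATE linear heads (assumed form).

    A shared-head baseline would pass the same matrix for both heads.
    """
    img = l2_normalize(embeddings @ w_img)
    proto = l2_normalize(prototypes @ w_proto)
    return img @ proto.T / temperature


def image_to_prototype_loss(logits, labels):
    """Cross-entropy of each image against its class prototype (assumed loss)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def embedding_gap(embeddings, prototypes):
    """Distance between centroids of normalized images and prototypes.

    One common way to quantify a modality-style gap; assumed here.
    """
    return float(
        np.linalg.norm(
            l2_normalize(embeddings).mean(axis=0)
            - l2_normalize(prototypes).mean(axis=0)
        )
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, num_classes, shots = 8, 3, 5
    # Stand-in for frozen-backbone embeddings of a support set.
    emb = rng.normal(size=(num_classes * shots, dim))
    labels = np.repeat(np.arange(num_classes), shots)
    protos = class_prototypes(emb, labels, num_classes)
    # Identity initialization; in practice both heads would be trained.
    w_img, w_proto = np.eye(dim), np.eye(dim)
    logits = prototype_image_logits(emb, protos, w_img, w_proto)
    print("logits shape:", logits.shape)
    print("loss:", image_to_prototype_loss(logits, labels))
    print("gap:", embedding_gap(emb, protos))
```

In this sketch, only `w_img` and `w_proto` would be updated during adaptation while the backbone embeddings stay fixed, which matches the frozen-backbone setting the abstract describes.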
Related papers
- Interpretable Image Classification with Adaptive Prototype-based Vision Transformers [37.62530032165594]
We present ProtoViT, a method for interpretable image classification combining deep learning and case-based reasoning.
Our model integrates Vision Transformer (ViT) backbones into prototype-based models, while offering spatially deformed prototypes.
Our experiments show that our model can generally achieve higher performance than existing prototype-based models.
arXiv Detail & Related papers (2024-10-28T04:33:28Z)
- Correlation Weighted Prototype-based Self-Supervised One-Shot Segmentation of Medical Images [12.365801596593936]
Medical image segmentation is one of the domains where sufficient annotated data is not available.
We propose a prototype-based self-supervised one-way one-shot learning framework using pseudo-labels generated from superpixels.
We show that the proposed simple but potent framework performs on par with state-of-the-art methods.
arXiv Detail & Related papers (2024-08-12T15:38:51Z)
- Semi-supervised Semantic Segmentation with Prototype-based Consistency Regularization [20.4183741427867]
Semi-supervised semantic segmentation requires the model to propagate the label information from limited annotated images to unlabeled ones.
A challenge for such a per-pixel prediction task is the large intra-class variation.
We propose a novel approach to regularize the distribution of within-class features to ease label propagation difficulty.
arXiv Detail & Related papers (2022-10-10T01:38:01Z)
- PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation [53.428312630479816]
We observe that the Field of View (FoV) gap induces noticeable instance appearance differences between the source and target domains.
Motivated by these observations, we propose the Position-Invariant Transform (PIT) to better align images in different domains.
arXiv Detail & Related papers (2021-08-16T15:16:47Z)
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
- SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most advanced solutions exploit a metric learning framework that performs segmentation by matching each pixel to a learned foreground prototype.
This framework suffers from biased classification due to incomplete construction of sample pairs with the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z)
- Semi-Supervised Domain Adaptation with Prototypical Alignment and Consistency Learning [86.6929930921905]
This paper studies how much labeling a few additional target samples can help address domain shifts.
To explore the full potential of landmarks, we incorporate a prototypical alignment (PA) module which calculates a target prototype for each class from the landmarks.
Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
arXiv Detail & Related papers (2021-04-19T08:46:08Z)
- Prototype Mixture Models for Few-shot Semantic Segmentation [50.866870384596446]
Few-shot segmentation is challenging because objects within the support and query images could significantly differ in appearance and pose.
We propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation.
PMMs improve 5-shot segmentation performance on MS-COCO by up to 5.82% with only a moderate cost for model size and inference speed.
arXiv Detail & Related papers (2020-08-10T04:33:17Z)