Few-Shot Learning Meets Transformer: Unified Query-Support Transformers
for Few-Shot Classification
- URL: http://arxiv.org/abs/2208.12398v1
- Date: Fri, 26 Aug 2022 01:53:23 GMT
- Authors: Xixi Wang, Xiao Wang, Bo Jiang, Bin Luo
- Abstract summary: Few-shot classification aims to recognize unseen classes using very limited samples.
In this paper, we show that the two challenges can be well modeled simultaneously via a unified Query-Support TransFormer model.
Experiments on four popular datasets demonstrate the effectiveness and superiority of the proposed QSFormer.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot classification, which aims to recognize unseen classes
from very limited samples, has attracted increasing attention. Usually, it is
formulated as a metric learning problem. The core issue of few-shot
classification is how to learn (1) consistent representations for images in
both support and query sets and (2) effective metric learning for images
between support and query sets. In this paper, we show that the two challenges
can be well modeled simultaneously via a unified Query-Support TransFormer
(QSFormer) model. Specifically, the proposed QSFormer involves a global
query-support sample Transformer (sampleFormer) branch and a local patch
Transformer (patchFormer) learning branch. sampleFormer aims to capture the
dependence of samples in support and query sets for image representation. It
adopts an Encoder, a Decoder and Cross-Attention to model the support
representation, the query representation and the metric learning for the
few-shot classification task, respectively. Also, as a complement to the
global learning branch, we adopt a local patch Transformer to extract a
structural representation for each image sample by capturing the long-range
dependence of local image patches. In
addition, a novel Cross-scale Interactive Feature Extractor (CIFE) is proposed
to extract and fuse multi-scale CNN features as an effective backbone module
for the proposed few-shot learning method. All modules are integrated into a
unified framework and trained in an end-to-end manner. Extensive experiments on
four popular datasets demonstrate the effectiveness and superiority of the
proposed QSFormer.
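The abstract's sampleFormer branch (an Encoder over support samples, a Decoder over query samples, and Cross-Attention acting as the metric) can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the embedding dimension, set sizes, and the use of the attention map itself as the query-support similarity are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, supports, d_k):
    # Scaled dot-product cross-attention from query samples to support samples.
    # queries:  (n_query, d) decoder-side representations
    # supports: (n_support, d) encoder-side representations
    scores = queries @ supports.T / np.sqrt(d_k)   # (n_query, n_support)
    attn = softmax(scores, axis=-1)
    return attn @ supports, attn                   # attended features, attention map

rng = np.random.default_rng(0)
d = 64
support = rng.standard_normal((5, d))  # e.g. a 5-way 1-shot support set
query = rng.standard_normal((3, d))    # three query embeddings

attended, attn = cross_attention(query, support, d)
# Reading the attention map as a query-support similarity ("metric"),
# each query is assigned the class of its most-attended support sample.
pred = attn.argmax(axis=1)
```

In the real model the queries, keys and values would come from learned projections inside a trained Transformer; the sketch only shows how cross-attention can double as the metric between the two sets.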
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Few-shot Medical Image Segmentation via Cross-Reference Transformer [3.2634122554914]
Few-shot segmentation (FSS) has the potential to address these challenges by learning new categories from a small number of labeled samples.
We propose a novel self-supervised few-shot medical image segmentation network with a Cross-Reference Transformer.
Experimental results show that the proposed model achieves good results on both CT and MRI datasets.
arXiv Detail & Related papers (2023-04-19T13:05:18Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- Enhancing Few-shot Image Classification with Cosine Transformer [4.511561231517167]
The Few-shot Cosine Transformer (FS-CT) learns a relational map between supports and queries.
Our method achieves competitive results on mini-ImageNet, CUB-200, and CIFAR-FS in 1-shot and 5-shot learning tasks.
Our FS-CT with cosine attention is a lightweight, simple few-shot algorithm that can be applied to a wide range of applications.
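The cosine attention named in this entry can be sketched as follows: replace the scaled dot product with a cosine similarity by L2-normalizing both sides. This is a minimal illustration of the general idea, not the FS-CT implementation; the temperature value and all shapes are assumptions.

```python
import numpy as np

def cosine_attention(queries, supports, temperature=10.0):
    # L2-normalize both sides so each raw score is a cosine
    # similarity in [-1, 1]; the temperature restores a useful
    # dynamic range before the softmax.
    q = queries / np.linalg.norm(queries, axis=-1, keepdims=True)
    s = supports / np.linalg.norm(supports, axis=-1, keepdims=True)
    scores = temperature * (q @ s.T)              # (n_query, n_support)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ supports, w

rng = np.random.default_rng(1)
support = rng.standard_normal((5, 32))   # 5-way support embeddings
query = rng.standard_normal((2, 32))     # two query embeddings
out, weights = cosine_attention(query, support)
```

Because the scores are bounded cosines, this variant avoids the magnitude-dependent scaling issues of raw dot-product attention, which is one plausible reason it suits few-shot settings with small support sets.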
arXiv Detail & Related papers (2022-11-13T06:03:28Z)
- BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning [88.82371069668147]
BatchFormerV2 is a more general batch Transformer module, which enables exploring sample relationships for dense representation learning.
BatchFormerV2 consistently improves current DETR-based detection methods by over 1.3%.
arXiv Detail & Related papers (2022-04-04T05:53:42Z)
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z)
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
TRansformer-based Few-shot Semantic segmentation method (TRFS)
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.