Sparse Spatial Transformers for Few-Shot Learning
- URL: http://arxiv.org/abs/2109.12932v3
- Date: Wed, 10 May 2023 01:53:11 GMT
- Title: Sparse Spatial Transformers for Few-Shot Learning
- Authors: Haoxing Chen and Huaxiong Li and Yaohui Li and Chunlin Chen
- Abstract summary: Learning from limited data is challenging because data scarcity leads to poor generalization of the trained model.
We propose a novel transformer-based neural network architecture called sparse spatial transformers.
Our method finds task-relevant features and suppresses task-irrelevant features.
- Score: 6.271261279657655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from limited data is challenging because data scarcity leads to
poor generalization of the trained model. A classical globally pooled
representation is likely to lose useful local information. Many few-shot
learning methods have recently addressed this challenge using deep descriptors
and learning a pixel-level metric. However, using deep descriptors as feature
representations may lose image contextual information. Moreover, most of these
methods address each class in the support set independently, and therefore cannot
sufficiently exploit discriminative information and task-specific embeddings. In
this paper, we propose a novel transformer-based neural network architecture
called sparse spatial transformers (SSFormers), which finds task-relevant
features and suppresses task-irrelevant features. Specifically, we first divide
each input image into several image patches of different sizes to obtain dense
local features. These features retain contextual information while expressing
local information. Then, a sparse spatial transformer layer is proposed to find
spatial correspondence between the query image and the full support set to
select task-relevant image patches and suppress task-irrelevant image patches.
Finally, we propose using an image patch-matching module to calculate the
distance between dense local representations, thus determining which category
the query image belongs to in the support set. Extensive experiments on popular
few-shot learning benchmarks demonstrate the superiority of our method over
state-of-the-art methods. Our source code is available at
https://github.com/chenhaoxing/ssformers.
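The abstract walks through three concrete steps (multi-scale patching, sparse query-to-support attention, and patch matching). The PyTorch sketch below is only an illustration of that pipeline under stated assumptions: the multi-scale average pooling, the top-k sparsification rule, and the best-match cosine metric are guesses, not the authors' implementation (which lives in the repository linked above).

```python
# Minimal sketch of an SSFormers-style episode, assuming inputs are feature
# maps from a backbone. All design details here are illustrative guesses.
import torch
import torch.nn.functional as F

def dense_local_features(feat_maps, patch_sizes=(4, 8)):
    """Step 1 (assumed): pool each feature map into multi-scale patch
    descriptors that keep local information plus some context."""
    feats = []
    for p in patch_sizes:
        pooled = F.avg_pool2d(feat_maps, kernel_size=p)   # (B, C, H/p, W/p)
        feats.append(pooled.flatten(2).transpose(1, 2))   # (B, n_p, C)
    return torch.cat(feats, dim=1)                        # (B, P, C)

def sparse_spatial_attention(query_patches, support_patches, k=8):
    """Step 2 (assumed form): each query patch attends over the FULL support
    set; keeping only the top-k entries selects task-relevant patches and
    suppresses task-irrelevant ones."""
    q = F.normalize(query_patches, dim=-1)                # (Pq, C)
    s = F.normalize(support_patches, dim=-1)              # (Ps, C)
    attn = q @ s.t()                                      # cosine similarities
    topk, idx = attn.topk(k, dim=-1)
    sparse = torch.full_like(attn, float("-inf")).scatter_(-1, idx, topk)
    return sparse.softmax(dim=-1) @ support_patches       # task-aligned query

def patch_matching_score(query_patches, class_patches):
    """Step 3 (assumed metric): best cosine match per query patch against one
    class's support patches, averaged into a class score."""
    q = F.normalize(query_patches, dim=-1)
    c = F.normalize(class_patches, dim=-1)
    return (q @ c.t()).max(dim=-1).values.mean()

# Toy 5-way, 1-shot episode on random stand-in backbone features.
support_maps = torch.randn(5, 64, 16, 16)     # one feature map per class
query_map = torch.randn(1, 64, 16, 16)

s_patches = dense_local_features(support_maps)            # (5, P, 64)
q_patches = dense_local_features(query_map)[0]            # (P, 64)
q_aligned = sparse_spatial_attention(q_patches, s_patches.flatten(0, 1))
scores = torch.stack([patch_matching_score(q_aligned, s_patches[c])
                      for c in range(5)])
prediction = scores.argmax().item()           # index of the predicted class
```

Note how attending over the flattened support set of the whole episode, rather than class by class, is what makes the selected patches task-specific in the sense the abstract describes.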
Related papers
- Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain [0.0]
Few-shot segmentation is the task of segmenting objects or regions of novel classes in an image, given only a few annotated examples.
We use SegGPT as our base model and train it on the base classes.
To handle the various object sizes that typically appear in the remote sensing domain, we perform patch-based prediction.
arXiv Detail & Related papers (2024-04-16T06:33:08Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention mechanism whose cost grows linearly with the resolution, derived via a Taylor expansion; based on this attention, a network called T-former is designed for image inpainting (a generic sketch of this style of linear attention appears after this list).
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- TopicFM: Robust and Interpretable Feature Matching with Topic-assisted [8.314830611853168]
We propose an architecture for image matching which is efficient, robust, and interpretable.
We introduce a novel feature matching module called TopicFM, which roughly organizes the same spatial structures across images into topics.
Our method performs matching only in co-visible regions, reducing computation.
arXiv Detail & Related papers (2022-07-01T10:39:14Z)
- Rethinking Generalization in Few-Shot Classification [28.809141478504532]
A single image-level annotation often correctly describes only a small subset of an image's content.
In this paper, we take a closer look at the implications in the context of few-shot learning.
We build on recent advances in unsupervised training of networks via masked image modelling to overcome the lack of fine-grained labels.
arXiv Detail & Related papers (2022-06-15T03:05:21Z)
- Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z)
- Maximize the Exploration of Congeneric Semantics for Weakly Supervised Semantic Segmentation [27.155133686127474]
We construct a graph neural network (P-GNN) based on the self-detected patches from different images that contain the same class labels.
We conduct experiments on the popular PASCAL VOC 2012 benchmarks, and our model yields state-of-the-art performance.
arXiv Detail & Related papers (2021-10-08T08:59:16Z)
- One-Shot Image Classification by Learning to Restore Prototypes [11.448423413463916]
One-shot image classification aims to train image classifiers over the dataset with only one image per category.
For one-shot learning, existing metric learning approaches suffer from poor performance because the single training image may not be representative of the class.
We propose a simple yet effective regression model, denoted RestoreNet, which learns a class transformation on the image feature to move the image closer to the class center in the feature space (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-05-04T02:11:30Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
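Two of the papers above describe techniques concrete enough to sketch. First, the T-former summary mentions attention made linear in the resolution via Taylor expansion; the snippet below is a generic first-order Taylor linearization of softmax attention (exp(q·k) ≈ 1 + q·k after L2-normalizing q and k), a sketch in the same spirit rather than the paper's actual operator.

```python
import torch
import torch.nn.functional as F

def taylor_linear_attention(q, k, v):
    """First-order Taylor linear attention (illustrative, not T-former's
    exact operator). Normalizing q and k keeps 1 + q.k non-negative, and
    reordering the matmuls makes the cost linear in the token count N."""
    q = F.normalize(q, dim=-1)                  # (N, d)
    k = F.normalize(k, dim=-1)                  # (N, d)
    kv = k.transpose(-2, -1) @ v                # (d, d), shared by all queries
    num = v.sum(dim=-2, keepdim=True) + q @ kv  # (N, d)
    den = k.size(-2) + q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)
    return num / den                            # (N, d), rows sum-weighted by 1 + q.k
```

Second, the RestoreNet summary describes a regression model that moves a one-shot image feature toward its class center. A minimal residual form, again only an assumption about the details, could look like this:

```python
import torch
import torch.nn.functional as F

class RestoreNetSketch(torch.nn.Module):
    """Assumed form: a residual MLP trained so that restored features land
    near their class centers; classification is then nearest-center."""
    def __init__(self, dim=64):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, dim), torch.nn.ReLU(),
            torch.nn.Linear(dim, dim))

    def forward(self, feat):
        return feat + self.mlp(feat)   # restored feature

# Assumed training signal: pull each feature toward its class center.
model = RestoreNetSketch()
feats = torch.randn(32, 64)            # a batch of image features
centers = torch.randn(32, 64)          # the matching class centers
loss = F.mse_loss(model(feats), centers)
loss.backward()
```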