Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer
- URL: http://arxiv.org/abs/2001.01600v2
- Date: Sat, 8 Oct 2022 12:16:49 GMT
- Title: Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer
- Authors: Hongguang Zhang, Philip H. S. Torr, Piotr Koniusz
- Abstract summary: We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
- Score: 116.46533207849619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current few-shot learning models capture visual object relations in the
so-called meta-learning setting under a fixed-resolution input. However, such
models have limited generalization ability under scale and location
mismatch between objects, as only a few samples from the target classes are
provided. Consequently, the lack of a mechanism to match scale and location
between pairs of compared images leads to performance degradation. The importance
of image contents varies across coarse-to-fine scales depending on the object
and its class label, e.g., generic objects and scenes rely on their global
appearance while fine-grained objects rely more on their localized visual
patterns. In this paper, we study the impact of scale and location mismatch in
the few-shot learning scenario, and propose a novel Spatially-aware Matching
(SM) scheme to effectively perform matching across multiple scales and
locations, and learn image relations by giving the highest weights to the best
matching pairs. The SM is trained to activate the most related locations and
scales between support and query data. We apply SM to various few-shot
learning models and backbones for a comprehensive evaluation.
Furthermore, we leverage an auxiliary self-supervised discriminator that
predicts the spatial- and scale-level index of the feature vectors we use.
Finally, we develop a novel transformer-based pipeline to exploit self- and
cross-attention in a spatially-aware matching process. Our proposed design is
orthogonal to the choice of backbone and/or comparator.
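The abstract describes matching support and query features across multiple scales and locations, with the highest weights assigned to the best-matching pairs. The paper gives no implementation details here, so the following is only a minimal illustrative sketch of that idea: multi-scale feature grids, cosine similarity between all location pairs, and a softmax re-weighting over scale pairs. All function names, shapes, and the choice of cosine similarity are assumptions, not the authors' actual SM design.

```python
import numpy as np

def spatially_aware_match(support_feats, query_feats):
    """Illustrative spatially-aware matching across scales and locations.

    support_feats, query_feats: lists of arrays, one per scale, each of
    shape (H*W, D) -- a flattened spatial grid of D-dimensional vectors.
    Returns a scalar relation score in which the best-matching
    (scale, location) pairs receive the highest softmax weight.
    """
    sims = []
    for s_feat in support_feats:          # iterate over support scales
        for q_feat in query_feats:        # iterate over query scales
            # cosine similarity between every support and query location
            s = s_feat / np.linalg.norm(s_feat, axis=1, keepdims=True)
            q = q_feat / np.linalg.norm(q_feat, axis=1, keepdims=True)
            # keep the best-matching location pair for this scale pair
            sims.append((s @ q.T).max())
    sims = np.array(sims)
    # softmax over scale pairs: the best matches dominate the final score
    w = np.exp(sims) / np.exp(sims).sum()
    return float((w * sims).sum())
```

Under this toy formulation, an image matched against itself at a single scale yields a score of 1.0, and mismatched scales or locations pull the score down only in proportion to their softmax weight.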
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning, which involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z) - Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z) - Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features that are visible to the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z) - CAD: Co-Adapting Discriminative Features for Improved Few-Shot
Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z) - Contrastive Object-level Pre-training with Spatial Noise Curriculum
Learning [12.697842097171119]
We present a curriculum learning mechanism that adaptively augments the generated regions, which allows the model to consistently acquire a useful learning signal.
Our experiments show that our approach improves on the MoCo v2 baseline by a large margin on multiple object-level tasks when pre-training on multi-object scene image datasets.
arXiv Detail & Related papers (2021-11-26T18:29:57Z) - Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z) - Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z) - Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z) - Multi-scale Adaptive Task Attention Network for Few-Shot Learning [5.861206243996454]
The goal of few-shot learning is to classify unseen categories with few labeled samples.
This paper proposes a novel Multi-scale Adaptive Task Attention Network (MATANet) for few-shot learning.
arXiv Detail & Related papers (2020-11-30T00:36:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.