Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition
- URL: http://arxiv.org/abs/2101.05018v1
- Date: Wed, 13 Jan 2021 11:37:28 GMT
- Title: Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition
- Authors: Mengting Chen, Xinggang Wang, Heng Luo, Yifeng Geng and Wenyu Liu
- Abstract summary: Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
A meta-learning challenge known as low-shot image recognition arises when only a few annotated images are available for learning a recognition model for a category.
Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem.
Experiments on few-shot learning on two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
- Score: 38.49419948988415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep networks can learn to accurately recognize objects of a category by
training on a large number of annotated images. However, a meta-learning
challenge known as low-shot image recognition arises when only a few annotated
images are available for learning a recognition model for a category. The
objects in the testing/query and training/support images are likely to differ
in size, location, style, and so on. To address this problem, we propose the
Cascaded Feature Matching Network (CFMN). We train the meta-learner to learn a
more fine-grained and adaptive deep distance metric that focuses on features
with high correlations between the compared images. This is done by a feature
matching block, which aligns associated features and naturally ignores
non-discriminative ones. By applying the proposed feature matching block in
different layers of the few-shot recognition network, multi-scale information
from the compared images can be incorporated into the final cascaded matching
feature, which further boosts recognition performance and generalizes better
by learning on relationships. Experiments on few-shot learning on two standard
datasets, miniImageNet and Omniglot, confirm the effectiveness of our method.
In addition, we study the multi-label few-shot task for the first time, on a
new data split of COCO, which further shows the superiority of the proposed
feature matching network when performing few-shot learning on complex images.
The code will be made publicly available.
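The matching idea described in the abstract can be illustrated with a minimal sketch. It is not the CFMN implementation: the cosine correlation, the softmax-based alignment, the temperature of 0.1, the "focus" weighting, and the averaging that stands in for a learned relation head are all assumptions, and feature maps are plain lists of per-position vectors so that only the Python standard library is needed. Each query position is aligned with the support positions it correlates with most, positions without a strong counterpart receive little weight, and the per-layer scores are concatenated into a cascaded matching feature.

```python
import math
from typing import Dict, List

Vector = List[float]
FeatureMap = List[Vector]  # one vector per spatial position (hypothetical layout)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def softmax(scores: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax; a low temperature sharpens the focus."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_layer(query: FeatureMap, support: FeatureMap, temperature: float = 0.1) -> float:
    """Hypothetical matching block for one layer; returns a score in [-1, 1]."""
    sims, focus = [], []
    for q in query:
        corr = [cosine(q, s) for s in support]
        attn = softmax(corr, temperature)
        # Align: build a support feature from the positions most correlated with q,
        # so parts found at different locations can still be compared.
        aligned = [sum(a * s[d] for a, s in zip(attn, support)) for d in range(len(q))]
        sims.append(cosine(q, aligned))
        focus.append(max(corr))
    # Focus: query positions without a strongly correlated counterpart get little weight.
    weights = softmax(focus, temperature)
    return sum(w * s for w, s in zip(weights, sims))


def cascaded_feature(query_layers: List[FeatureMap], support_layers: List[FeatureMap]) -> List[float]:
    """Apply the matching block at every layer and concatenate the per-layer scores."""
    return [match_layer(q, s) for q, s in zip(query_layers, support_layers)]


def classify(query_layers: List[FeatureMap], support_set: Dict[str, List[FeatureMap]]) -> str:
    """Return the support class with the highest relation score."""
    scores: Dict[str, float] = {}
    for label, support_layers in support_set.items():
        feature = cascaded_feature(query_layers, support_layers)
        # Averaging stands in for a learned relation head (assumption).
        scores[label] = sum(feature) / len(feature)
    return max(scores, key=scores.get)


# Toy episode with two "layers" of 2-D features (made-up values for illustration).
query = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
support_set = {
    "cat": [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 1.0]]],  # same parts, swapped positions
    "dog": [[[-1.0, 0.0], [0.0, -1.0]], [[-1.0, 1.0]]],
}
print(classify(query, support_set))  # cat
```

In the toy episode the "cat" support contains the same part features as the query but at swapped positions; the alignment step still scores that layer close to 1, which is the kind of robustness to location changes the abstract motivates.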
Related papers
- Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by first training the model on a pretext task before applying it to a specific task.
Previous studies have used only one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used, with a gating network combining all pretext tasks.
(2023-07-27T14:38:32Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
(2023-05-01T23:11:18Z)
- Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
(2022-01-15T19:49:00Z)
- Learning Discriminative Representations for Multi-Label Image Recognition [13.13795708478267]
We propose a unified deep network to learn discriminative features for the multi-label task.
Regularizing the whole network with the proposed loss significantly improves the performance of the well-known ResNet-101.
(2021-07-23T12:10:46Z)
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-shot image classification aims to use pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel few-shot learning framework, to automatically identify foreground objects at both the pretraining and evaluation stages.
(2021-07-16T07:46:41Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent images in a low-dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
(2021-06-11T09:02:30Z)
- Augmented Bi-path Network for Few-shot Learning [16.353228724916505]
We propose Augmented Bi-path Network (ABNet) for learning to compare both global and local features at multiple scales.
Specifically, the salient patches are extracted and embedded as the local features for every image. Then, the model learns to augment the features for better robustness.
(2020-07-15T11:13:38Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
(2020-05-25T18:12:33Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
(2020-04-14T16:29:42Z)
- Learning to Compare Relation: Semantic Alignment for Few-Shot Learning [48.463122399494175]
We present a novel semantic alignment model to compare relations, which is robust to content misalignment.
We conduct extensive experiments on several few-shot learning datasets.
(2020-02-29T08:37:02Z)
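The Multi-level Second-order (MlSo) entry above mentions power-normalized second-order base learners. The sketch below assumes the common form of such pooling, in which a set of local feature vectors is averaged into an autocorrelation (second-order) matrix and each entry is then passed through a signed power function. The function name `second_order_pool` and the exponent `gamma` are hypothetical, and the normalization MlSo actually uses may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def second_order_pool(features: List[List[float]], gamma: float = 0.5) -> Matrix:
    """Average outer product of local features, then signed power normalization (assumed form)."""
    n, d = len(features), len(features[0])
    # Second-order statistics: m[i][j] is the mean over positions of f[i] * f[j].
    m = [[sum(f[i] * f[j] for f in features) / n for j in range(d)] for i in range(d)]
    # Signed power normalization sign(v) * |v| ** gamma dampens large entries.
    return [[math.copysign(abs(v) ** gamma, v) for v in row] for row in m]


# Two local feature vectors of dimension 2 (made-up values for illustration).
pooled = second_order_pool([[1.0, 2.0], [3.0, 0.0]])
print(pooled)
```

The pooled matrix is symmetric and has size d × d regardless of how many local features are pooled, so representations of images with different numbers of positions can be compared directly.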