Rectifying the Shortcut Learning of Background: Shared Object
Concentration for Few-Shot Image Recognition
- URL: http://arxiv.org/abs/2107.07746v1
- Date: Fri, 16 Jul 2021 07:46:41 GMT
- Title: Rectifying the Shortcut Learning of Background: Shared Object
Concentration for Few-Shot Image Recognition
- Authors: Xu Luo, Longhui Wei, Liangjian Wen, Jinrong Yang, Lingxi Xie, Zenglin
Xu, Qi Tian
- Abstract summary: Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically identify foreground objects at both the pretraining and evaluation stages.
- Score: 101.59989523028264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-Shot image classification aims to utilize pretrained knowledge learned
from a large-scale dataset to tackle a series of downstream classification
tasks. Typically, each task involves only a few training examples from brand-new
categories. This requires the pretraining models to focus on well-generalizable
knowledge while ignoring domain-specific information. In this paper, we observe
that image background serves as a source of domain-specific knowledge, which is
a shortcut for models to learn in the source dataset, but is harmful when
adapting to brand-new classes. To prevent the model from learning this shortcut
knowledge, we propose COSOC, a novel Few-Shot Learning framework that
automatically identifies foreground objects at both the pretraining and
evaluation stages. COSOC is a two-stage algorithm motivated by the observation that
foreground objects from different images within the same class share more
similar patterns than backgrounds. At the pretraining stage, for each class, we
cluster contrastive-pretrained features of randomly cropped image patches, such
that crops containing only foreground objects can be identified by a single
cluster. We then force the pretraining model to focus on found foreground
objects via a fusion sampling strategy; at the evaluation stage, among images in
each training class of any few-shot task, we seek shared content and filter out
the background. The recognized foreground objects of each class are then used
to match the foreground of testing images. Extensive experiments tailored to
inductive FSL tasks on two benchmarks demonstrate the state-of-the-art
performance of our method.
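The pretraining-stage idea, clustering features of random crops from one class and treating the most compact cluster as the shared foreground, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the two-cluster setup, the toy 2-D features standing in for contrastive-pretrained crop features, and the compactness criterion for picking the foreground cluster are all simplifying assumptions.

```python
import numpy as np

def kmeans(X, k=2, iters=50):
    # Deterministic farthest-point initialization, then standard Lloyd updates.
    centers = [X[0]]
    while len(centers) < k:
        dist = np.min(np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(X[int(dist.argmax())])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

def foreground_mask(crop_features, k=2):
    # The cluster whose members sit closest to their centroid is taken as the
    # shared-foreground cluster: foreground objects look alike across images
    # of one class, while backgrounds scatter in feature space.
    labels, centers = kmeans(crop_features, k)
    spread = [np.linalg.norm(crop_features[labels == j] - centers[j], axis=1).mean()
              for j in range(k)]
    return labels == int(np.argmin(spread))

# Toy demo: 20 tight "foreground" crop features near (5, 5) and
# 20 scattered "background" features around the origin.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal([5.0, 5.0], 0.1, size=(20, 2)),
                   rng.normal(0.0, 3.0, size=(20, 2))])
fg_mask = foreground_mask(feats)
```

The fusion sampling strategy and evaluation-stage foreground matching of COSOC are not shown; this only illustrates why crops containing the shared object are detectable as the most compact feature cluster.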
Related papers
- Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning trains a model on a pretext task before applying it to a specific downstream task.
Previous studies have used only one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used and combined via a gating network.
arXiv Detail & Related papers (2023-07-27T14:38:32Z)
- Neural Congealing: Aligning Images to a Joint Semantic Atlas [14.348512536556413]
We present a zero-shot self-supervised framework for aligning semantically-common content across a set of images.
Our approach harnesses the power of pre-trained DINO-ViT features.
We show that our method performs favorably compared to a state-of-the-art method that requires extensive training on large-scale datasets.
arXiv Detail & Related papers (2023-02-08T09:26:22Z)
- Few-shot Open-set Recognition Using Background as Unknowns [58.04165813493666]
Few-shot open-set recognition aims to classify both seen and novel images given only limited training data of seen classes.
Our proposed method not only outperforms multiple baselines but also sets new results on three popular benchmarks.
arXiv Detail & Related papers (2022-07-19T04:19:29Z)
- Learning to Detect Every Thing in an Open World [139.78830329914135]
We propose a simple yet surprisingly powerful data augmentation and training scheme we call Learning to Detect Every Thing (LDET).
To avoid suppressing hidden objects (background objects that are visible but unlabeled), we paste annotated objects onto a background image sampled from a small region of the original image.
LDET leads to significant improvements on many datasets in the open world instance segmentation task.
arXiv Detail & Related papers (2021-12-03T03:56:06Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
A meta-learning challenge, known as the low-shot image recognition task, arises when only a few annotated images are available for learning a recognition model for one category.
Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem.
Experiments for few-shot learning on two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results on all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.