Matching Feature Sets for Few-Shot Image Classification
- URL: http://arxiv.org/abs/2204.00949v1
- Date: Sat, 2 Apr 2022 22:42:54 GMT
- Title: Matching Feature Sets for Few-Shot Image Classification
- Authors: Arman Afrasiyabi, Hugo Larochelle, Jean-François Lalonde,
Christian Gagné
- Abstract summary: We argue that a set-based representation intrinsically builds a richer representation of images from the base classes.
Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures.
- Score: 22.84472344406448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In image classification, it is common practice to train deep networks to
extract a single feature vector per input image. Few-shot classification
methods also mostly follow this trend. In this work, we depart from this
established direction and instead propose to extract sets of feature vectors
for each image. We argue that a set-based representation intrinsically builds a
richer representation of images from the base classes, which can subsequently
better transfer to the few-shot classes. To do so, we propose to adapt existing
feature extractors to instead produce sets of feature vectors from images. Our
approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside
existing encoder architectures. The attention modules are lightweight, and as
such our method results in encoders that have approximately the same number of
parameters as their original versions. During training and inference, a
set-to-set matching metric is used to perform image classification. The
effectiveness of our proposed architecture and metrics is demonstrated via
thorough experiments on standard few-shot datasets -- namely miniImageNet,
tieredImageNet, and CUB -- in both the 1- and 5-shot scenarios. In all cases
but one, our method outperforms the state-of-the-art.
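The abstract's set-to-set matching step can be illustrated with a minimal sketch. The paper proposes dedicated matching metrics; the Chamfer-style cosine matching below is only a generic stand-in, with hypothetical function names and random data:

```python
import numpy as np

def chamfer_set_similarity(query_set, support_set):
    """Chamfer-style set-to-set similarity: for each query vector,
    take its best cosine match in the support set, then average."""
    q = query_set / np.linalg.norm(query_set, axis=1, keepdims=True)
    s = support_set / np.linalg.norm(support_set, axis=1, keepdims=True)
    sims = q @ s.T                  # pairwise cosine similarities
    return sims.max(axis=1).mean()  # best support match per query vector

def classify(query_set, class_sets):
    """Assign the query image to the class whose support set matches best."""
    scores = [chamfer_set_similarity(query_set, cs) for cs in class_sets]
    return int(np.argmax(scores))
```

Here each image is represented by a small set of feature vectors (rows), and classification picks the class whose support set yields the highest aggregate match, rather than comparing single embeddings.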
Related papers
- Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification [1.6385815610837167]
Recent CLIP-based methods have shown promising zero-shot and few-shot performance on image classification tasks.
We propose the Meta-Feature Adaption method (MF-Adapter) that combines the complementary strengths of both LRs and high-level semantic representations.
Our proposed method is superior to the state-of-the-art CLIP downstream few-shot classification methods, even showing stronger performance on a set of challenging visual classification tasks.
arXiv Detail & Related papers (2024-07-08T06:18:04Z)
- Text Descriptions are Compressive and Invariant Representations for Visual Learning [63.3464863723631]
We show that an alternative approach, in line with humans' understanding of multiple visual features per class, can provide compelling performance in the robust few-shot learning setting.
In particular, we introduce a novel method, SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors).
This method first automatically generates multiple visual descriptions of each class via a large language model (LLM), then uses a VLM to translate these descriptions to a set of visual feature embeddings of each image, and finally uses sparse logistic regression to select a relevant subset of these features to classify each image.
arXiv Detail & Related papers (2023-07-10T03:06:45Z)
- Unicom: Universal and Compact Representation Learning for Image Retrieval [65.96296089560421]
We cluster the large-scale LAION400M into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate such conflict, we randomly select partial inter-class prototypes to construct the margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
arXiv Detail & Related papers (2023-04-12T14:25:52Z)
- Disambiguation of One-Shot Visual Classification Tasks: A Simplex-Based Approach [8.436437583394998]
We present a strategy which aims at detecting the presence of multiple objects in a given shot.
This strategy is based on identifying the corners of a simplex in a high dimensional space.
We show the ability of the proposed method to slightly, yet statistically significantly, improve accuracy in extreme settings.
arXiv Detail & Related papers (2023-01-16T11:37:05Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel few-shot learning framework, to automatically figure out foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- Improving Few-shot Learning with Weakly-supervised Object Localization [24.3569501375842]
We propose a novel framework that generates class representations by extracting features from class-relevant regions of the images.
Our method outperforms the baseline few-shot model in miniImageNet and tieredImageNet benchmarks.
arXiv Detail & Related papers (2021-05-25T07:39:32Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Few-shot Image Classification: Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier [5.782827425991282]
We show that a library of pre-trained feature extractors combined with a simple feed-forward network learned with an L2-regularizer can be an excellent option for solving cross-domain few-shot image classification.
Our experimental results suggest that this simpler sample-efficient approach far outperforms several well-established meta-learning algorithms on a variety of few-shot tasks.
arXiv Detail & Related papers (2021-01-03T05:30:36Z)
- One-Shot Image Classification by Learning to Restore Prototypes [11.448423413463916]
One-shot image classification aims to train image classifiers over the dataset with only one image per category.
For one-shot learning, existing metric learning approaches suffer from poor performance because the single training image may not be representative of the class.
We propose a simple yet effective regression model, denoted by RestoreNet, which learns a class transformation on the image feature to move the image closer to the class center in the feature space.
arXiv Detail & Related papers (2020-05-04T02:11:30Z)
- Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation [109.89213619785676]
Few-shot classification aims to recognize novel categories with only few labeled images in each class.
Existing metric-based few-shot classification algorithms predict categories by comparing the feature embeddings of query images with those from a few labeled images.
While promising performance has been demonstrated, these methods often fail to generalize to unseen domains.
arXiv Detail & Related papers (2020-01-23T18:55:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.