Rethinking Generalization in Few-Shot Classification
- URL: http://arxiv.org/abs/2206.07267v1
- Date: Wed, 15 Jun 2022 03:05:21 GMT
- Title: Rethinking Generalization in Few-Shot Classification
- Authors: Markus Hiller, Rongkai Ma, Mehrtash Harandi, Tom Drummond
- Abstract summary: Single image-level annotations only correctly describe an often small subset of an image's content.
In this paper, we take a closer look at the implications in the context of $\textit{few-shot learning}$.
We build on recent advances in unsupervised training of networks via masked image modelling to overcome the lack of fine-grained labels.
- Score: 28.809141478504532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single image-level annotations only correctly describe an often small subset
of an image's content, particularly when complex real-world scenes are
depicted. While this might be acceptable in many classification scenarios, it
poses a significant challenge for applications where the set of classes differs
significantly between training and test time. In this paper, we take a closer
look at the implications in the context of $\textit{few-shot learning}$.
Splitting the input samples into patches and encoding these via the help of
Vision Transformers allows us to establish semantic correspondences between
local regions across images, independently of their respective class. The most
informative patch embeddings for the task at hand are then determined as a
function of the support set via online optimization at inference time,
additionally providing visual interpretability of `$\textit{what matters
most}$' in the image. We build on recent advances in unsupervised training of
networks via masked image modelling to overcome the lack of fine-grained labels
and learn the more general statistical structure of the data while avoiding
negative image-level annotation influence, $\textit{aka}$ supervision collapse.
Experimental results show the competitiveness of our approach, achieving new
state-of-the-art results on four popular few-shot classification benchmarks for
$5$-shot and $1$-shot scenarios.
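The core mechanism of the abstract — pooling frozen ViT patch tokens with importance weights fitted on the support set by online optimization at inference time — can be illustrated with a short PyTorch sketch. This is not the authors' implementation: the shared per-position weighting, the prototype-based objective, and the hyperparameters (`n_iters`, `lr`, `temperature`) are simplifying assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def fit_token_weights(support_feats, support_labels, n_classes,
                      n_iters=50, lr=0.1, temperature=0.1):
    """Online optimization of patch-token importance on the support set.

    support_feats:  (N, T, D) patch-token embeddings from a frozen ViT
    support_labels: (N,) int64 class labels in [0, n_classes)
    Returns a (T,) softmax-normalized importance over token positions
    ('what matters most'). Sketch only: the paper works with per-image
    token correspondences, not one shared weight per position.
    """
    w = torch.zeros(support_feats.size(1), requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(n_iters):
        a = torch.softmax(w, dim=0)                                       # (T,)
        emb = F.normalize((support_feats * a[None, :, None]).sum(1), dim=-1)
        protos = F.normalize(torch.stack(
            [emb[support_labels == c].mean(0) for c in range(n_classes)]), dim=-1)
        scores = emb @ protos.t() / temperature                           # (N, n_classes)
        loss = F.cross_entropy(scores, support_labels)                    # assumed objective
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.softmax(w.detach(), dim=0)

# Toy 5-way 1-shot episode; random features stand in for the ViT tokens.
support, labels = torch.randn(5, 49, 64), torch.arange(5)
weights = fit_token_weights(support, labels, n_classes=5)
emb = F.normalize((support * weights[None, :, None]).sum(1), dim=-1)
protos = F.normalize(torch.stack([emb[labels == c].mean(0) for c in range(5)]), dim=-1)
query = F.normalize((torch.randn(3, 49, 64) * weights[None, :, None]).sum(1), dim=-1)
pred = (query @ protos.t()).argmax(-1)                                    # nearest prototype
```

The masked-image-modelling pretraining mentioned in the abstract is taken as given here: it is what would produce the frozen backbone behind `support_feats`.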
Related papers
- Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain [0.0]
Few-shot segmentation is the task of segmenting objects or regions of novel classes within an image, given only a few annotated examples.
We use SegGPT as our base model and train it on the base classes.
To handle the wide range of object sizes typical of the remote sensing domain, we perform patch-based prediction.
arXiv Detail & Related papers (2024-04-16T06:33:08Z)
- Improving fine-grained understanding in image-text pre-training [37.163228122323865]
We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs.
We show improved performance over competing approaches on both image-level tasks relying on coarse-grained information and region-level tasks relying on fine-grained information.
arXiv Detail & Related papers (2024-01-18T10:28:45Z)
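To make the fine-grained alignment idea above concrete, a minimal sketch: each caption token attends to a sparse set of image patches and is pulled toward the resulting language-grouped visual embedding. The sparsification rule and the loss below are assumptions for illustration, not SPARC's exact objective.

```python
import torch
import torch.nn.functional as F

def sparse_alignment_loss(patch_emb, token_emb):
    """Toy fine-grained image-text alignment (illustrative only).

    patch_emb: (B, P, D) image patch embeddings
    token_emb: (B, L, D) caption token embeddings
    """
    patch_emb = F.normalize(patch_emb, dim=-1)
    token_emb = F.normalize(token_emb, dim=-1)
    sim = torch.einsum('bld,bpd->blp', token_emb, patch_emb)    # token-patch similarity
    # sparsify: keep only patches above each token's mean similarity (assumed rule)
    w = torch.relu(sim - sim.mean(dim=-1, keepdim=True))
    w = w / w.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    grouped = F.normalize(torch.einsum('blp,bpd->bld', w, patch_emb), dim=-1)
    # pull each token toward its language-grouped visual embedding
    return (1 - (token_emb * grouped).sum(-1)).mean()
```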
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features that are visible to the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
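A minimal sketch of the relative-location pretext task summarized above, assuming a single cross-attention head; hiding more reference patches makes the task harder, as the summary notes. The architecture is a guess for illustration, not the paper's.

```python
import torch
import torch.nn as nn

class RelativeLocationHead(nn.Module):
    """Predict which grid cell a query patch came from, given the (partially
    hidden) reference patches of the same image. Illustrative sketch only."""
    def __init__(self, dim=64, grid=7):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.cls = nn.Linear(dim, grid * grid)              # one logit per grid cell

    def forward(self, query_patch, reference_patches, hidden_mask=None):
        # query_patch: (B, 1, dim); reference_patches: (B, T, dim)
        # hidden_mask: (B, T) bool, True = reference patch hidden from the query
        ctx, _ = self.attn(query_patch, reference_patches, reference_patches,
                           key_padding_mask=hidden_mask)
        return self.cls(ctx.squeeze(1))                     # (B, grid*grid) logits

head = RelativeLocationHead()
q, refs = torch.randn(2, 1, 64), torch.randn(2, 49, 64)
hidden = torch.rand(2, 49) < 0.5                            # hide about half the references
logits = head(q, refs, hidden)                              # train with cross-entropy on the true cell
```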
- Sparse Spatial Transformers for Few-Shot Learning [6.271261279657655]
Learning from limited data is challenging because data scarcity leads to poor generalization of the trained model.
We propose a novel transformer-based neural network architecture called sparse spatial transformers.
Our method finds task-relevant features and suppresses task-irrelevant features.
arXiv Detail & Related papers (2021-09-27T10:36:32Z)
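The mechanism in the entry above — keep task-relevant spatial features and suppress task-irrelevant ones — can be sketched as sparse top-k cross-attention between query and support feature maps. A loose illustration, not the actual architecture; `k` is an assumed hyperparameter.

```python
import torch
import torch.nn as nn

class SparseSpatialAttention(nn.Module):
    """Attend to only the top-k most relevant support positions per query
    position, suppressing the rest (illustrative sketch)."""
    def __init__(self, dim=64, k=8):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.k = k

    def forward(self, query_feats, support_feats):
        # query_feats: (B, Tq, D); support_feats: (B, Ts, D)
        scores = self.to_q(query_feats) @ self.to_k(support_feats).transpose(1, 2)
        topk = scores.topk(self.k, dim=-1).indices                    # (B, Tq, k)
        mask = torch.full_like(scores, float('-inf')).scatter(-1, topk, 0.0)
        attn = torch.softmax(scores + mask, dim=-1)                   # only top-k survive
        return attn @ support_feats                                   # task-relevant context

layer = SparseSpatialAttention()
out = layer(torch.randn(2, 25, 64), torch.randn(2, 49, 64))           # (2, 25, 64)
```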
- Mixed Supervision Learning for Whole Slide Image Classification [88.31842052998319]
We propose a mixed supervision learning framework for super high-resolution images.
During the patch training stage, this framework can make use of coarse image-level labels to refine self-supervised learning.
A comprehensive strategy is proposed to suppress pixel-level false positives and false negatives.
arXiv Detail & Related papers (2021-07-02T09:46:06Z)
- Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder [52.42057181754076]
Motivated by the auto-encoder mechanism and contrastive representation learning advances, we propose a learning-based metric for image captioning.
We develop three progressive model structures to learn the sentence-level representations.
Experiment results show that our proposed method can align well with the scores generated from other contemporary metrics.
arXiv Detail & Related papers (2021-06-29T12:27:05Z)
- Few-Shot Semantic Segmentation Augmented with Image-Level Weak Annotations [23.02986307143718]
Recent progress in few-shot semantic segmentation tackles the issue using only a few pixel-level annotated examples.
Our key idea is to learn a better prototype representation of the class by fusing the knowledge from the image-level labeled data.
We propose a new framework, called PAIA, to learn the class prototype representation in a metric space by integrating image-level annotations.
arXiv Detail & Related papers (2020-07-03T04:58:20Z)
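A minimal sketch of fusing pixel-level and image-level knowledge into a class prototype, in the spirit of the entry above; the masked-average pooling and the mixing weight `alpha` are assumptions, since the summary does not give PAIA's exact formulation.

```python
import torch
import torch.nn.functional as F

def fused_prototype(pixel_feats, masks, image_feats, alpha=0.5):
    """Class prototype from a few pixel-annotated images plus image-level
    labeled data (illustrative sketch).

    pixel_feats: (N, D, H, W) features of pixel-annotated support images
    masks:       (N, H, W)    binary foreground masks
    image_feats: (M, D)       global features of image-level labeled images
    """
    m = masks.unsqueeze(1).float()                                    # (N, 1, H, W)
    pixel_proto = (pixel_feats * m).sum(dim=(0, 2, 3)) / m.sum().clamp(min=1e-6)
    image_proto = image_feats.mean(0)                                 # weak-label knowledge
    # convex combination of the two knowledge sources
    return F.normalize(alpha * pixel_proto + (1 - alpha) * image_proto, dim=0)

proto = fused_prototype(torch.randn(3, 64, 16, 16),
                        torch.rand(3, 16, 16) > 0.5,
                        torch.randn(10, 64))
```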
- Learning to Compare Relation: Semantic Alignment for Few-Shot Learning [48.463122399494175]
We present a novel semantic alignment model to compare relations, which is robust to content misalignment.
We conduct extensive experiments on several few-shot learning datasets.
arXiv Detail & Related papers (2020-02-29T08:37:02Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)