ESPT: A Self-Supervised Episodic Spatial Pretext Task for Improving
Few-Shot Learning
- URL: http://arxiv.org/abs/2304.13287v1
- Date: Wed, 26 Apr 2023 04:52:08 GMT
- Title: ESPT: A Self-Supervised Episodic Spatial Pretext Task for Improving
Few-Shot Learning
- Authors: Yi Rong, Xiongbo Lu, Zhaoyang Sun, Yaxiong Chen, Shengwu Xiong
- Abstract summary: We propose to augment the few-shot learning objective with a novel self-supervised Episodic Spatial Pretext Task (ESPT).
Our ESPT objective is defined as maximizing the local spatial relationship consistency between the original episode and the transformed one.
Our ESPT method achieves new state-of-the-art performance for few-shot image classification on three mainstay benchmark datasets.
- Score: 16.859375666701
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) techniques have recently been integrated into
the few-shot learning (FSL) framework and have shown promising results in
improving the few-shot image classification performance. However, existing SSL
approaches used in FSL typically seek the supervision signals from the global
embedding of every single image. Therefore, during the episodic training of
FSL, these methods cannot capture and fully utilize the local visual
information in image samples and the data structure information of the whole
episode, which are beneficial to FSL. To this end, we propose to augment the
few-shot learning objective with a novel self-supervised Episodic Spatial
Pretext Task (ESPT). Specifically, for each few-shot episode, we generate its
corresponding transformed episode by applying a random geometric transformation
to all the images in it. Based on these, our ESPT objective is defined as
maximizing the local spatial relationship consistency between the original
episode and the transformed one. With this definition, the ESPT-augmented FSL
objective promotes learning more transferable feature representations that
capture the local spatial features of different images and their
inter-relational structural information in each input episode, thus enabling
the model to generalize better to new categories with only a few samples.
Extensive experiments indicate that our ESPT method achieves new
state-of-the-art performance for few-shot image classification on three
mainstay benchmark datasets. The source code will be available at:
https://github.com/Whut-YiRong/ESPT.
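To make the objective concrete, below is a minimal sketch of one plausible ESPT-style instantiation. The encoder, the choice of a 90-degree rotation as the geometric transform, the pairwise-cosine definition of the "local spatial relationship", and the MSE consistency term are all illustrative assumptions; the authors' exact formulation is in the paper and repository above.

```python
# Hedged sketch of an ESPT-style objective (not the authors' exact code).
# Assumptions: `encoder` returns (B, C, h, w) spatial feature maps, the random
# geometric transform is a 90-degree rotation, and the local spatial
# relationship is the pairwise cosine similarity between all local features
# of all images in the episode.
import torch
import torch.nn.functional as F

def local_relation_matrix(feats):
    # feats: (B, C, h, w) -> (B*h*w, C) local descriptors
    B, C, h, w = feats.shape
    local = feats.permute(0, 2, 3, 1).reshape(-1, C)
    local = F.normalize(local, dim=1)
    return local @ local.t()  # pairwise cosine relations across the episode

def espt_loss(encoder, episode):
    # episode: (B, 3, H, W) tensor holding all support and query images
    k = int(torch.randint(1, 4, ()))                  # random 90-degree multiple
    transformed = torch.rot90(episode, k, dims=(2, 3))
    f_orig = encoder(episode)
    f_trans = torch.rot90(encoder(transformed), -k, dims=(2, 3))  # re-align
    # maximizing consistency <=> minimizing the gap between relation matrices
    return F.mse_loss(local_relation_matrix(f_trans),
                      local_relation_matrix(f_orig).detach())
```

During episodic training the total objective would then be the usual few-shot classification loss plus a weighted ESPT term, e.g. `loss = fsl_loss + lam * espt_loss(encoder, episode)`, where `lam` is a hyperparameter (again an assumption, not the paper's reported value).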
Related papers
- Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene
Classification [26.340737217001497]
Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training.
Previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes.
We propose to collect visually detectable attributes automatically. We predict attributes for each class by measuring the semantic-visual similarity between attributes and images.
arXiv Detail & Related papers (2024-02-03T09:18:49Z)
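A minimal sketch of that attribute-prediction idea, assuming attribute word embeddings and image features share an embedding space and that per-class evidence is simply the mean similarity (illustrative choices, not the paper's exact pipeline):

```python
# Hedged sketch: score candidate attributes against one class's images.
import torch
import torch.nn.functional as F

def predict_class_attributes(image_feats, attr_embs, top_k=5):
    # image_feats: (N, D) features of one class's images
    # attr_embs:   (A, D) embeddings of candidate attribute words
    sim = F.normalize(image_feats, dim=1) @ F.normalize(attr_embs, dim=1).t()
    class_score = sim.mean(0)               # (A,) average evidence per attribute
    return class_score.topk(top_k).indices  # attributes assigned to this class
```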
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, yielding 10-34% relative improvement across various labeled-training-data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
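The dual-encoder contrastive objective can be sketched as a symmetric InfoNCE loss over matched image/location embedding pairs; the temperature value and the symmetric weighting below are assumptions, not CSP's published settings:

```python
# Hedged sketch of a CSP-style dual-encoder contrastive objective.
import torch
import torch.nn.functional as F

def location_contrastive_loss(img_emb, loc_emb, temperature=0.07):
    # img_emb, loc_emb: (N, D) embeddings of N geo-tagged images and their
    # coordinates; matched pairs share the same row index.
    img_emb = F.normalize(img_emb, dim=1)
    loc_emb = F.normalize(loc_emb, dim=1)
    logits = img_emb @ loc_emb.t() / temperature   # (N, N) similarity matrix
    targets = torch.arange(len(img_emb))           # positives on the diagonal
    # symmetric InfoNCE: image -> location and location -> image
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```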
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- Semantic Cross Attention for Few-shot Learning [9.529264466445236]
We propose a multi-task learning approach that treats the semantic features of label text as an auxiliary task.
Our proposed model uses word-embedding representations as semantic features to help train the embedding network, and a semantic cross-attention module to bridge the semantic features into the visual modality.
arXiv Detail & Related papers (2022-10-12T15:24:59Z)
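A minimal sketch of what such a bridging module could look like, assuming pre-projected visual tokens and word embeddings of a common width (the residual fusion is also an assumption):

```python
# Hedged sketch of a semantic cross-attention block: visual tokens attend
# over label word embeddings. Dimensions and fusion rule are illustrative.
import torch
import torch.nn as nn

class SemanticCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, visual_tokens, word_emb):
        # visual_tokens: (B, T, dim) local visual features
        # word_emb:      (B, L, dim) word-embedding semantic features
        attended, _ = self.attn(query=visual_tokens, key=word_emb, value=word_emb)
        return visual_tokens + attended  # residual fusion of the two modalities
```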
- Object discovery and representation networks [78.16003886427885]
We propose a self-supervised learning paradigm that discovers the structure encoded in priors by itself.
Our method, Odin, couples object discovery and representation networks to discover meaningful image segmentations without any supervision.
arXiv Detail & Related papers (2022-03-16T17:42:55Z)
- Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning [95.78635058475439]
Cross-domain few-shot learning (CD-FSL) aims at transferring knowledge from general nature images to novel domain-specific target categories.
This paper studies the CD-FSL problem by spanning the style distributions of the source dataset.
To make our model robust to visual styles, the source images are augmented by swapping the styles of their low-frequency components with each other.
arXiv Detail & Related papers (2022-03-15T05:36:41Z)
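A minimal sketch of the low-frequency style swap, with a 2x2 Haar-style low-pass standing in for the paper's wavelet transform and AdaIN-style channel statistics standing in for "style" (both assumptions):

```python
# Hedged sketch of a Wave-SAN-style augmentation: restyle one image's
# low-frequency band with another image's statistics, keep its own detail.
import torch
import torch.nn.functional as F

def split_bands(x):
    low = F.avg_pool2d(x, 2)                                    # LL band
    high = x - F.interpolate(low, scale_factor=2, mode="nearest")
    return low, high

def swap_low_freq_style(x_a, x_b, eps=1e-5):
    # x_a, x_b: (B, C, H, W) image batches with even H and W
    low_a, high_a = split_bands(x_a)
    low_b, _ = split_bands(x_b)
    mu_a, sd_a = low_a.mean((2, 3), keepdim=True), low_a.std((2, 3), keepdim=True)
    mu_b, sd_b = low_b.mean((2, 3), keepdim=True), low_b.std((2, 3), keepdim=True)
    # AdaIN-style restyling of a's low band with b's statistics
    low_ab = (low_a - mu_a) / (sd_a + eps) * sd_b + mu_b
    return F.interpolate(low_ab, scale_factor=2, mode="nearest") + high_a
```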
- Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning [11.66422653137002]
We propose an attention-based model for Zero-Shot Learning that learns attributes useful for unseen-class recognition.
Our method uses an attention mechanism adapted from Vision Transformer to capture and learn discriminative attributes by splitting images into small patches.
arXiv Detail & Related papers (2021-07-30T19:08:44Z)
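The patch-splitting plus self-attention step could look like the following minimal sketch (patch size, embedding width, and head count are illustrative, not the paper's configuration):

```python
# Hedged sketch of the ViT-style ingredients: images become patch tokens
# that multi-head self-attention mixes.
import torch
import torch.nn as nn

patchify = nn.Conv2d(3, 192, kernel_size=16, stride=16)  # 16x16 patches -> tokens
attn = nn.MultiheadAttention(192, num_heads=3, batch_first=True)

x = torch.randn(8, 3, 224, 224)                  # a batch of images
tokens = patchify(x).flatten(2).transpose(1, 2)  # (8, 196, 192) patch tokens
mixed, weights = attn(tokens, tokens, tokens)    # self-attention over patches
# `weights` indicates which patches (candidate attributes) drive each token
```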
- Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module is used to learn better image-level representations.
The local features matching contrastive module is designed to learn representations of local regions, which are beneficial for semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z)
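A minimal sketch of pairing an image-level ("global style") contrastive term with a local-region matching term; the InfoNCE form and the equal weighting are assumptions, not GLCNet's exact losses:

```python
# Hedged sketch of a global + local contrastive combination.
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.1):
    # a, b: (N, D) row-aligned positive pairs
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    logits = a @ b.t() / temperature
    return F.cross_entropy(logits, torch.arange(len(a)))

def glc_style_loss(g1, g2, l1, l2):
    # g1, g2: (N, D) image-level features of two augmented views
    # l1, l2: (M, D) features of the same M local regions under both views
    return 0.5 * info_nce(g1, g2) + 0.5 * info_nce(l1, l2)
```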
- TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification [50.358839666165764]
We show that the Task-Adaptive Feature Sub-Space Learning (TAFSSL) can significantly boost the performance in Few-Shot Learning scenarios.
Specifically, we show that on the challenging miniImageNet and tieredImageNet benchmarks, TAFSSL can improve the current state-of-the-art in both transductive and semi-supervised FSL settings by more than 5%.
arXiv Detail & Related papers (2020-03-14T16:59:17Z)
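A per-episode PCA is one simple way to realize a task-adaptive feature sub-space (an illustrative choice; see the paper for the variants actually studied, and the target dimension below is an assumption):

```python
# Hedged sketch of a TAFSSL-style step: project all features of one
# few-shot task onto the top principal components of that task alone.
import torch

def task_adaptive_subspace(features, dim=10):
    # features: (N, D) support + query embeddings of a single episode
    centered = features - features.mean(0, keepdim=True)
    # top-`dim` principal directions of this task's own feature distribution
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    return centered @ vh[:dim].t()  # (N, dim) task-adapted features
```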
- AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning [112.95742995816367]
We propose a new few-shot learning setting termed few-shot fewshot learning (FSFSL).
Under FSFSL, both the source and target classes have limited training samples.
We also propose a graph convolutional network (GCN)-based label denoising (LDN) method to remove irrelevant images.
arXiv Detail & Related papers (2020-02-28T10:34:36Z)
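One plausible shape for a GCN-based denoising step: propagate features over a similarity graph, then drop the images that agree least with their class after propagation. The graph construction, the single propagation step, and the keep ratio below are all assumptions, not AdarGCN's LDN module:

```python
# Hedged sketch of graph-propagation label denoising.
import torch
import torch.nn.functional as F

def denoise_by_graph(feats, labels, keep_ratio=0.8):
    # feats: (N, D) features of candidate images; labels: (N,) class ids
    normed = F.normalize(feats, dim=1)
    adj = torch.softmax(normed @ normed.t(), dim=1)  # row-normalized graph
    smoothed = adj @ feats                           # one propagation step
    keep = []
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        proto = smoothed[idx].mean(0)
        score = F.cosine_similarity(smoothed[idx], proto[None], dim=1)
        k = max(1, int(keep_ratio * len(idx)))
        keep.append(idx[score.topk(k).indices])      # retain the best-fitting
    return torch.cat(keep)                           # indices of kept images
```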