Multi-level Cross-modal Feature Alignment via Contrastive Learning
towards Zero-shot Classification of Remote Sensing Image Scenes
- URL: http://arxiv.org/abs/2306.06066v1
- Date: Wed, 31 May 2023 10:00:45 GMT
- Title: Multi-level Cross-modal Feature Alignment via Contrastive Learning
towards Zero-shot Classification of Remote Sensing Image Scenes
- Authors: Chun Liu, Suqiang Ma, Zheng Li, Wei Yang and Zhigang Han
- Abstract summary: Cross-modal feature alignment methods have been proposed to address zero-shot image scene classification.
We propose a multi-level cross-modal feature alignment method via contrastive learning for zero-shot classification of remote sensing image scenes.
Our proposed method outperforms state-of-the-art methods for zero-shot remote sensing image scene classification.
- Score: 7.17717863134783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot classification of image scenes, which recognizes image scenes
not seen during the training stage, holds great promise for lowering the
dependence on large numbers of labeled samples. To address zero-shot image
scene classification, cross-modal feature alignment methods have been
proposed in recent years. These methods mainly focus on matching the visual
features of each image scene with its corresponding semantic descriptors in
the latent space, and pay less attention to the contrastive relationships
between different image scenes and different semantic descriptors. Given the
large intra-class differences and inter-class similarities among image scenes,
as well as potential noisy samples, these methods are susceptible to the
influence of instances that lie far from those of the same class and close to
those of other classes. In this work, we propose a multi-level cross-modal
feature alignment method via contrastive learning for zero-shot
classification of remote sensing image scenes. While promoting
single-instance-level positive alignment between each image scene and its
corresponding semantic descriptors, the proposed method also takes
cross-instance contrastive relationships into consideration, and learns to keep
the visual and semantic features of different classes in the latent space apart
from each other. Extensive experiments have been done to evaluate the
performance of the proposed method. The results show that our proposed method
outperforms state-of-the-art methods for zero-shot remote sensing image scene
classification. All the code and data are available on GitHub:
https://github.com/masuqiang/MCFA-Pytorch
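The cross-instance contrastive idea described in the abstract can be illustrated with a minimal InfoNCE-style sketch. This is a generic illustration, not the paper's actual multi-level loss; the function name, the numpy implementation, and the temperature value are all assumptions:

```python
import numpy as np

def l2_normalize(x):
    """Project each row onto the unit sphere so dot products are cosines."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cross_modal_contrastive_loss(visual, semantic, temperature=0.1):
    """InfoNCE-style cross-modal loss: each visual embedding is pulled
    toward its own semantic descriptor (the diagonal positive) and pushed
    away from the descriptors of other instances in the batch (negatives).

    visual:   (N, d) visual features, one per image scene
    semantic: (N, d) semantic descriptors, row-aligned with `visual`
    """
    v = l2_normalize(visual)
    s = l2_normalize(semantic)
    logits = v @ s.T / temperature              # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numeric stability
    # Row-wise log-softmax; the diagonal entries are the positive pairs.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Minimizing this term keeps mismatched visual/semantic pairs apart while aligning matched ones; the paper's method additionally combines such cross-instance contrast with single-instance positive alignment across multiple levels.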
Related papers
- Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning [71.14084801851381]
Change captioning aims to succinctly describe the semantic change between a pair of similar images.
Most existing methods directly capture the difference between them, which risks obtaining error-prone difference features.
We propose a distractors-immune representation learning network that correlates the corresponding channels of two image representations.
arXiv Detail & Related papers (2024-07-16T13:00:33Z)
- SGMNet: Scene Graph Matching Network for Few-Shot Remote Sensing Scene Classification [14.016637774748677]
Few-Shot Remote Sensing Scene Classification (FSRSSC) is an important task, which aims to recognize novel scene classes with few examples.
We propose a novel scene graph matching-based meta-learning framework for FSRSSC, called SGMNet.
We conduct extensive experiments on UCMerced LandUse, WHU19, AID, and NWPU-RESISC45 datasets.
arXiv Detail & Related papers (2021-10-09T07:43:40Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Region-level Active Learning for Cluttered Scenes [60.93811392293329]
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Cross-domain Correspondence Learning for Exemplar-based Image Translation [59.35767271091425]
We present a framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain.
The output's style (e.g., color, texture) is consistent with the semantically corresponding objects in the exemplar.
We show that our method significantly outperforms state-of-the-art methods in terms of image quality.
arXiv Detail & Related papers (2020-04-12T09:10:57Z)
- Realizing Pixel-Level Semantic Learning in Complex Driving Scenes based on Only One Annotated Pixel per Class [17.481116352112682]
We propose a new semantic segmentation task under complex driving scenes based on a weakly supervised condition.
A three-step process is built for pseudo-label generation, which progressively implements optimal feature representation for each category.
Experiments on the Cityscapes dataset demonstrate that the proposed method provides a feasible way to solve the weakly supervised semantic segmentation task.
arXiv Detail & Related papers (2020-03-10T12:57:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.