Multimodal contrastive learning for remote sensing tasks
- URL: http://arxiv.org/abs/2209.02329v1
- Date: Tue, 6 Sep 2022 09:31:45 GMT
- Title: Multimodal contrastive learning for remote sensing tasks
- Authors: Umangi Jain, Alex Wilson, Varun Gulshan
- Abstract summary: We propose a dual-encoder framework, which is pre-trained on a large unlabeled dataset (~1M) of Sentinel-1 and Sentinel-2 image pairs.
We test the embeddings on two remote sensing downstream tasks: flood segmentation and land cover mapping, and empirically show that embeddings learnt from this technique outperform the conventional technique of collecting positive examples via aggressive data augmentations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised methods have shown tremendous success in the field of computer vision, including applications in remote sensing and medical imaging. The most popular contrastive-loss-based methods, such as SimCLR, MoCo, and MoCo-v2, create multiple views of the same image by applying contrived augmentations to form positive pairs, which are then contrasted with negative examples. Although these techniques work well, most have been tuned on ImageNet and similar computer vision datasets. While there have been some attempts to capture a richer set of deformations in the positive samples, in this work we explore a promising alternative for generating positive examples for remote sensing data within the contrastive learning framework. Images captured by different sensors at the same location and at nearby timestamps can be thought of as strongly augmented instances of the same scene, removing the need to explore and tune a set of hand-crafted strong augmentations. In this paper, we propose a simple dual-encoder framework, which is pre-trained on a large unlabeled dataset (~1M) of Sentinel-1 and Sentinel-2 image pairs. We test the embeddings on two remote sensing downstream tasks, flood segmentation and land cover mapping, and empirically show that embeddings learnt with this technique outperform those learnt with the conventional technique of collecting positive examples via aggressive data augmentations.
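The cross-sensor idea above can be sketched with a toy contrastive objective. This is a minimal, hypothetical illustration, not the paper's implementation: the paper uses deep encoders on Sentinel-1/Sentinel-2 imagery, whereas here each "encoder" is a random linear projection, each "image" is a flat feature vector, and an InfoNCE-style loss (a common choice in contrastive methods such as SimCLR and MoCo, assumed here rather than confirmed by the abstract) treats co-located pairs as positives and all other pairs in the batch as negatives.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Project inputs and L2-normalize, as contrastive methods typically do."""
    z = x @ w
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style loss: the i-th Sentinel-1 embedding should match the
    i-th Sentinel-2 embedding (same location, nearby time); every other
    pair in the batch serves as a negative."""
    logits = (z1 @ z2.T) / temperature            # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives on the diagonal

B, D_IN, D_EMB = 8, 32, 16
w1 = rng.normal(size=(D_IN, D_EMB))  # stand-in for the Sentinel-1 encoder
w2 = rng.normal(size=(D_IN, D_EMB))  # stand-in for the Sentinel-2 encoder

scene = rng.normal(size=(B, D_IN))             # shared underlying scene
s1 = scene + 0.1 * rng.normal(size=(B, D_IN))  # "SAR view" of the scene
s2 = scene + 0.1 * rng.normal(size=(B, D_IN))  # "optical view" of the scene

loss = info_nce(encode(s1, w1), encode(s2, w2))
print(f"InfoNCE loss on co-located pairs: {loss:.3f}")
```

The key design point the abstract argues for is visible in the data generation: the two views come from the same underlying scene via different sensing processes, so no hand-crafted augmentation pipeline is needed to produce positives.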
Related papers
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Mix-up Self-Supervised Learning for Contrast-agnostic Applications [33.807005669824136]
We present the first mix-up self-supervised learning framework for contrast-agnostic applications.
We address the low variance across images based on cross-domain mix-up and build the pretext task based on image reconstruction and transparency prediction.
arXiv Detail & Related papers (2022-04-02T16:58:36Z)
- Residual Relaxation for Multi-view Representation Learning [64.40142301026805]
Multi-view methods learn by aligning multiple views of the same image.
Some useful augmentations, such as image rotation, are harmful for multi-view methods because they cause a semantic shift.
We develop a generic approach, Pretext-aware Residual Relaxation (Prelax), that relaxes the exact alignment.
arXiv Detail & Related papers (2021-10-28T17:57:17Z)
- Multi-Level Contrastive Learning for Few-Shot Problems [7.695214001809138]
Contrastive learning is a discriminative approach that aims at grouping similar samples closer and diverse samples far from each other.
We propose a multi-level contrastive learning approach which applies contrastive losses at different layers of an encoder to learn multiple representations from the encoder.
arXiv Detail & Related papers (2021-07-15T21:00:02Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in low dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- Multi-view Contrastive Coding of Remote Sensing Images at Pixel-level [5.64497799927668]
A pixel-wise contrastive approach based on an unlabeled multi-view setting is proposed to overcome this limitation.
A pseudo-Siamese ResUnet is trained to learn a representation that aims to align features from the shifted positive pairs.
Results demonstrate both improvements in efficiency and accuracy over the state-of-the-art multi-view contrastive methods.
arXiv Detail & Related papers (2021-05-18T13:28:46Z)
- Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences.
We show that even when only trained with images, the learned feature representation is robust to instance appearance variations.
In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
- Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning [108.999497144296]
Recently advanced unsupervised learning approaches use the siamese-like framework to compare two "views" from the same image for learning representations.
This work aims to involve the distance concept on label space in the unsupervised learning and let the model be aware of the soft degree of similarity between positive or negative pairs.
Despite its conceptual simplicity, we show empirically that with the solution -- Unsupervised image mixtures (Un-Mix), we can learn subtler, more robust and generalized representations from the transformed input and corresponding new label space.
arXiv Detail & Related papers (2020-03-11T17:59:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.