WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images
- URL: http://arxiv.org/abs/2406.18765v1
- Date: Wed, 26 Jun 2024 21:30:41 GMT
- Title: WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images
- Authors: Yannik Glaser, Justin E. Stopa, Linnea M. Wolniewicz, Ralph Foster, Doug Vandemark, Alexis Mouche, Bertrand Chapron, Peter Sadowski
- Abstract summary: This study uses nearly 10 million WV-mode images and contrastive self-supervised learning to train a semantic embedding model called WV-Net.
In multiple downstream tasks, WV-Net outperforms a comparable model that was pre-trained on natural images with supervised learning.
WV-Net embeddings are also superior in an unsupervised image-retrieval task and scale better in data-sparse settings.
- Score: 23.653151006898327
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The European Space Agency's Copernicus Sentinel-1 (S-1) mission is a constellation of C-band synthetic aperture radar (SAR) satellites that provide unprecedented monitoring of the world's oceans. S-1's wave mode (WV) captures 20x20 km image patches at 5 m pixel resolution and is unaffected by cloud cover or time-of-day. The mission's open data policy has made SAR data easily accessible for a range of applications, but the need for manual image annotations is a bottleneck that hinders the use of machine learning methods. This study uses nearly 10 million WV-mode images and contrastive self-supervised learning to train a semantic embedding model called WV-Net. In multiple downstream tasks, WV-Net outperforms a comparable model that was pre-trained on natural images (ImageNet) with supervised learning. Experiments show improvements for estimating wave height (0.50 vs 0.60 RMSE using linear probing), estimating near-surface air temperature (0.90 vs 0.97 RMSE), and performing multilabel-classification of geophysical and atmospheric phenomena (0.96 vs 0.95 micro-averaged AUROC). WV-Net embeddings are also superior in an unsupervised image-retrieval task and scale better in data-sparse settings. Together, these results demonstrate that WV-Net embeddings can support geophysical research by providing a convenient foundation model for a variety of data analysis and exploration tasks.
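The abstract outlines the method only at a high level: contrastive self-supervised pre-training on unlabeled WV-mode images, then evaluation by linear probing. As a minimal sketch of the contrastive objective commonly used in such pipelines (the SimCLR-style NT-Xent loss), with no claim that this matches WV-Net's actual encoder, augmentations, or hyperparameters:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, D) projected embeddings of two augmented views of the
    same N images. Each sample's positive is its other view; the other
    2N - 2 samples in the batch act as negatives.
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D)
    sim = z @ z.t() / temperature                       # cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    # Positive for row i is its other view at index (i + n) mod 2n.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets.to(sim.device))
```

Linear probing, used for the wave-height RMSE comparison above, then freezes the pre-trained encoder and fits only a linear model on its embeddings, so the scores directly reflect embedding quality.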
Related papers
- An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training [51.622652121580394]
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features.
In this paper, we question whether the fine-tuning performance of extremely simple lightweight ViTs can also benefit from this pre-training paradigm.
Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical designs (5.7M/6.5M parameters) achieves 79.4%/78.9% top-1 accuracy on ImageNet-1K.
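To make the MIM paradigm discussed here concrete, below is a minimal sketch of MAE-style random patch masking, the core operation behind masked image modeling; the mask ratio and shapes are illustrative, not taken from the paper.

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """Randomly hide a fraction of patch tokens, MAE-style.

    patches: (B, L, D) patch embeddings. Returns the visible subset,
    a binary mask (1 = masked), and indices that restore patch order.
    """
    B, L, D = patches.shape
    keep = int(L * (1 - mask_ratio))
    noise = torch.rand(B, L, device=patches.device)  # per-patch scores
    shuffle = noise.argsort(dim=1)                   # random permutation
    restore = shuffle.argsort(dim=1)                 # inverse permutation
    visible = torch.gather(
        patches, 1, shuffle[:, :keep, None].expand(-1, -1, D))
    mask = torch.ones(B, L, device=patches.device)
    mask[:, :keep] = 0                               # first `keep` are visible
    mask = torch.gather(mask, 1, restore)            # back to original order
    return visible, mask, restore
```

A decoder then reconstructs the hidden patches (typically with a pixel-space MSE evaluated only where mask == 1), and the pre-trained encoder is kept for fine-tuning.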
arXiv Detail & Related papers (2024-04-18T14:14:44Z) - From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution [4.107182710549721]
We present an innovative approach that combines super-resolution and an adapted lightweight YOLOv5 architecture.
Our experimental results demonstrate the model's superior performance in detecting small and densely clustered objects.
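As a rough sketch of the upscale-then-detect pipeline described here, using the public ultralytics/yolov5 torch.hub entry point and plain bicubic upsampling as a stand-in for the paper's learned super-resolution stage (the file name and scale factor are hypothetical):

```python
import cv2
import torch

# Pretrained detector from the public ultralytics/yolov5 hub repo.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

img = cv2.imread("aerial_scene.jpg")  # hypothetical aerial image
# Stand-in for a learned super-resolution network: 2x bicubic upsampling.
up = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

results = model(up[:, :, ::-1].copy())  # BGR -> RGB, then detect
results.print()                         # summary of detected objects
```

Enlarging the input before detection gives small, densely clustered objects more pixels for the detector to work with, which is the intuition behind combining the two stages.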
arXiv Detail & Related papers (2024-01-26T05:50:58Z) - Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z) - Estimating optical vegetation indices with Sentinel-1 SAR data and AutoML [32.19783248549554]
Current optical vegetation indices (VIs) for monitoring forest ecosystems are widely used in various applications.
However, continuous monitoring based on optical satellite data can be hampered by atmospheric effects such as clouds.
The goal of this work is to overcome the issues affecting optical data with SAR data and serve as a substitute for estimating optical VIs for forests using machine learning.
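As a minimal sketch of the substitution task this entry describes, one can derive an optical vegetation index (here NDVI) as the regression target and fit a model on SAR-derived features; the synthetic data and the random-forest stand-in below are hypothetical, since the paper selects its models via AutoML:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical co-located samples: Sentinel-1 backscatter features
# (e.g., VV, VH, and their ratio) and optical reflectances.
rng = np.random.default_rng(0)
X_sar = rng.normal(size=(1000, 3))    # stand-in SAR features
red = rng.uniform(0.01, 0.3, 1000)    # stand-in red reflectance
nir = rng.uniform(0.2, 0.6, 1000)     # stand-in near-infrared reflectance

# NDVI, a standard optical vegetation index, as the regression target.
ndvi = (nir - red) / (nir + red)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_sar[:800], ndvi[:800])
rmse = np.sqrt(np.mean((model.predict(X_sar[800:]) - ndvi[800:]) ** 2))
print(f"held-out RMSE: {rmse:.3f}")   # meaningless on random data
```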
arXiv Detail & Related papers (2023-11-13T18:23:46Z) - Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, simple imitation learning pushes an existing agent's performance up by 11% absolute over the previous SoTA, to a new best of 80% single-run success rate on the R2R test split.
arXiv Detail & Related papers (2023-07-28T16:03:28Z) - Transforming Observations of Ocean Temperature with a Deep Convolutional Residual Regressive Neural Network [0.0]
Sea surface temperature (SST) is an essential climate variable that can be measured via ground truth, remote sensing, or hybrid model methodologies.
Here, we build on SST surveillance progress by applying a few relevant technological advances from the late 20th and early 21st centuries.
We extend our existing water cycle observation framework, Flux to Flow (F2F), to fuse AMSR-E and MODIS observations into a higher-resolution product.
Our neural network architecture is constrained to a deep convolutional residual regressive neural network.
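Since "deep convolutional residual regressive neural network" is the only architectural detail given, here is one plausible minimal reading in PyTorch: stacked residual conv blocks with a linear output head for field regression. Channel counts, depth, and input shapes are assumptions, not F2F's actual configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic convolutional residual block for image-to-image regression."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # identity skip connection

# Sketch: coarse SST field in, refined field of the same size out.
net = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1),
    *[ResidualBlock(64) for _ in range(4)],
    nn.Conv2d(64, 1, 3, padding=1),          # linear output for regression
)
sst_coarse = torch.randn(1, 1, 128, 128)      # dummy input field
print(net(sst_coarse).shape)                  # torch.Size([1, 1, 128, 128])
```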
arXiv Detail & Related papers (2023-06-16T17:35:11Z) - Delving Deeper into Data Scaling in Masked Image Modeling [145.36501330782357]
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
Specifically, we utilize the web-collected Coyo-700M dataset.
Our goal is to investigate how the performance changes on downstream tasks when scaling with different sizes of data and models.
arXiv Detail & Related papers (2023-05-24T15:33:46Z) - Recognition of polar lows in Sentinel-1 SAR images with deep learning [5.571369922847262]
We introduce a novel dataset of Sentinel-1 images, each labeled as positive (representing a maritime mesocyclone) or negative (representing a normal sea state).
The dataset is used to train a deep learning model to classify the labeled images.
The model yields an F1 score of 0.95, indicating that polar lows can be consistently detected from SAR images.
arXiv Detail & Related papers (2022-03-30T15:32:39Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - Planetary UAV localization based on Multi-modal Registration with Pre-existing Digital Terrain Model [0.5156484100374058]
We propose a multi-modal registration based SLAM algorithm that estimates the location of a planetary UAV using a nadir-view camera on the UAV.
To overcome the scale and appearance differences between on-board UAV images and the pre-existing digital terrain model, a theoretical model is proposed to show that topographic features of the UAV image and the DEM can be correlated in the frequency domain via the cross power spectrum.
To test the robustness and effectiveness of the proposed localization algorithm, a new cross-source drone-based localization dataset for planetary exploration is proposed.
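The cross-power-spectrum correlation mentioned above is the classical phase-correlation technique; the NumPy sketch below recovers a pure translation between two images that way. It illustrates the frequency-domain idea only, not the paper's full UAV-to-DEM matching pipeline.

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the (row, col) shift of a relative to b via the
    normalized cross power spectrum (phase correlation)."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cps = Fa * np.conj(Fb)
    cps /= np.abs(cps) + 1e-12           # keep phase only
    corr = np.fft.ifft2(cps).real        # impulse at the translation
    peak = np.unravel_index(corr.argmax(), corr.shape)
    # Map wrapped indices to signed shifts.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, a.shape))

img = np.random.rand(256, 256)
shifted = np.roll(img, (12, -7), axis=(0, 1))
print(phase_correlation(shifted, img))   # (12, -7)
```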
arXiv Detail & Related papers (2021-06-24T02:54:01Z) - SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving [94.11868795445798]
We release SODA10M, a large-scale object detection benchmark for autonomous driving containing 10 million unlabeled images and 20K images labeled with 6 representative object categories.
To improve diversity, images are collected at one frame every ten seconds across 32 different cities, covering different weather conditions, periods, and location scenes.
We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
arXiv Detail & Related papers (2021-06-21T13:55:57Z)