WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images
- URL: http://arxiv.org/abs/2406.18765v1
- Date: Wed, 26 Jun 2024 21:30:41 GMT
- Title: WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images
- Authors: Yannik Glaser, Justin E. Stopa, Linnea M. Wolniewicz, Ralph Foster, Doug Vandemark, Alexis Mouche, Bertrand Chapron, Peter Sadowski
- Abstract summary: This study uses nearly 10 million WV-mode images and contrastive self-supervised learning to train a semantic embedding model called WV-Net.
In multiple downstream tasks, WV-Net outperforms a comparable model that was pre-trained on natural images with supervised learning.
WV-Net embeddings are also superior in an unsupervised image-retrieval task and scale better in data-sparse settings.
- Score: 23.653151006898327
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The European Space Agency's Copernicus Sentinel-1 (S-1) mission is a constellation of C-band synthetic aperture radar (SAR) satellites that provide unprecedented monitoring of the world's oceans. S-1's wave mode (WV) captures 20x20 km image patches at 5 m pixel resolution and is unaffected by cloud cover or time-of-day. The mission's open data policy has made SAR data easily accessible for a range of applications, but the need for manual image annotations is a bottleneck that hinders the use of machine learning methods. This study uses nearly 10 million WV-mode images and contrastive self-supervised learning to train a semantic embedding model called WV-Net. In multiple downstream tasks, WV-Net outperforms a comparable model that was pre-trained on natural images (ImageNet) with supervised learning. Experiments show improvements for estimating wave height (0.50 vs 0.60 RMSE using linear probing), estimating near-surface air temperature (0.90 vs 0.97 RMSE), and performing multilabel-classification of geophysical and atmospheric phenomena (0.96 vs 0.95 micro-averaged AUROC). WV-Net embeddings are also superior in an unsupervised image-retrieval task and scale better in data-sparse settings. Together, these results demonstrate that WV-Net embeddings can support geophysical research by providing a convenient foundation model for a variety of data analysis and exploration tasks.
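The abstract outlines the method only at a high level: contrastive self-supervised pre-training on unlabeled WV-mode images, then evaluation by linear probing. As a minimal sketch of the contrastive objective commonly used in such pipelines (the SimCLR-style NT-Xent loss), with no claim that this matches WV-Net's actual encoder, augmentations, or hyperparameters:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, D) projected embeddings of two augmented views of the
    same N images. Each sample's positive is its other view; the other
    2N - 2 samples in the batch act as negatives.
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D)
    sim = z @ z.t() / temperature                       # cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    # Positive for row i is its other view at index (i + n) mod 2n.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets.to(sim.device))
```

Linear probing, used for the wave-height RMSE comparison above, then freezes the pre-trained encoder and fits only a linear model on its embeddings, so the scores directly reflect embedding quality.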
Related papers
- An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training [51.622652121580394]
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features.
In this paper, we question whether the fine-tuning performance of extremely simple lightweight ViTs can also benefit from this pre-training paradigm.
Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical designs (5.7M/6.5M parameters) achieves 79.4%/78.9% top-1 accuracy on ImageNet-1K.
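To make the MIM paradigm discussed here concrete, below is a minimal sketch of MAE-style random patch masking, the core operation behind masked image modeling; the mask ratio and shapes are illustrative, not taken from the paper.

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """Randomly hide a fraction of patch tokens, MAE-style.

    patches: (B, L, D) patch embeddings. Returns the visible subset,
    a binary mask (1 = masked), and indices that restore patch order.
    """
    B, L, D = patches.shape
    keep = int(L * (1 - mask_ratio))
    noise = torch.rand(B, L, device=patches.device)  # per-patch scores
    shuffle = noise.argsort(dim=1)                   # random permutation
    restore = shuffle.argsort(dim=1)                 # inverse permutation
    visible = torch.gather(
        patches, 1, shuffle[:, :keep, None].expand(-1, -1, D))
    mask = torch.ones(B, L, device=patches.device)
    mask[:, :keep] = 0                               # first `keep` are visible
    mask = torch.gather(mask, 1, restore)            # back to original order
    return visible, mask, restore
```

A decoder then reconstructs the hidden patches (typically with a pixel-space MSE evaluated only where mask == 1), and the pre-trained encoder is kept for fine-tuning.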
arXiv Detail & Related papers (2024-04-18T14:14:44Z) - From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution [4.107182710549721]
We present an innovative approach that combines super-resolution and an adapted lightweight YOLOv5 architecture.
Our experimental results demonstrate the model's superior performance in detecting small and densely clustered objects.
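As a rough sketch of the upscale-then-detect pipeline described here, using the public ultralytics/yolov5 torch.hub entry point and plain bicubic upsampling as a stand-in for the paper's learned super-resolution stage (the file name and scale factor are hypothetical):

```python
import cv2
import torch

# Pretrained detector from the public ultralytics/yolov5 hub repo.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

img = cv2.imread("aerial_scene.jpg")  # hypothetical aerial image
# Stand-in for a learned super-resolution network: 2x bicubic upsampling.
up = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

results = model(up[:, :, ::-1].copy())  # BGR -> RGB, then detect
results.print()                         # summary of detected objects
```

Enlarging the input before detection gives small, densely clustered objects more pixels for the detector to work with, which is the intuition behind combining the two stages.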
arXiv Detail & Related papers (2024-01-26T05:50:58Z) - Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z) - Estimating optical vegetation indices with Sentinel-1 SAR data and AutoML [32.19783248549554]
Current optical vegetation indices (VIs) for monitoring forest ecosystems are widely used in various applications.
However, continuous monitoring based on optical satellite data can be hampered by atmospheric effects such as clouds.
The goal of this work is to overcome the issues affecting optical data with SAR data and serve as a substitute for estimating optical VIs for forests using machine learning.
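As a minimal sketch of the substitution task this entry describes, one can derive an optical vegetation index (here NDVI) as the regression target and fit a model on SAR-derived features; the synthetic data and the random-forest stand-in below are hypothetical, since the paper selects its models via AutoML:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical co-located samples: Sentinel-1 backscatter features
# (e.g., VV, VH, and their ratio) and optical reflectances.
rng = np.random.default_rng(0)
X_sar = rng.normal(size=(1000, 3))    # stand-in SAR features
red = rng.uniform(0.01, 0.3, 1000)    # stand-in red reflectance
nir = rng.uniform(0.2, 0.6, 1000)     # stand-in near-infrared reflectance

# NDVI, a standard optical vegetation index, as the regression target.
ndvi = (nir - red) / (nir + red)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_sar[:800], ndvi[:800])
rmse = np.sqrt(np.mean((model.predict(X_sar[800:]) - ndvi[800:]) ** 2))
print(f"held-out RMSE: {rmse:.3f}")   # meaningless on random data
```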
arXiv Detail & Related papers (2023-11-13T18:23:46Z) - Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, simple imitation learning pushes an existing agent's performance up by 11% absolute over the previous SoTA, to a new best of 80% single-run success rate on the R2R test split.
arXiv Detail & Related papers (2023-07-28T16:03:28Z) - Transforming Observations of Ocean Temperature with a Deep Convolutional Residual Regressive Neural Network [0.0]
Sea surface temperature (SST) is an essential climate variable that can be measured via ground truth, remote sensing, or hybrid model methodologies.
Here, we build on SST surveillance progress by applying a few relevant technological advances from the late 20th and early 21st centuries.
We extend our existing water cycle observation framework, Flux to Flow (F2F), to fuse AMSR-E and MODIS observations into a higher-resolution product.
Our neural network architecture is constrained to a deep convolutional residual regressive neural network.
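Since "deep convolutional residual regressive neural network" is the only architectural detail given, here is one plausible minimal reading in PyTorch: stacked residual conv blocks with a linear output head for field regression. Channel counts, depth, and input shapes are assumptions, not F2F's actual configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic convolutional residual block for image-to-image regression."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # identity skip connection

# Sketch: coarse SST field in, refined field of the same size out.
net = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1),
    *[ResidualBlock(64) for _ in range(4)],
    nn.Conv2d(64, 1, 3, padding=1),          # linear output for regression
)
sst_coarse = torch.randn(1, 1, 128, 128)      # dummy input field
print(net(sst_coarse).shape)                  # torch.Size([1, 1, 128, 128])
```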
arXiv Detail & Related papers (2023-06-16T17:35:11Z) - Delving Deeper into Data Scaling in Masked Image Modeling [145.36501330782357]
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
Specifically, we utilize the web-collected Coyo-700M dataset.
Our goal is to investigate how the performance changes on downstream tasks when scaling with different sizes of data and models.
arXiv Detail & Related papers (2023-05-24T15:33:46Z) - Recognition of polar lows in Sentinel-1 SAR images with deep learning [5.571369922847262]
We introduce a novel dataset of Sentinel-1 images, each labeled as positive (representing a maritime mesocyclone) or negative (representing a normal sea state).
The dataset is used to train a deep learning model to classify the labeled images.
The model yields an F1 score of 0.95, indicating that polar lows can be consistently detected from SAR images.
arXiv Detail & Related papers (2022-03-30T15:32:39Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - Planetary UAV localization based on Multi-modal Registration with Pre-existing Digital Terrain Model [0.5156484100374058]
We propose a multi-modal registration based SLAM algorithm that estimates the location of a planetary UAV using a nadir-view camera on the UAV.
To overcome the scale and appearance differences between on-board UAV images and the pre-existing digital terrain model, a theoretical model is proposed to show that topographic features of the UAV image and the DEM can be correlated in the frequency domain via the cross power spectrum.
To test the robustness and effectiveness of the proposed localization algorithm, a new cross-source drone-based localization dataset for planetary exploration is proposed.
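The cross-power-spectrum correlation mentioned above is the classical phase-correlation technique; the NumPy sketch below recovers a pure translation between two images that way. It illustrates the frequency-domain idea only, not the paper's full UAV-to-DEM matching pipeline.

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the (row, col) shift of a relative to b via the
    normalized cross power spectrum (phase correlation)."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cps = Fa * np.conj(Fb)
    cps /= np.abs(cps) + 1e-12           # keep phase only
    corr = np.fft.ifft2(cps).real        # impulse at the translation
    peak = np.unravel_index(corr.argmax(), corr.shape)
    # Map wrapped indices to signed shifts.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, a.shape))

img = np.random.rand(256, 256)
shifted = np.roll(img, (12, -7), axis=(0, 1))
print(phase_correlation(shifted, img))   # (12, -7)
```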
arXiv Detail & Related papers (2021-06-24T02:54:01Z) - SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving [94.11868795445798]
We release SODA10M, a large-scale object detection benchmark for autonomous driving containing 10 million unlabeled images and 20K images labeled with 6 representative object categories.
To improve diversity, images are collected at one frame every ten seconds across 32 different cities, covering different weather conditions, periods, and location scenes.
We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
arXiv Detail & Related papers (2021-06-21T13:55:57Z)