M3LEO: A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and RGB Data
- URL: http://arxiv.org/abs/2406.04230v1
- Date: Thu, 6 Jun 2024 16:30:41 GMT
- Title: M3LEO: A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and RGB Data
- Authors: Matthew J Allen, Francisco Dorr, Joseph Alejandro Gallego Mejia, Laura Martínez-Ferrer, Anna Jungbluth, Freddie Kalaitzis, Raúl Ramos-Pollán
- Abstract summary: M3LEO is a multi-modal, multi-label EO dataset that includes polarimetric, interferometric, and coherence SAR data.
It spans 17.5TB and contains approximately 10M data chips across six geographic regions.
We provide tools to process any dataset available on popular platforms for integration with our framework.
- Score: 1.4053129774629076
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Satellite-based remote sensing has revolutionised the way we address global challenges in a rapidly evolving world. Huge quantities of Earth Observation (EO) data are generated by satellite sensors daily, but processing these large datasets for use in ML pipelines is technically and computationally challenging. Specifically, different types of EO data are often hosted on a variety of platforms, with differing availability for Python preprocessing tools. In addition, spatial alignment across data sources and data tiling can present significant technical hurdles for novice users. While some preprocessed EO datasets exist, their content is often limited to optical or near-optical wavelength data, which is ineffective at night or in adverse weather conditions. Synthetic Aperture Radar (SAR), an active sensing technique based on microwave length radiation, offers a viable alternative. However, the application of machine learning to SAR has been limited due to a lack of ML-ready data and pipelines, particularly for the full diversity of SAR data, including polarimetry, coherence and interferometry. We introduce M3LEO, a multi-modal, multi-label EO dataset that includes polarimetric, interferometric, and coherence SAR data derived from Sentinel-1, alongside Sentinel-2 RGB imagery and a suite of labelled tasks for model evaluation. M3LEO spans 17.5TB and contains approximately 10M data chips across six geographic regions. The dataset is complemented by a flexible PyTorch Lightning framework, with configuration management using Hydra. We provide tools to process any dataset available on popular platforms such as Google Earth Engine for integration with our framework. Initial experiments validate the utility of our data and framework, showing that SAR imagery contains information additional to that extractable from RGB data. Data at huggingface.co/M3LEO, and code at github.com/spaceml-org/M3LEO.
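The abstract names data tiling as one of the hurdles in preparing EO data. As a rough illustration only, the sketch below cuts a single raster into non-overlapping square chips with NumPy. It is not the M3LEO pipeline: the function name `tile_into_chips`, the 448-pixel chip size, and the choice to drop partial edge chips are all assumptions made for this example.

```python
import numpy as np


def tile_into_chips(image: np.ndarray, chip_size: int) -> np.ndarray:
    """Split an (H, W, C) array into non-overlapping square chips.

    Hypothetical helper for illustration. Edge pixels that do not fill a
    whole chip are dropped. Returns an array of shape
    (n_chips, chip_size, chip_size, C), ordered row by row.
    """
    h, w, c = image.shape
    rows, cols = h // chip_size, w // chip_size
    # Crop to a whole number of chips in each direction.
    cropped = image[: rows * chip_size, : cols * chip_size]
    # Expose the chip grid as separate axes, then bring the two grid axes
    # to the front so each chip is a contiguous (chip_size, chip_size, C) block.
    chips = cropped.reshape(rows, chip_size, cols, chip_size, c)
    chips = chips.swapaxes(1, 2)
    return chips.reshape(rows * cols, chip_size, chip_size, c)


# Example: a blank 1000 x 1200 RGB raster and an assumed chip size of 448.
rgb = np.zeros((1000, 1200, 3), dtype=np.uint8)
chips = tile_into_chips(rgb, chip_size=448)
print(chips.shape)  # (4, 448, 448, 3)
```

A real pipeline would also need to reproject and spatially align the different sources before tiling, which this sketch leaves out.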
Related papers
- SCaRL- A Synthetic Multi-Modal Dataset for Autonomous Driving [0.0]
We present a novel synthetically generated multi-modal dataset, SCaRL, to enable the training and validation of autonomous driving solutions.
SCaRL is a large dataset based on the CARLA Simulator, which provides data for diverse, dynamic scenarios and traffic conditions.
arXiv Detail & Related papers (2024-05-27T10:31:26Z)
- DIDLM: A Comprehensive Multi-Sensor Dataset with Infrared Cameras, Depth Cameras, LiDAR, and 4D Millimeter-Wave Radar in Challenging Scenarios for 3D Mapping [7.050468075029598]
This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments.
The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar.
Various SLAM algorithms are employed to process the dataset, revealing performance differences among algorithms in different scenarios.
arXiv Detail & Related papers (2024-04-15T09:49:33Z)
- SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [83.21028626585986]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection.
Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets.
To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we re-visit transformers pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI [3.4764766275808583]
Cloud formations often obscure optical satellite-based monitoring of the Earth's surface.
We propose a novel synthetic dataset for cloud optical thickness estimation.
We leverage this dataset to obtain reliable and versatile cloud masks on real data.
arXiv Detail & Related papers (2023-11-23T14:28:28Z)
- Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms [7.666806082770633]
The dataset offers stereo RGB images with depth information, as well as monochrome, near-infrared and thermal images.
The dataset comprises seven sequences collected in three fields of citrus trees.
It spans a total operation time of 1.7 hours, covers a distance of 7.5 km, and constitutes 1.3 TB of data.
arXiv Detail & Related papers (2023-09-27T00:30:08Z)
- Diffusion Models for Interferometric Satellite Aperture Radar [73.01013149014865]
Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models.
Here, we leverage PDMs to generate several radar-based satellite image datasets.
We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue.
arXiv Detail & Related papers (2023-08-31T16:26:17Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z)
- Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration [55.41644538483948]
This paper introduces a multimodal dataset from the harsh and unstructured underground environment with aerosol particles.
It contains synchronized raw data measurements from all onboard sensors in Robot Operating System (ROS) format.
The focus of this paper is not only to capture both temporal and spatial data diversities but also to present the impact of harsh conditions on captured data.
arXiv Detail & Related papers (2023-04-27T20:21:18Z)
- Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans [103.92680099373567]
This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.
Changing the sampling parameters allows one to "steer" the generated datasets to emphasize specific information.
Common architectures trained on a generated starter dataset reached state-of-the-art performance on multiple common vision tasks and benchmarks.
arXiv Detail & Related papers (2021-10-11T04:21:46Z)
- Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model [36.993630058695345]
We propose a shared and specific feature learning (S2FL) model to decompose multimodal RS data into modality-shared and modality-specific components.
To better assess multimodal baselines and the newly-proposed S2FL model, three multimodal RS benchmark datasets, i.e., Houston2013 -- hyperspectral and multispectral data, Berlin -- hyperspectral and synthetic aperture radar (SAR) data, Augsburg -- hyperspectral, SAR, and digital surface model (DSM) data, are released and used for land cover classification.
arXiv Detail & Related papers (2021-05-21T08:14:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.