MCTED: A Machine-Learning-Ready Dataset for Digital Elevation Model Generation From Mars Imagery
- URL: http://arxiv.org/abs/2509.08027v2
- Date: Thu, 06 Nov 2025 07:50:07 GMT
- Title: MCTED: A Machine-Learning-Ready Dataset for Digital Elevation Model Generation From Mars Imagery
- Authors: Rafał Osadnik, Pablo Gómez, Eleni Bohacek, Rickbir Bahia,
- Abstract summary: This work presents a new dataset for the Martian digital elevation model prediction task, ready for machine learning applications called MCTED.<n>The dataset has been generated using a comprehensive pipeline designed to process high-resolution Mars orthoimage and DEM pairs from Day et al.<n>The source images are data gathered by the Mars Reconnaissance Orbiter using the CTX instrument, providing a very diverse and comprehensive coverage of the Martian surface.
- Score: 1.7499351967216341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents a new dataset for the Martian digital elevation model prediction task, ready for machine learning applications called MCTED. The dataset has been generated using a comprehensive pipeline designed to process high-resolution Mars orthoimage and DEM pairs from Day et al., yielding a dataset consisting of 80,898 data samples. The source images are data gathered by the Mars Reconnaissance Orbiter using the CTX instrument, providing a very diverse and comprehensive coverage of the Martian surface. Given the complexity of the processing pipelines used in large-scale DEMs, there are often artefacts and missing data points in the original data, for which we developed tools to solve or mitigate their impact. We divide the processed samples into training and validation splits, ensuring samples in both splits cover no mutual areas to avoid data leakage. Every sample in the dataset is represented by the optical image patch, DEM patch, and two mask patches, indicating values that were originally missing or were altered by us. This allows future users of the dataset to handle altered elevation regions as they please. We provide statistical insights of the generated dataset, including the spatial distribution of samples, the distributions of elevation values, slopes and more. Finally, we train a small U-Net architecture on the MCTED dataset and compare its performance to a monocular depth estimation foundation model, DepthAnythingV2, on the task of elevation prediction. We find that even a very small architecture trained on this dataset specifically, beats a zero-shot performance of a depth estimation foundation model like DepthAnythingV2. We make the dataset and code used for its generation completely open source in public repositories.
Related papers
- TerraMesh: A Planetary Mosaic of Multimodal Earth Observation Data [3.674991996196602]
TerraMesh is a new globally diverse, multimodal dataset combining optical, radar, elevation, aperture and land-ready modalities in a Data-Ready format.<n>We provide detailed data processing steps, comprehensive statistics, and empirical evidence demonstrating improved model performance when pre-trained on TerraMesh.
arXiv Detail & Related papers (2025-04-15T13:20:35Z) - The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation [0.8875650122536799]
Bi-temporal change detection at scale based on Very High Resolution (VHR) images is crucial for Earth monitoring.<n>Most approaches do not exhibit the versatility required for temporal and spatial adaptation.<n>We present HySCDG, a generative pipeline for creating a large hybrid semantic change detection dataset.
arXiv Detail & Related papers (2025-03-19T20:32:37Z) - EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks.<n>The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic.<n>Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z) - MASSTAR: A Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion [25.44529512862336]
MASSTAR is a multi-modal lArge-scale scene dataset with a verSatile Toolchain for surfAce pRediction and completion.
We develop a versatile and efficient toolchain for processing the raw 3D data from the environments.
We generate an example dataset composed of over a thousand scene-level models with partial real-world data.
arXiv Detail & Related papers (2024-03-18T11:35:18Z) - MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based
Self-Supervised Pre-Training [58.07391711548269]
Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training.
arXiv Detail & Related papers (2023-03-23T17:59:02Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision
Datasets from 3D Scans [103.92680099373567]
This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.
Changing the sampling parameters allows one to "steer" the generated datasets to emphasize specific information.
Common architectures trained on a generated starter dataset reached state-of-the-art performance on multiple common vision tasks and benchmarks.
arXiv Detail & Related papers (2021-10-11T04:21:46Z) - Sketch and Scale: Geo-distributed tSNE and UMAP [75.44887265789056]
Running machine learning analytics over geographically distributed datasets is a rapidly arising problem.
We introduce a novel framework: Sketch and Scale (SnS)
It leverages a Count Sketch data structure to compress the data on the edge nodes, aggregates the reduced size sketches on the master node, and runs vanilla tSNE or UMAP on the summary.
We show this technique to be fully parallel, scale linearly in time, logarithmically in memory, and communication, making it possible to analyze datasets with many millions, potentially billions of data points, spread across several data centers around the globe.
arXiv Detail & Related papers (2020-11-11T22:32:21Z) - Exploring the Impacts from Datasets to Monocular Depth Estimation (MDE)
Models with MineNavi [5.689127984415125]
Current computer vision tasks based on deep learning require a huge amount of data with annotations for model training or testing.
In practice, manual labeling for dense estimation tasks is very difficult or even impossible, and the scenes of the dataset are often restricted to a small range.
We propose a synthetic dataset generation method to obtain the expandable dataset without burdensome manual workforce.
arXiv Detail & Related papers (2020-08-19T14:03:17Z) - JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [92.15895515035795]
We introduce a new large scale unconstrained crowd counting dataset (JHU-CROWD++) that contains "4,372" images with "1.51 million" annotations.
We propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation.
arXiv Detail & Related papers (2020-04-07T14:59:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.