Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges
- URL: http://arxiv.org/abs/2503.22060v1
- Date: Fri, 28 Mar 2025 00:46:55 GMT
- Title: Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges
- Authors: Ukcheol Shin, Jinsun Park,
- Abstract summary: This manuscript provides a large-scale Multi-Spectral Stereo (MS$^2$) dataset that consists of stereo RGB, stereo NIR, stereo thermal, stereo LiDAR data, and semi-dense depth ground truth. The MS$^2$ dataset includes 162K synchronized multi-modal data pairs captured across diverse locations.
- Score: 5.946838062187346
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Achieving robust and accurate spatial perception under adverse weather and lighting conditions is crucial for the high-level autonomy of self-driving vehicles and robots. However, existing perception algorithms relying on the visible spectrum are highly affected by weather and lighting conditions. A long-wave infrared camera (i.e., thermal imaging camera) can be a potential solution to achieve high-level robustness. However, the absence of large-scale datasets and standardized benchmarks remains a significant bottleneck to progress in active research for robust visual perception from thermal images. To this end, this manuscript provides a large-scale Multi-Spectral Stereo (MS$^2$) dataset that consists of stereo RGB, stereo NIR, stereo thermal, stereo LiDAR data, and GNSS/IMU information along with semi-dense depth ground truth. MS$^2$ dataset includes 162K synchronized multi-modal data pairs captured across diverse locations (e.g., urban city, residential area, campus, and highway) at different times (e.g., morning, daytime, and nighttime) and under various weather conditions (e.g., clear-sky, cloudy, and rainy). Secondly, we conduct a thorough evaluation of monocular and stereo depth estimation networks across RGB, NIR, and thermal modalities to establish standardized benchmark results on MS$^2$ depth test sets (e.g., day, night, and rainy). Lastly, we provide in-depth analyses and discuss the challenges revealed by the benchmark results, such as the performance variability for each modality under adverse conditions, domain shift between different sensor modalities, and potential research direction for thermal perception. Our dataset and source code are publicly available at https://sites.google.com/view/multi-spectral-stereo-dataset and https://github.com/UkcheolShin/SupDepth4Thermal.
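The benchmark evaluation described above relies on standard depth metrics computed only over pixels that have LiDAR ground truth, since the supervision is semi-dense. Below is a minimal NumPy sketch of these metrics (AbsRel, RMSE, and the delta < 1.25 accuracy); the function name and the depth-range defaults are our own assumptions, not taken from the released code:

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Standard depth-estimation metrics over a semi-dense ground truth.

    Pixels without a valid LiDAR return (gt <= 0) or outside the evaluated
    depth range are excluded, as is common benchmark practice.
    """
    valid = (gt > min_depth) & (gt < max_depth)
    pred, gt = pred[valid], gt[valid]

    abs_rel = np.mean(np.abs(pred - gt) / gt)   # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))   # root mean squared error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)              # accuracy under threshold
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta1": delta1}
```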
Related papers
- RASMD: RGB And SWIR Multispectral Driving Dataset for Robust Perception in Adverse Conditions [0.3141085922386211]
Short-wave infrared (SWIR) imaging offers several advantages over NIR and LWIR.
Current autonomous driving algorithms heavily rely on the visible spectrum, which is prone to performance degradation in adverse conditions.
We introduce the RGB and SWIR Multispectral Driving dataset, which comprises 100,000 synchronized and spatially aligned RGB-SWIR image pairs.
arXiv Detail & Related papers (2025-04-10T09:54:57Z)
- Multi-modal Multi-platform Person Re-Identification: Benchmark and Method [58.59888754340054]
MP-ReID is a novel dataset designed specifically for multi-modality and multi-platform ReID. This benchmark compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging. We introduce Uni-Prompt ReID, a framework with specifically designed prompts, tailored for cross-modality and cross-platform scenarios.
arXiv Detail & Related papers (2025-03-21T12:27:49Z)
- RTFusion: A depth estimation network based on multimodal fusion in challenging scenarios [0.0]
This paper proposes a novel multimodal depth estimation model, RTFusion, which enhances depth estimation accuracy and robustness. The model incorporates a unique fusion mechanism, EGFusion, consisting of the Mutual Complementary Attention (MCA) module for cross-modal feature alignment. Experiments on the MS2 and ViViD++ datasets demonstrate that the proposed model consistently produces high-quality depth maps.
arXiv Detail & Related papers (2025-03-05T01:35:14Z)
- Adaptive Stereo Depth Estimation with Multi-Spectral Images Across All Lighting Conditions [58.88917836512819]
We propose a novel framework incorporating stereo depth estimation to enforce accurate geometric constraints.
To mitigate the effects of poor lighting on stereo matching, we introduce Degradation Masking.
Our method achieves state-of-the-art (SOTA) performance on the Multi-Spectral Stereo (MS2) dataset.
arXiv Detail & Related papers (2024-11-06T03:30:46Z)
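The Degradation Masking idea from the entry above can be pictured with a deliberately simple heuristic; the exposure thresholds and the fallback-to-thermal strategy below are our own illustrative assumptions, not the paper's actual module:

```python
import numpy as np

def degradation_mask(rgb_gray, low=0.05, high=0.95):
    """Flag pixels whose intensity suggests degraded (under- or over-exposed)
    content; rgb_gray is assumed normalized to [0, 1] and the thresholds
    are illustrative assumptions."""
    return (rgb_gray < low) | (rgb_gray > high)

def fuse_depths(depth_rgb, depth_thermal, rgb_gray):
    """Fall back to thermal-based depth wherever the RGB image is degraded."""
    mask = degradation_mask(rgb_gray)
    return np.where(mask, depth_thermal, depth_rgb)
```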
- FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments [11.865960842220629]
This paper presents a stereo thermal depth perception dataset for autonomous aerial perception applications.
The dataset consists of stereo thermal images, LiDAR, IMU and ground truth depth maps captured in urban and forest settings.
We benchmark representative stereo depth estimation algorithms, offering insights into their performance in degraded conditions.
arXiv Detail & Related papers (2024-09-12T02:51:21Z)
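For stereo pipelines like those benchmarked in FIReStereo, predicted disparity is converted to metric depth through the standard pinhole relation depth = f * B / disparity. A small sketch follows, with made-up calibration values; the function and parameter names are ours:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert stereo disparity (in pixels) to metric depth via
    depth = f * B / d; focal_px and baseline_m come from the stereo
    rig's calibration, and eps guards against zero disparity."""
    return focal_px * baseline_m / np.maximum(disparity, eps)

# Example with hypothetical calibration: 450 px focal length, 0.25 m baseline.
depth = disparity_to_depth(np.array([10.0, 45.0]), focal_px=450.0, baseline_m=0.25)
```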
- DIDLM: A SLAM Dataset for Difficult Scenarios Featuring Infrared, Depth Cameras, LIDAR, 4D Radar, and Others under Adverse Weather, Low Light Conditions, and Rough Roads [20.600516423425688]
We introduce a multi-sensor dataset covering challenging scenarios such as snowy weather, rainy weather, nighttime conditions, speed bumps, and rough terrains. The dataset includes rarely utilized sensors for extreme conditions, such as 4D millimeter-wave radar, infrared cameras, and depth cameras, alongside 3D LiDAR, RGB cameras, GPS, and IMU. It supports both autonomous driving and ground robot applications and provides reliable GPS/INS ground truth data, covering structured and semi-structured terrains.
arXiv Detail & Related papers (2024-04-15T09:49:33Z)
- Caltech Aerial RGB-Thermal Dataset in the Wild [14.699908177967181]
We present the first publicly available RGB-thermal dataset designed for aerial robotics operating in natural environments.
Our dataset captures a variety of terrain across the United States, including rivers, lakes, coastlines, deserts, and forests.
We provide semantic segmentation annotations for 10 classes commonly encountered in natural settings.
arXiv Detail & Related papers (2024-03-13T23:31:04Z)
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
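At inference time, the confidence-map fusion described in the entry above reduces to a per-pixel weighted blend of the modality-specific depth predictions. A minimal sketch under our own naming (in the paper, the confidence map comes from a learned predictor network rather than being supplied by hand):

```python
import numpy as np

def confidence_fusion(depth_a, depth_b, confidence):
    """Blend two modality-specific depth maps with a per-pixel confidence
    map in [0, 1] that favours modality A; values are clipped defensively."""
    confidence = np.clip(confidence, 0.0, 1.0)
    return confidence * depth_a + (1.0 - confidence) * depth_b
```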
- Unsupervised Visible-light Images Guided Cross-Spectrum Depth Estimation from Dual-Modality Cameras [33.77748026254935]
Cross-spectrum depth estimation aims to provide a depth map in all illumination conditions with a pair of dual-spectrum images.
In this paper, we propose an unsupervised visible-light image guided cross-spectrum (i.e., thermal and visible-light, TIR-VIS in short) depth estimation framework.
Our method outperforms the existing methods it is compared against.
arXiv Detail & Related papers (2022-04-30T12:58:35Z)
- Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV).
We provide coarse-to-fine attribute annotations, where frame-level attributes are supplied to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z)
- Wild ToFu: Improving Range and Quality of Indirect Time-of-Flight Depth with RGB Fusion in Challenging Environments [56.306567220448684]
We propose a new learning-based end-to-end depth prediction network which takes noisy raw I-ToF signals as well as an RGB image.
We show more than 40% RMSE improvement on the final depth map compared to the baseline approach.
arXiv Detail & Related papers (2021-12-07T15:04:14Z)
- Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)