CART: Caltech Aerial RGB-Thermal Dataset in the Wild
- URL: http://arxiv.org/abs/2403.08997v1
- Date: Wed, 13 Mar 2024 23:31:04 GMT
- Title: CART: Caltech Aerial RGB-Thermal Dataset in the Wild
- Authors: Connor Lee, Matthew Anderson, Nikhil Raganathan, Xingxing Zuo, Kevin Do, Georgia Gkioxari, Soon-Jo Chung,
- Abstract summary: We present the first publicly available RGB-thermal dataset designed for aerial robotics operating in natural environments.
Our dataset captures a variety of terrains across the continental United States, including rivers, lakes, coastlines, deserts, and forests.
We propose new and challenging benchmarks for thermal and RGB-thermal semantic segmentation, RGB-to-thermal image translation, and visual-inertial odometry.
- Score: 14.699908177967181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the first publicly available RGB-thermal dataset designed for aerial robotics operating in natural environments. Our dataset captures a variety of terrains across the continental United States, including rivers, lakes, coastlines, deserts, and forests, and consists of synchronized RGB, long-wave thermal, global positioning, and inertial data. Furthermore, we provide semantic segmentation annotations for 10 classes commonly encountered in natural settings in order to facilitate the development of perception algorithms robust to adverse weather and nighttime conditions. Using this dataset, we propose new and challenging benchmarks for thermal and RGB-thermal semantic segmentation, RGB-to-thermal image translation, and visual-inertial odometry. We present extensive results using state-of-the-art methods and highlight the challenges posed by temporal and geographical domain shifts in our data. Dataset and accompanying code will be provided at https://github.com/aerorobotics/caltech-aerial-rgbt-dataset
Related papers
- CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark [4.463254896517738]
CattleFace-RGBT is a RGB-T Cattle Facial Landmark dataset consisting of 2,300 RGB-T image pairs, a total of 4,600 images.
Applying AI to thermal images is challenging due to suboptimal results from direct thermal training and infeasible RGB-thermal alignment.
We transfer models trained on RGB to thermal images and refine them using our AI-assisted annotation tool.
arXiv Detail & Related papers (2024-06-05T16:29:13Z) - RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications [55.24463002889]
We focus on depth data synthesis and develop a range-aware RGB-D data simulation pipeline (RaSim)
In particular, high-fidelity depth data is generated by imitating the imaging principle of real-world sensors.
RaSim can be directly applied to real-world scenarios without any finetuning and excel at downstream RGB-D perception tasks.
arXiv Detail & Related papers (2024-04-05T08:52:32Z) - Closing the Gap in Human Behavior Analysis: A Pipeline for Synthesizing
Trimodal Data [1.8024397171920885]
We introduce a novel generative technique for creating trimodal, i.e., RGB, thermal, and depth, human-focused datasets.
This technique capitalizes on human segmentation masks derived from RGB images, combined with thermal and depth backgrounds that are sourced automatically.
By employing this approach, we generate trimodal data that can be leveraged to train models for settings with limited data, bad lightning conditions, or privacy-sensitive areas.
arXiv Detail & Related papers (2024-02-02T16:27:45Z) - Attentive Multimodal Fusion for Optical and Scene Flow [24.08052492109655]
Existing methods typically rely solely on RGB images or fuse the modalities at later stages.
We propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities.
Our approach exhibits improved robustness in the presence of noise and low-lighting conditions that affect the RGB images.
arXiv Detail & Related papers (2023-07-28T04:36:07Z) - Temporal Graph Benchmark for Machine Learning on Temporal Graphs [54.52243310226456]
Temporal Graph Benchmark (TGB) is a collection of challenging and diverse benchmark datasets.
We benchmark each dataset and find that the performance of common models can vary drastically across datasets.
TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research.
arXiv Detail & Related papers (2023-07-03T13:58:20Z) - THRawS: A Novel Dataset for Thermal Hotspots Detection in Raw Sentinel-2
Data [4.077787659104315]
THRawS is the first dataset composed of Sentinel-2 (S-2) raw data containing warm temperature hotspots.
To foster the realisation of robust AI architectures, the dataset gathers data from all over the globe.
arXiv Detail & Related papers (2023-05-12T09:54:21Z) - Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection using
Meta-Learning [64.92447072894055]
Infrared (IR) cameras are robust under adverse illumination and lighting conditions.
We propose an algorithm meta-learning framework to improve existing UDA methods.
We produce a state-of-the-art thermal detector for the KAIST and DSIAC datasets.
arXiv Detail & Related papers (2021-10-07T02:28:18Z) - Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD
Images [69.5662419067878]
Grounding referring expressions in RGBD image has been an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z) - Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient
Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly lightweight designed triple-stream network is applied over these novel formulated data to achieve an optimal channel-wise complementary fusion status between the RGB and D.
arXiv Detail & Related papers (2020-08-07T10:13:05Z) - Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z) - HeatNet: Bridging the Day-Night Domain Gap in Semantic Segmentation with
Thermal Images [26.749261270690425]
Real-world driving scenarios entail adverse environmental conditions such as nighttime illumination or glare.
We propose a multimodal semantic segmentation model that can be applied during daytime and nighttime.
Besides RGB images, we leverage thermal images, making our network significantly more robust.
arXiv Detail & Related papers (2020-03-10T11:36:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.