ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement
- URL: http://arxiv.org/abs/2504.07418v1
- Date: Thu, 10 Apr 2025 03:24:21 GMT
- Title: ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement
- Authors: Anning Hu, Ang Li, Xirui Jin, Danping Zou
- Abstract summary: We introduce ThermoStereoRT, a real-time thermal stereo matching method. It recovers disparity from two rectified thermal stereo images. We envision applications such as night-time drone surveillance or under-bed cleaning robots.
- Score: 9.923805440410739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce ThermoStereoRT, a real-time thermal stereo matching method designed for all-weather conditions that recovers disparity from two rectified thermal stereo images, envisioning applications such as night-time drone surveillance or under-bed cleaning robots. Leveraging a lightweight yet powerful backbone, ThermoStereoRT constructs a 3D cost volume from thermal images and employs multi-scale attention mechanisms to produce an initial disparity map. To refine this map, we design a novel channel and spatial attention module. Addressing the challenge of sparse ground truth data in thermal imagery, we utilize knowledge distillation to boost performance without increasing computational demands. Comprehensive evaluations on multiple datasets demonstrate that ThermoStereoRT delivers both real-time capacity and robust accuracy, making it a promising solution for real-world deployment in various challenging environments. Our code will be released at https://github.com/SJTU-ViSYS-team/ThermoStereoRT.
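The abstract names three ingredients: a 3D cost volume built from a rectified pair, a disparity map regressed from it, and knowledge distillation to compensate for sparse ground truth. The sketch below illustrates those ideas in NumPy and is not the authors' implementation. It assumes a plain correlation cost volume, swaps the paper's multi-scale attention for a simple soft-argmax regression, and uses a hypothetical L1 distillation loss with a hypothetical weight `alpha`; all function names are illustrative.

```python
import numpy as np


def correlation_cost_volume(left_feat, right_feat, max_disp):
    """Build a (D, H, W) correlation volume from two (C, H, W) feature maps.

    Assumption: rectified pair, so left pixel x matches right pixel x - d.
    """
    _, h, w = left_feat.shape
    volume = np.zeros((max_disp, h, w), dtype=np.float32)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left_feat * right_feat).mean(axis=0)
        else:
            # Columns x < d have no valid match and keep a score of zero.
            volume[d, :, d:] = (left_feat[:, :, d:] * right_feat[:, :, :-d]).mean(axis=0)
    return volume


def soft_argmax_disparity(volume):
    """Softmax over the disparity axis, then the expected disparity per pixel."""
    shifted = volume - volume.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(shifted)
    prob /= prob.sum(axis=0, keepdims=True)
    disps = np.arange(volume.shape[0], dtype=np.float32).reshape(-1, 1, 1)
    return (prob * disps).sum(axis=0)


def distillation_loss(pred, sparse_gt, gt_mask, teacher_disp, alpha=0.5):
    """Hypothetical loss: L1 on valid sparse ground truth plus L1 to a dense teacher."""
    n_valid = max(int(gt_mask.sum()), 1)
    gt_term = (np.abs(pred - sparse_gt) * gt_mask).sum() / n_valid
    teacher_term = np.abs(pred - teacher_disp).mean()
    return (1.0 - alpha) * gt_term + alpha * teacher_term


# Small demo on random features: the left view is the right view shifted by 5 pixels.
rng = np.random.default_rng(0)
right = rng.standard_normal((8, 4, 32)).astype(np.float32)
left = np.roll(right, 5, axis=2)
vol = correlation_cost_volume(left, right, max_disp=16)
disp = soft_argmax_disparity(vol)
print(vol.shape, disp.shape)
```

The teacher term supervises every pixel, while the ground-truth term only touches pixels where `gt_mask` is 1. This is one common way a dense teacher prediction can help when ground truth is sparse, and the student's inference cost stays unchanged because the teacher is used only during training.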
Related papers
- Multimodal Signal Processing For Thermo-Visible-Lidar Fusion In Real-time 3D Semantic Mapping [8.401699100150866]
This paper presents a novel method for semantically enhancing 3D point cloud maps with thermal information.
The system projects real-time LiDAR point clouds onto this fused image stream.
It then segments heat source features in the thermal channel to instantly identify high temperature targets and applies this temperature information as a semantic layer on the final 3D map.
arXiv Detail & Related papers (2026-01-14T15:46:57Z)
- 3M-TI: High-Quality Mobile Thermal Imaging via Calibration-free Multi-Camera Cross-Modal Diffusion [10.271921954105805]
3M-TI is a calibration-free Multi-camera cross-Modality diffusion framework for Mobile Thermal Imaging.
At its core, 3M-TI integrates a cross-modal self-attention module (CSM) into the diffusion UNet.
This design enables the diffusion network to leverage its generative prior to enhance spatial resolution, structural fidelity, and texture detail in the super-resolved thermal images.
arXiv Detail & Related papers (2025-11-24T13:48:47Z)
- Fast 3D Surrogate Modeling for Data Center Thermal Management [15.644716872105002]
Traditional thermal CFD solvers are computationally expensive and require expert-crafted meshes and boundary conditions.
We develop a vision-based surrogate modeling framework that operates directly on a 3D voxelized representation of the data center.
Our results show that the surrogate models generalize across data center configurations and achieve up to 20,000x speedup.
arXiv Detail & Related papers (2025-11-13T02:12:24Z)
- 3D Reconstruction from Transient Measurements with Time-Resolved Transformer [48.73999376279579]
We propose a generic Time-Resolved Transformer (TRT) architecture to boost 3D reconstruction performance in photon-efficient imaging.
In this paper, we develop two task-specific embodiments: TRT-LOS for line-of-sight (LOS) imaging and TRT-NLOS for non-line-of-sight (NLOS) imaging.
In addition, we contribute a large-scale, high-resolution synthetic LOS dataset with various noise levels and capture a set of real-world NLOS imaging measurements.
arXiv Detail & Related papers (2025-10-10T09:44:08Z)
- ThermalDiffusion: Visual-to-Thermal Image-to-Image Translation for Autonomous Navigation [6.524847658755803]
We propose a solution to augment multi-modal datasets with synthetic thermal data to enable widespread and rapid adaptation of thermal cameras.
We explore the use of conditional diffusion models to convert existing RGB images to thermal images using self-attention to learn the thermal properties of real-world objects.
arXiv Detail & Related papers (2025-06-26T03:18:22Z)
- Veta-GS: View-dependent deformable 3D Gaussian Splatting for thermal infrared Novel-view Synthesis [3.1457219084519004]
3D Gaussian Splatting (3D-GS) based on Thermal Infrared (TIR) imaging has gained attention in novel-view synthesis.
We introduce Veta-GS, which leverages a view-dependent deformation field and a Thermal Feature Extractor to capture subtle thermal variations.
Our method achieves better performance over existing methods.
arXiv Detail & Related papers (2025-05-25T13:20:45Z)
- S3MOT: Monocular 3D Object Tracking with Selective State Space Model [3.5047603107971397]
Multi-object tracking in 3D space is essential for advancing robotics and computer applications.
It remains a significant challenge in monocular setups due to the difficulty of mining 3D associations from 2D video streams.
We present three innovative techniques to enhance the fusion of heterogeneous cues for monocular 3D MOT.
arXiv Detail & Related papers (2025-04-25T04:45:35Z)
- Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges [5.946838062187346]
This manuscript provides a large-scale Multi-Spectral Stereo (MS$^2$) dataset that consists of stereo RGB, stereo NIR, stereo thermal, stereo LiDAR data, and semi-dense depth ground truth.
MS$^2$ dataset includes 162K synchronized multi-modal data pairs captured across diverse locations.
arXiv Detail & Related papers (2025-03-28T00:46:55Z)
- FLAME 3 Dataset: Unleashing the Power of Radiometric Thermal UAV Imagery for Wildfire Management [3.3686755167352223]
FLAME 3 is the first comprehensive collection of side-by-side visual spectrum and radiometric thermal imagery of wildland fires.
This dataset aims to spur a new generation of machine learning models utilizing radiometric thermal imagery.
arXiv Detail & Related papers (2024-12-03T20:53:42Z)
- RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model [59.37279559684668]
We introduce RS-vHeat, an efficient multi-modal remote sensing foundation model.
Specifically, RS-vHeat applies the Heat Conduction Operator (HCO) with a complexity of $O(N^{1.5})$ and a global receptive field.
Compared to attention-based remote sensing foundation models, we reduce memory usage by 84% and FLOPs by 24%, and improve throughput by 2.7 times.
arXiv Detail & Related papers (2024-11-27T01:43:38Z)
- Enhancing Thermal MOT: A Novel Box Association Method Leveraging Thermal Identity and Motion Similarity [0.6249768559720122]
Multiple Object Tracking (MOT) in thermal imaging presents unique challenges due to the lack of visual features and the complexity of motion patterns.
This paper introduces an innovative approach to improve MOT in the thermal domain by developing a novel box association method.
Our method merges thermal feature sparsity and dynamic object tracking, enabling more accurate and robust MOT performance.
arXiv Detail & Related papers (2024-11-20T00:27:01Z)
- ThermalGaussian: Thermal 3D Gaussian Splatting [25.536611434289647]
We propose ThermalGaussian, the first thermal 3DGS approach capable of rendering high-quality images in RGB and thermal modalities.
We conduct comprehensive experiments to show that ThermalGaussian achieves photorealistic rendering of thermal images and improves the rendering quality of RGB images.
arXiv Detail & Related papers (2024-09-11T11:45:57Z)
- ThermalNeRF: Thermal Radiance Fields [32.881758519242155]
We propose a unified framework for scene reconstruction from a set of LWIR and RGB images.
We calibrate the RGB and infrared cameras with respect to each other, as a preprocessing step.
We show that our method is capable of thermal super-resolution, as well as visually removing obstacles to reveal objects occluded in either the RGB or thermal channels.
arXiv Detail & Related papers (2024-07-22T02:51:29Z)
- vHeat: Building Vision Models upon Heat Conduction [63.00030330898876]
vHeat is a novel vision backbone model that simultaneously achieves both high computational efficiency and global receptive field.
The essential idea is to conceptualize image patches as heat sources and model the calculation of their correlations as the diffusion of thermal energy.
arXiv Detail & Related papers (2024-05-26T12:58:04Z)
- ThermoNeRF: Joint RGB and Thermal Novel View Synthesis for Building Facades using Multimodal Neural Radiance Fields [5.66229031510643]
Thermal scene reconstruction holds great potential for various applications, such as analyzing building energy consumption and performing non-destructive infrastructure testing.
Existing methods typically require dense scene measurements and often rely on RGB images for 3D geometry reconstruction, projecting thermal information post-reconstruction.
We propose ThermoNeRF, a novel approach based on Neural Radiance Fields that jointly renders new RGB and thermal views of a scene, and ThermoScenes, a dataset of paired RGB+thermal images comprising 8 scenes of building facades and 8 scenes of everyday objects.
arXiv Detail & Related papers (2024-03-18T18:10:34Z)
- Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality [65.70936336240554]
Real-time Stereo Matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
arXiv Detail & Related papers (2023-09-08T07:53:58Z)
- Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and pixels in each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN to reconstruct missing information.
Experimental results show that ViT_STAE can compress the video dataset H51 by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z)
- Energy-Efficient Model Compression and Splitting for Collaborative Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by utilizing model compression and time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and $CO_2$ emission compared to the considered baselines.
arXiv Detail & Related papers (2021-06-02T07:36:27Z)
- A Large-Scale, Time-Synchronized Visible and Thermal Face Dataset [62.193924313292875]
We present the DEVCOM Army Research Laboratory Visible-Thermal Face dataset (ARL-VTF).
With over 500,000 images from 395 subjects, the ARL-VTF dataset represents, to the best of our knowledge, the largest collection of paired visible and thermal face images to date.
This paper presents benchmark results and analysis on thermal face landmark detection and thermal-to-visible face verification by evaluating state-of-the-art models on the ARL-VTF dataset.
arXiv Detail & Related papers (2021-01-07T17:17:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.