Related papers: 3M-TI: High-Quality Mobile Thermal Imaging via Calibration-free Multi-Camera Cross-Modal Diffusion

3M-TI: High-Quality Mobile Thermal Imaging via Calibration-free Multi-Camera Cross-Modal Diffusion

URL: http://arxiv.org/abs/2511.19117v1
Date: Mon, 24 Nov 2025 13:48:47 GMT
Title: 3M-TI: High-Quality Mobile Thermal Imaging via Calibration-free Multi-Camera Cross-Modal Diffusion
Authors: Minchong Chen, Xiaoyun Yuan, Junzhe Wan, Jianing Zhang, Jun Zhang,
Abstract summary: 3M-TI is a calibration-free Multi-camera cross-Modality diffusion framework for Mobile Thermal Imaging.<n>At its core, 3M-TI integrates a cross-modal self-attention module (CSM) into the diffusion UNet.<n>This design enables the diffusion network to leverage its generative prior to enhance spatial resolution, structural fidelity, and texture detail in the super-resolved thermal images.
Score: 10.271921954105805
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The miniaturization of thermal sensors for mobile platforms inherently limits their spatial resolution and textural fidelity, leading to blurry and less informative images. Existing thermal super-resolution (SR) methods can be grouped into single-image and RGB-guided approaches: the former struggles to recover fine structures from limited information, while the latter relies on accurate and laborious cross-camera calibration, which hinders practical deployment and robustness. Here, we propose 3M-TI, a calibration-free Multi-camera cross-Modality diffusion framework for Mobile Thermal Imaging. At its core, 3M-TI integrates a cross-modal self-attention module (CSM) into the diffusion UNet, replacing the original self-attention layers to adaptively align thermal and RGB features throughout the denoising process, without requiring explicit camera calibration. This design enables the diffusion network to leverage its generative prior to enhance spatial resolution, structural fidelity, and texture detail in the super-resolved thermal images. Extensive evaluations on real-world mobile thermal cameras and public benchmarks validate our superior performance, achieving state-of-the-art results in both visual quality and quantitative metrics. More importantly, the thermal images enhanced by 3M-TI lead to substantial gains in critical downstream tasks like object detection and segmentation, underscoring its practical value for robust mobile thermal perception systems. More materials: https://github.com/work-submit/3MTI.

Related papers

Physics-Constrained Cross-Resolution Enhancement Network for Optics-Guided Thermal UAV Image Super-Resolution [19.782662020080522]
Existing methods compress optical features to match thermal feature dimensions for cross-modal alignment and fusion.<n>We propose PCNet to achieve cross-resolution mutual enhancement between optical and thermal modalities.<n>We also propose a Physics-Driven Thermal Conduction Module (PDTM) that incorporates two-dimensional heat conduction into optical guidance.
arXiv Detail & Related papers (2026-01-07T02:30:27Z)
Enhancing Infrared Vision: Progressive Prompt Fusion Network and Benchmark [58.61079960074608]
Existing infrared image enhancement methods focus on tackling individual degradations.<n>All-in-one enhancement methods, commonly applied to RGB sensors, often demonstrate limited effectiveness.
arXiv Detail & Related papers (2025-10-10T12:55:54Z)
TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection [58.00308680221481]
Infrared small target detection (ISTD) is critical to national security and has been extensively applied in military areas. Most ISTD networks focus on designing feature extraction blocks or feature fusion modules, but rarely describe the ISTD process from the feature map evolution perspective. We propose Thermal Conduction-Inspired Transformer (TCI-Former) based on the theoretical principles of thermal conduction.
arXiv Detail & Related papers (2024-02-03T05:51:22Z)
Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone. Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z)
Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder the development, including pixel misalignment and inefficient inference. This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion. The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z)
Photometric Correction for Infrared Sensors [1.170732359523702]
This article proposes a photometric correction model for infrared sensors based on temperature constancy. Experiments show that the reconstruction quality from the corrected infrared imagery achieves performance on par with state-of-the-art reconstruction using RGB sensors.
arXiv Detail & Related papers (2023-04-08T06:32:57Z)
CoReFusion: Contrastive Regularized Fusion for Guided Thermal Super-Resolution [0.0]
Super-Resolution approaches can replicate accurate high-resolution thermal pictures using measurements from low-cost, low-resolution thermal sensors. We present a novel data fusion framework and regularization technique for Guided Super Resolution of Thermal images.
arXiv Detail & Related papers (2023-04-03T14:21:41Z)
Does Thermal Really Always Matter for RGB-T Salient Object Detection? [153.17156598262656]
This paper proposes a network named TNet to solve the RGB-T salient object detection (SOD) task. In this paper, we introduce a global illumination estimation module to predict the global illuminance score of the image. On the other hand, we introduce a two-stage localization and complementation module in the decoding phase to transfer object localization cue and internal integrity cue in thermal features to the RGB modality.
arXiv Detail & Related papers (2022-10-09T13:50:12Z)
EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
Boosting Image Super-Resolution Via Fusion of Complementary Information Captured by Multi-Modal Sensors [21.264746234523678]
Image Super-Resolution (SR) provides a promising technique to enhance the image quality of low-resolution optical sensors. In this paper, we attempt to leverage complementary information from a low-cost channel (visible/depth) to boost image quality of an expensive channel (thermal) using fewer parameters.
arXiv Detail & Related papers (2020-12-07T02:15:28Z)
Exploring Thermal Images for Object Detection in Underexposure Regions for Autonomous Driving [67.69430435482127]
Underexposure regions are vital to construct a complete perception of the surroundings for safe autonomous driving. The availability of thermal cameras has provided an essential alternate to explore regions where other optical sensors lack in capturing interpretable signals. This work proposes a domain adaptation framework which employs a style transfer technique for transfer learning from visible spectrum images to thermal images.
arXiv Detail & Related papers (2020-06-01T09:59:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.