Related papers: Multi-Attribute guided Thermal Face Image Translation based on Latent Diffusion Model

Multi-Attribute guided Thermal Face Image Translation based on Latent Diffusion Model

URL: http://arxiv.org/abs/2512.21032v1
Date: Wed, 24 Dec 2025 07:55:54 GMT
Title: Multi-Attribute guided Thermal Face Image Translation based on Latent Diffusion Model
Authors: Mingshu Cai, Osamu Yoshie, Yuya Ieiri,
Abstract summary: This paper introduces a novel latent diffusion-based model designed to generate high-quality visible face images from thermal inputs.<n>A multi-attribute classifier is incorporated to extract key facial attributes from visible images, mitigating feature loss during infrared-to-visible image restoration.
Score: 3.995408039775796
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern surveillance systems increasingly rely on multi-wavelength sensors and deep neural networks to recognize faces in infrared images captured at night. However, most facial recognition models are trained on visible light datasets, leading to substantial performance degradation on infrared inputs due to significant domain shifts. Early feature-based methods for infrared face recognition proved ineffective, prompting researchers to adopt generative approaches that convert infrared images into visible light images for improved recognition. This paradigm, known as Heterogeneous Face Recognition (HFR), faces challenges such as model and modality discrepancies, leading to distortion and feature loss in generated images. To address these limitations, this paper introduces a novel latent diffusion-based model designed to generate high-quality visible face images from thermal inputs while preserving critical identity features. A multi-attribute classifier is incorporated to extract key facial attributes from visible images, mitigating feature loss during infrared-to-visible image restoration. Additionally, we propose the Self-attn Mamba module, which enhances global modeling of cross-modal features and significantly improves inference speed. Experimental results on two benchmark datasets demonstrate the superiority of our approach, achieving state-of-the-art performance in both image quality and identity preservation.

Related papers

Enhancing Infrared Vision: Progressive Prompt Fusion Network and Benchmark [58.61079960074608]
Existing infrared image enhancement methods focus on tackling individual degradations.<n>All-in-one enhancement methods, commonly applied to RGB sensors, often demonstrate limited effectiveness.
arXiv Detail & Related papers (2025-10-10T12:55:54Z)
DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution [32.53713932204663]
DifIISR is an infrared image super-resolution diffusion model optimized for visual quality and perceptual performance.<n>We introduce an infrared thermal spectrum distribution regulation to preserve visual fidelity.<n>We incorporate various visual foundational models as the perceptual guidance for downstream visual tasks.
arXiv Detail & Related papers (2025-03-03T05:20:57Z)
Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.<n>In this paper, we investigate how detection performance varies across model backbones, types, and datasets.<n>We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution [54.293362972473595]
Image super-resolution (SR) aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts. Current approaches to address SR tasks are either dedicated to extracting RGB image features or assuming similar degradation patterns. We propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.
arXiv Detail & Related papers (2024-11-19T14:24:03Z)
A Bidirectional Conversion Network for Cross-Spectral Face Recognition [1.9766522384767227]
Cross-spectral face recognition is challenging due to the dramatic difference between the visible light and IR imageries. This paper proposes a framework of bidirectional cross-spectral conversion (BCSC-GAN) between the heterogeneous face images. The network reduces the cross-spectral recognition problem into an intra-spectral problem, and improves performance by fusing bidirectional information.
arXiv Detail & Related papers (2022-05-03T16:20:10Z)
Towards Homogeneous Modality Learning and Multi-Granularity Information Exploration for Visible-Infrared Person Re-Identification [16.22986967958162]
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task, which aims to retrieve a set of person images over visible and infrared camera views. Previous methods attempt to apply generative adversarial network (GAN) to generate the modality-consisitent data. In this work, we address cross-modality matching problem with Aligned Grayscale Modality (AGM), an unified dark-line spectrum that reformulates visible-infrared dual-mode learning as a gray-gray single-mode learning problem.
arXiv Detail & Related papers (2022-04-11T03:03:19Z)
A Synthesis-Based Approach for Thermal-to-Visible Face Verification [105.63410428506536]
This paper presents an algorithm that achieves state-of-the-art performance on the ARL-VTF and TUFTS multi-spectral face datasets. We also present MILAB-VTF(B), a challenging multi-spectral face dataset composed of paired thermal and visible videos.
arXiv Detail & Related papers (2021-08-21T17:59:56Z)
Simultaneous Face Hallucination and Translation for Thermal to Visible Face Verification using Axial-GAN [74.22129648654783]
We introduce the task of thermal-to-visible face verification from low-resolution thermal images. We propose Axial-Generative Adversarial Network (Axial-GAN) to synthesize high-resolution visible images for matching.
arXiv Detail & Related papers (2021-04-13T22:34:28Z)
HyperFaceNet: A Hyperspectral Face Recognition Method Based on Deep Fusion [0.7734726150561088]
How to fuse different light bands, i.e., hyperspectral face recognition, is still an open research problem. We propose a new fusion model (termed HyperFaceNet) especially for hyperspectral faces. Our method is proved to be of higher recognition rates than face recognition using either visible light or the infrared.
arXiv Detail & Related papers (2020-08-02T14:59:24Z)
Multi-Scale Thermal to Visible Face Verification via Attribute Guided Synthesis [55.29770222566124]
We use attributes extracted from visible images to synthesize attribute-preserved visible images from thermal imagery for cross-modal matching. A novel multi-scale generator is proposed to synthesize the visible image from the thermal image guided by the extracted attributes. A pre-trained VGG-Face network is leveraged to extract features from the synthesized image and the input visible image for verification.
arXiv Detail & Related papers (2020-04-20T01:45:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.