Inference-Time Scaling of Diffusion Models for Infrared Data Generation
- URL: http://arxiv.org/abs/2511.07362v1
- Date: Mon, 10 Nov 2025 18:18:38 GMT
- Title: Inference-Time Scaling of Diffusion Models for Infrared Data Generation
- Authors: Kai A. Horstmann, Maxim Clouser, Kia Khezeli,
- Abstract summary: Development of vision models for infrared applications is hindered by the specialized expertise required for infrared annotation. We introduce an inference-time scaling approach using a domain-adapted CLIP-based verifier for enhanced infrared image generation quality. We find that our approach leads to consistent improvements in generation quality, reducing FID scores on the KAIST Multispectral Pedestrian Detection Benchmark dataset by 10%.
- Score: 1.452875650827562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Infrared imagery enables temperature-based scene understanding using passive sensors, particularly under conditions of low visibility where traditional RGB imaging fails. Yet, developing downstream vision models for infrared applications is hindered by the scarcity of high-quality annotated data, due to the specialized expertise required for infrared annotation. While synthetic infrared image generation has the potential to accelerate model development by providing large-scale, diverse training data, training foundation-level generative diffusion models in the infrared domain has remained elusive due to limited datasets. In light of such data constraints, we explore an inference-time scaling approach using a domain-adapted CLIP-based verifier for enhanced infrared image generation quality. We adapt FLUX.1-dev, a state-of-the-art text-to-image diffusion model, to the infrared domain by finetuning it on a small sample of infrared images using parameter-efficient techniques. The trained verifier is then employed during inference to guide the diffusion sampling process toward higher quality infrared generations that better align with input text prompts. Empirically, we find that our approach leads to consistent improvements in generation quality, reducing FID scores on the KAIST Multispectral Pedestrian Detection Benchmark dataset by 10% compared to unguided baseline samples. Our results suggest that inference-time guidance offers a promising direction for bridging the domain gap in low-data infrared settings.
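The verifier-guided sampling described in the abstract can be sketched as a simple best-of-N search: draw several candidate generations and keep the one the verifier scores highest. This is a minimal illustrative sketch, not the authors' exact algorithm (the paper guides the diffusion sampling process itself, using FLUX.1-dev and a domain-adapted CLIP verifier); the function names and toy sampler/verifier below are assumptions for illustration.

```python
import random

def best_of_n_sampling(sample_fn, verifier_fn, n=8, seed=0):
    """Inference-time scaling via best-of-N selection: draw n
    candidate generations and return the one with the highest
    verifier score."""
    rng = random.Random(seed)
    candidates = [sample_fn(rng) for _ in range(n)]
    scores = [verifier_fn(c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]

# Toy stand-ins for the diffusion sampler (FLUX.1-dev in the paper)
# and the domain-adapted CLIP verifier; both are hypothetical here.
def toy_sampler(rng):
    return rng.random()            # pretend: one generated image

def toy_verifier(image):
    return -abs(image - 0.5)       # pretend: prompt-alignment score

image, score = best_of_n_sampling(toy_sampler, toy_verifier, n=8)
```

Scaling `n` trades extra inference compute for quality: more candidates give the verifier more chances to find a sample that aligns with the prompt, which is the core idea behind inference-time scaling in low-data regimes.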
Related papers
- IrisNet: Infrared Image Status Awareness Meta Decoder for Infrared Small Targets Detection [92.56025546608699]
IrisNet is a novel meta-learned framework that adapts detection strategies to the input infrared image status. Our approach establishes a dynamic mapping between infrared image features and entire decoder parameters. Experiments on NUDT-SIRST, NUAA-SIRST, and IRSTD-1K datasets demonstrate the superiority of our IrisNet.
arXiv Detail & Related papers (2025-11-25T13:53:54Z)
- Enhancing Infrared Vision: Progressive Prompt Fusion Network and Benchmark [58.61079960074608]
Existing infrared image enhancement methods focus on tackling individual degradations. All-in-one enhancement methods, commonly applied to RGB sensors, often demonstrate limited effectiveness.
arXiv Detail & Related papers (2025-10-10T12:55:54Z)
- F-ViTA: Foundation Model Guided Visible to Thermal Translation [27.200043694866388]
We propose F-ViTA, a novel approach that leverages the general world knowledge embedded in foundation models to guide the diffusion process for improved translation. Our model generalizes well to out-of-distribution (OOD) scenarios and can generate Long-Wave Infrared (LWIR), Mid-Wave Infrared (MWIR), and Near-Infrared (NIR) translations from the same visible image.
arXiv Detail & Related papers (2025-04-03T17:47:06Z)
- Multi-Domain Biometric Recognition using Body Embeddings [51.36007967653781]
We show that body embeddings perform better than face embeddings in medium-wave infrared (MWIR) and long-wave infrared (LWIR) domains. We leverage a vision transformer architecture to establish benchmark results on the IJB-MDF dataset. We also show that finetuning a body model, pretrained exclusively on VIS data, with a simple combination of cross-entropy and triplet losses achieves state-of-the-art mAP scores.
arXiv Detail & Related papers (2025-03-13T22:38:18Z)
- DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution [32.53713932204663]
DifIISR is an infrared image super-resolution diffusion model optimized for visual quality and perceptual performance. We introduce an infrared thermal spectrum distribution regulation to preserve visual fidelity. We incorporate various visual foundation models as perceptual guidance for downstream visual tasks.
arXiv Detail & Related papers (2025-03-03T05:20:57Z)
- Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate TL detection. Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
arXiv Detail & Related papers (2025-01-25T06:21:06Z)
- Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution [54.293362972473595]
Image super-resolution (SR) aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts.
Current approaches to address SR tasks are either dedicated to extracting RGB image features or assuming similar degradation patterns.
We propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.
arXiv Detail & Related papers (2024-11-19T14:24:03Z)
- PID: Physics-Informed Diffusion Model for Infrared Image Generation [11.416759828137701]
Infrared imaging technology has gained significant attention for its reliable sensing ability in low visibility conditions. Most existing image translation methods treat infrared images as a stylistic variation, neglecting the underlying physical laws. We propose a Physics-Informed Diffusion (PID) model for translating RGB images to infrared images that adhere to physical laws.
arXiv Detail & Related papers (2024-07-12T14:32:30Z)
- Thermal-NeRF: Neural Radiance Fields from an Infrared Camera [29.58060552299745]
We introduce Thermal-NeRF, the first method that estimates a volumetric scene representation in the form of a NeRF solely from IR imaging.
We conduct extensive experiments to demonstrate that Thermal-NeRF can achieve superior quality compared to existing methods.
arXiv Detail & Related papers (2024-03-15T14:27:15Z)
- Denoising Diffusion Models for Plug-and-Play Image Restoration [135.6359475784627]
This paper proposes DiffPIR, which integrates the traditional plug-and-play method into the diffusion sampling framework.
Compared to plug-and-play IR methods that rely on discriminative Gaussian denoisers, DiffPIR is expected to inherit the generative ability of diffusion models.
arXiv Detail & Related papers (2023-05-15T20:24:38Z)
- Dif-Fusion: Towards High Color Fidelity in Infrared and Visible Image Fusion with Diffusion Models [54.952979335638204]
We propose a novel method with diffusion models, termed as Dif-Fusion, to generate the distribution of the multi-channel input data.
Our method is more effective than other state-of-the-art image fusion methods, especially in color fidelity.
arXiv Detail & Related papers (2023-01-19T13:37:19Z)
- Thermal Image Super-Resolution Using Second-Order Channel Attention with Varying Receptive Fields [4.991042925292453]
We introduce a system to efficiently reconstruct thermal images.
The restoration of thermal images is critical for applications that involve safety, search and rescue, and military operations.
arXiv Detail & Related papers (2021-07-30T22:17:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.