DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance
- URL: http://arxiv.org/abs/2512.04511v1
- Date: Thu, 04 Dec 2025 06:45:20 GMT
- Title: DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance
- Authors: Yinghui Xing, Xiaoting Su, Shizhou Zhang, Donghao Chu, Di Xu,
- Abstract summary: In this paper, we propose a Dual-domain Guided Infrared foundation model based on MAE (DuGI-MAE)<n>First, we design a deterministic masking strategy based on token entropy, preserving only high-entropy tokens for reconstruction to enhance informativeness.<n>Next, we introduce a Dual-Domain Guidance (DDG) module, which simultaneously captures global token relationships and adaptively filters non-uniform background noise commonly present in infrared imagery.<n>Pretrained on Inf-590K, DuGI-MAE demonstrates strong generalization capabilities across various downstream tasks, including infrared object detection, semantic segmentation, and small target detection
- Score: 20.484726951373602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Infrared imaging plays a critical role in low-light and adverse weather conditions. However, due to the distinct characteristics of infrared images, existing foundation models such as Masked Autoencoder (MAE) trained on visible data perform suboptimal in infrared image interpretation tasks. To bridge this gap, an infrared foundation model known as InfMAE was developed and pre-trained on large-scale infrared datasets. Despite its effectiveness, InfMAE still faces several limitations, including the omission of informative tokens, insufficient modeling of global associations, and neglect of non-uniform noise. In this paper, we propose a Dual-domain Guided Infrared foundation model based on MAE (DuGI-MAE). First, we design a deterministic masking strategy based on token entropy, preserving only high-entropy tokens for reconstruction to enhance informativeness. Next, we introduce a Dual-Domain Guidance (DDG) module, which simultaneously captures global token relationships and adaptively filters non-uniform background noise commonly present in infrared imagery. To facilitate large-scale pretraining, we construct Inf-590K, a comprehensive infrared image dataset encompassing diverse scenes, various target types, and multiple spatial resolutions. Pretrained on Inf-590K, DuGI-MAE demonstrates strong generalization capabilities across various downstream tasks, including infrared object detection, semantic segmentation, and small target detection. Experimental results validate the superiority of the proposed method over both supervised and self-supervised comparison methods. Our code is available in the supplementary material.
Related papers
- IrisNet: Infrared Image Status Awareness Meta Decoder for Infrared Small Targets Detection [92.56025546608699]
IrisNet is a novel meta-learned framework that adapts detection strategies to the input infrared image status.<n>Our approach establishes a dynamic mapping between infrared image features and entire decoder parameters.<n> Experiments on NUDT-SIRST, NUAA-SIRST, and IRSTD-1K datasets demonstrate the superiority of our IrisNet.
arXiv Detail & Related papers (2025-11-25T13:53:54Z) - SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion [65.80051636480836]
This paper proposes a conditional diffusion model guided by the Segment Anything Model (SAM) to achieve high-fidelity and semantically-aware image fusion.<n>The framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks as a condition to drive the diffusion model's coarse-to-fine denoising generation.<n>Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations.
arXiv Detail & Related papers (2025-08-07T10:58:52Z) - MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization [26.33768545616346]
Existing colorization methods rely on single-band images with limited spectral information and insufficient feature extraction capabilities.<n>In this paper, we propose a generative adversarial network (GAN)-based framework designed to integrate spectral information to enhance the colorization of infrared images.<n> Experimental results demonstrate that the proposed method significantly outperforms traditional techniques and effectively enhances the visual quality of infrared images.
arXiv Detail & Related papers (2025-06-21T01:42:25Z) - Multi-Domain Biometric Recognition using Body Embeddings [51.36007967653781]
We show that body embeddings perform better than face embeddings in medium-wave infrared (MWIR) and long-wave infrared (LWIR) domains.<n>We leverage a vision transformer architecture to establish benchmark results on the IJB-MDF dataset.<n>We also show that finetuning a body model, pretrained exclusively on VIS data, with a simple combination of cross-entropy and triplet losses achieves state-of-the-art mAP scores.
arXiv Detail & Related papers (2025-03-13T22:38:18Z) - DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution [32.53713932204663]
DifIISR is an infrared image super-resolution diffusion model optimized for visual quality and perceptual performance.<n>We introduce an infrared thermal spectrum distribution regulation to preserve visual fidelity.<n>We incorporate various visual foundational models as the perceptual guidance for downstream visual tasks.
arXiv Detail & Related papers (2025-03-03T05:20:57Z) - IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection [55.554484379021524]
Infrared Small Target Detection (IRSTD) task falls short in achieving satisfying performance due to a notable domain gap between natural and infrared images.
We propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representation of infrared small objects.
arXiv Detail & Related papers (2024-07-10T10:17:57Z) - SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised
Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformer, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z) - InfMAE: A Foundation Model in the Infrared Modality [38.23685358198649]
In this paper, we propose InfMAE, a foundation model in infrared modality.
We release an infrared dataset, called Inf30, to address the problem of lacking large-scale data for self-supervised learning.
We also design an information-aware masking strategy, which is suitable for infrared images.
arXiv Detail & Related papers (2024-02-01T08:02:10Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commons underlying the two modalities and fuse upon the common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.