InfMAE: A Foundation Model in Infrared Modality
- URL: http://arxiv.org/abs/2402.00407v1
- Date: Thu, 1 Feb 2024 08:02:10 GMT
- Title: InfMAE: A Foundation Model in Infrared Modality
- Authors: Fangcen Liu, Chenqiang Gao, Yaming Zhang, Junjie Guo, Jinhao Wang,
Deyu Meng
- Abstract summary: In this paper, we propose InfMAE, a foundation model in the infrared modality.
We release an infrared dataset, called Inf30, to address the lack of large-scale data for self-supervised learning.
Our proposed method, InfMAE, outperforms other supervised and self-supervised learning methods in three downstream tasks.
- Score: 40.51637501297646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, foundation models have swept the computer vision field
and facilitated the development of various tasks across different modalities.
However, how to design an infrared foundation model remains an open question. In
this paper, we propose InfMAE, a foundation model in the infrared modality. We
release an infrared dataset, called Inf30, to address the lack of large-scale
data for self-supervised learning in the infrared vision community. We also
design an information-aware masking strategy suited to infrared images. This
strategy places greater emphasis on information-rich regions of infrared images
during self-supervised learning, which is conducive to learning generalized
representations. In addition, we adopt a multi-scale encoder to enhance the
performance of the pre-trained encoder in downstream tasks. Finally, since
infrared images lack rich detail and texture information, we design an infrared
decoder module that further improves performance on downstream tasks. Extensive
experiments show that InfMAE outperforms other supervised and self-supervised
learning methods in three downstream tasks. Our code will be made public at
https://github.com/liufangcen/InfMAE.
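The abstract's information-aware masking idea can be sketched in a few lines. The paper does not give implementation details, so this is a minimal illustration under stated assumptions: per-patch mean gradient magnitude stands in for the (unspecified) information measure, and masked patch indices are sampled with probability proportional to that score, so reconstruction concentrates on informative regions. The function names and parameters here are hypothetical.

```python
import numpy as np

def information_scores(img, patch=16):
    """Per-patch information proxy: mean gradient magnitude.
    (Illustrative stand-in; the paper's actual measure is not specified.)"""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    h, w = img.shape
    # Crop to a whole number of patches, then average within each patch.
    scores = mag[:h - h % patch, :w - w % patch] \
        .reshape(h // patch, patch, w // patch, patch) \
        .mean(axis=(1, 3))
    return scores.ravel()

def information_aware_mask(img, patch=16, mask_ratio=0.75, rng=None):
    """Sample masked patch indices with probability proportional to the
    information score, so information-rich patches are masked (and thus
    reconstructed) more often."""
    rng = np.random.default_rng(rng)
    scores = information_scores(img, patch)
    p = scores + 1e-8          # avoid an all-zero distribution
    p = p / p.sum()
    n_mask = int(round(mask_ratio * scores.size))
    masked = rng.choice(scores.size, size=n_mask, replace=False, p=p)
    mask = np.zeros(scores.size, dtype=bool)
    mask[masked] = True
    return mask                # True = patch is masked for reconstruction
```

With a standard MAE recipe, the `False` entries would select the visible patches fed to the encoder, while the `True` entries mark reconstruction targets for the decoder.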
Related papers
- IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection [55.554484379021524]
The Infrared Small Target Detection (IRSTD) task falls short of satisfactory performance due to a notable domain gap between natural and infrared images.
We propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representations of infrared small objects.
arXiv Detail & Related papers (2024-07-10T10:17:57Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving.
It predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
It is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior [63.64088590653005]
We propose Diff-Mosaic, a data augmentation method based on the diffusion model.
We introduce an enhancement network called Pixel-Prior, which generates highly coordinated and realistic Mosaic images.
In the second stage, we propose an image enhancement strategy named Diff-Prior. This strategy utilizes diffusion priors to model images in the real-world scene.
arXiv Detail & Related papers (2024-06-02T06:23:05Z)
- MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training [57.18758272617101]
MaeFuse is a novel autoencoder model designed for infrared and visible image fusion (IVIF).
Our model utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates omni-feature extraction for both low-level reconstruction and high-level vision tasks.
MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also stands out with impressive performance across various public datasets.
arXiv Detail & Related papers (2024-04-17T02:47:39Z)
- VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing [13.777195433138179]
This study aims to design a visible-infrared fusion network for image dehazing.
In particular, we propose a multi-scale Deep Structure Feature Extraction (DSFE) module to restore more spatial and marginal information.
To validate this, we construct a visible-infrared multimodal dataset called AirSim-VID based on the AirSim simulation platform.
arXiv Detail & Related papers (2024-04-11T14:31:11Z)
- HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection [16.92362922379821]
We propose a deep learning method to improve infrared small object detection performance.
The method includes the parallelized patch-aware attention (PPA) module, dimension-aware selective integration (DASI) module, and multi-dilated channel refiner (MDCR) module.
arXiv Detail & Related papers (2024-03-16T02:45:42Z)
- Fusion of Infrared and Visible Images based on Spatial-Channel Attentional Mechanism [3.388001684915793]
We present AMFusionNet, an innovative approach to infrared and visible image fusion (IVIF).
By assimilating thermal details from infrared images with texture features from visible sources, our method produces images enriched with comprehensive information.
Our method outperforms state-of-the-art algorithms both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-08-25T21:05:11Z)
- Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds [155.388487263872]
We propose a new infrared small-dim target detection method with the transformer.
We adopt the self-attention mechanism of the transformer to learn the interaction information of image features in a larger range.
We also design a feature enhancement module to learn more features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z)
- Domain Adversarial Training for Infrared-colour Person Re-Identification [19.852463786440122]
Person re-identification (re-ID) is a very active area of research in computer vision.
Most methods only address the task of matching between colour images.
In poorly-lit environments, CCTV cameras switch to infrared imaging.
We propose a part-feature extraction network to better focus on subtle, unique signatures on the person.
arXiv Detail & Related papers (2020-03-09T15:17:15Z)
- Infrared and 3D skeleton feature fusion for RGB-D action recognition [0.30458514384586394]
We propose a modular network combining skeleton and infrared data.
A 2D convolutional network (CNN) is used as a pose module to extract features from skeleton data.
A 3D CNN is used as an infrared module to extract visual cues from videos.
arXiv Detail & Related papers (2020-02-28T17:42:53Z)
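The two-stream design described in the last entry (a 2D CNN pose module over skeleton data plus a 3D CNN infrared module over video) can be illustrated roughly as below. Layer sizes, the class count, and fusion by simple concatenation are placeholder assumptions for the sketch, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class SkeletonInfraredFusion(nn.Module):
    """Illustrative two-stream sketch: a 2D CNN over skeleton 'pose
    images' and a 3D CNN over infrared clips, fused by concatenation
    before a linear classifier. All dimensions are placeholders."""

    def __init__(self, n_classes=60):
        super().__init__()
        self.pose2d = nn.Sequential(   # skeleton stream (2D CNN)
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.ir3d = nn.Sequential(     # infrared stream (3D CNN)
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16 + 16, n_classes)

    def forward(self, pose_img, ir_clip):
        # pose_img: (B, 3, H, W); ir_clip: (B, 1, T, H, W)
        z = torch.cat([self.pose2d(pose_img), self.ir3d(ir_clip)], dim=1)
        return self.head(z)
```

Late fusion by concatenation keeps the two modules independent, so either stream could be pretrained or swapped out separately.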
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.