InfMAE: A Foundation Model in the Infrared Modality
- URL: http://arxiv.org/abs/2402.00407v2
- Date: Sat, 14 Sep 2024 14:58:26 GMT
- Title: InfMAE: A Foundation Model in the Infrared Modality
- Authors: Fangcen Liu, Chenqiang Gao, Yaming Zhang, Junjie Guo, Jinhao Wang, Deyu Meng
- Abstract summary: In this paper, we propose InfMAE, a foundation model in the infrared modality.
We release an infrared dataset, called Inf30, to address the problem of lacking large-scale data for self-supervised learning.
We also design an information-aware masking strategy, which is suitable for infrared images.
- Score: 38.23685358198649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, foundation models have swept the computer vision field and facilitated the development of various tasks within different modalities. However, how to design an infrared foundation model remains an open question. In this paper, we propose InfMAE, a foundation model in the infrared modality. We release an infrared dataset, called Inf30, to address the lack of large-scale data for self-supervised learning in the infrared vision community. Besides, we design an information-aware masking strategy, which is suitable for infrared images. This masking strategy places greater emphasis on the regions of infrared images with richer information during self-supervised learning, which is conducive to learning generalized representations. In addition, we adopt a multi-scale encoder to enhance the performance of the pre-trained encoders in downstream tasks. Finally, based on the fact that infrared images lack rich detail and texture information, we design an infrared decoder module, which further improves the performance of downstream tasks. Extensive experiments show that our proposed method InfMAE outperforms both supervised and other self-supervised learning methods on three downstream tasks.
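The abstract does not spell out how the information-aware masking is computed. The following is a minimal, hypothetical sketch, not the paper's actual method: it assumes per-patch intensity variance as a crude proxy for "information" and biases the MAE mask toward high-information patches so that reconstruction focuses on informative regions. The function name and parameters are illustrative only.

```python
import numpy as np

def information_aware_mask(image, patch_size=16, mask_ratio=0.75, rng=None):
    """Toy information-aware masking for MAE-style pretraining.

    Scores each patch by its intensity variance (a crude stand-in for
    information content) and samples masked patches with probability
    proportional to that score, so high-information regions are
    preferentially hidden and must be reconstructed.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape
    gh, gw = h // patch_size, w // patch_size
    # Split the image into a (gh, gw) grid of non-overlapping patches.
    patches = image[:gh * patch_size, :gw * patch_size].reshape(
        gh, patch_size, gw, patch_size).transpose(0, 2, 1, 3)
    scores = patches.reshape(gh * gw, -1).var(axis=1)   # per-patch variance
    probs = (scores + 1e-8) / (scores + 1e-8).sum()     # sampling distribution
    n_mask = int(mask_ratio * gh * gw)
    masked = rng.choice(gh * gw, size=n_mask, replace=False, p=probs)
    mask = np.zeros(gh * gw, dtype=bool)
    mask[masked] = True
    return mask.reshape(gh, gw)                         # True = masked patch
```

A uniform-random mask (as in vanilla MAE) corresponds to replacing `probs` with a constant; the only change here is the score-weighted sampling.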
Related papers
- IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection [55.554484379021524]
The Infrared Small Target Detection (IRSTD) task falls short of satisfactory performance due to a notable domain gap between natural and infrared images.
We propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representation of infrared small objects.
arXiv Detail & Related papers (2024-07-10T10:17:57Z)
- Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior [63.64088590653005]
We propose Diff-Mosaic, a data augmentation method based on the diffusion model.
We introduce an enhancement network called Pixel-Prior, which generates highly coordinated and realistic Mosaic images.
In the second stage, we propose an image enhancement strategy named Diff-Prior. This strategy utilizes diffusion priors to model images in the real-world scene.
arXiv Detail & Related papers (2024-06-02T06:23:05Z)
- MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training [57.18758272617101]
MaeFuse is a novel autoencoder model designed for infrared and visible image fusion (IVIF).
Our model utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates omni-feature extraction for low-level reconstruction and high-level vision tasks.
MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also stands out with impressive performance across various public datasets.
arXiv Detail & Related papers (2024-04-17T02:47:39Z)
- VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing [13.777195433138179]
This study aims to design a visible-infrared fusion network for image dehazing.
In particular, we propose a multi-scale Deep Structure Feature Extraction (DSFE) module to restore more spatial and marginal information.
To validate this, we construct a visible-infrared multimodal dataset called AirSim-VID based on the AirSim simulation platform.
arXiv Detail & Related papers (2024-04-11T14:31:11Z)
- HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection [16.92362922379821]
We propose a deep learning method to improve infrared small object detection performance.
The method includes the parallelized patch-aware attention (PPA) module, dimension-aware selective integration (DASI) module, and multi-dilated channel refiner (MDCR) module.
arXiv Detail & Related papers (2024-03-16T02:45:42Z)
- Fusion of Infrared and Visible Images based on Spatial-Channel Attentional Mechanism [3.388001684915793]
We present AMFusionNet, an innovative approach to infrared and visible image fusion (IVIF).
By assimilating thermal details from infrared images with texture features from visible sources, our method produces images enriched with comprehensive information.
Our method outperforms state-of-the-art algorithms both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-08-25T21:05:11Z)
- Interactive Feature Embedding for Infrared and Visible Image Fusion [94.77188069479155]
General deep learning-based methods for infrared and visible image fusion rely on the unsupervised mechanism for vital information retention.
We propose a novel interactive feature embedding within a self-supervised learning framework for infrared and visible image fusion.
arXiv Detail & Related papers (2022-11-09T13:34:42Z)
- AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields [23.92262483956057]
Fully unsupervised 3D representation learning has gained attention owing to its advantages in data collection.
We propose an aperture rendering NeRF (AR-NeRF), which can utilize viewpoint and defocus cues in a unified manner.
We demonstrate the utility of AR-NeRF for unsupervised learning of depth and defocus effects.
arXiv Detail & Related papers (2022-06-13T12:41:59Z)
- Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds [155.388487263872]
We propose a new transformer-based infrared small-dim target detection method.
We adopt the self-attention mechanism of the transformer to learn interactions between image features over a larger range.
We also design a feature enhancement module to learn more features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z)
- Domain Adversarial Training for Infrared-colour Person Re-Identification [19.852463786440122]
Person re-identification (re-ID) is a very active area of research in computer vision.
Most methods only address the task of matching between colour images.
In poorly-lit environments, CCTV cameras switch to infrared imaging.
We propose a part-feature extraction network to better focus on subtle, unique signatures on the person.
arXiv Detail & Related papers (2020-03-09T15:17:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.