InfMAE: A Foundation Model in Infrared Modality
- URL: http://arxiv.org/abs/2402.00407v1
- Date: Thu, 1 Feb 2024 08:02:10 GMT
- Title: InfMAE: A Foundation Model in Infrared Modality
- Authors: Fangcen Liu, Chenqiang Gao, Yaming Zhang, Junjie Guo, Jinhao Wang,
Deyu Meng
- Abstract summary: In this paper, we propose InfMAE, a foundation model in the infrared modality.
We release an infrared dataset, called Inf30, to address the lack of large-scale data for self-supervised learning.
Our proposed method, InfMAE, outperforms other supervised and self-supervised learning methods in three downstream tasks.
- Score: 40.51637501297646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, foundation models have swept the computer vision field
and facilitated the development of various tasks across different modalities.
However, how to design an infrared foundation model remains an open question. In
this paper, we propose InfMAE, a foundation model in the infrared modality. We
release an infrared dataset, called Inf30, to address the lack of large-scale
data for self-supervised learning in the infrared vision community. We also
design an information-aware masking strategy suited to infrared images. This
strategy places greater emphasis on information-rich regions of infrared images
during self-supervised learning, which is conducive to learning generalized
representations. In addition, we adopt a multi-scale encoder to enhance the
performance of the pre-trained encoder in downstream tasks. Finally, since
infrared images lack rich detail and texture information, we design an infrared
decoder module that further improves performance on downstream tasks. Extensive
experiments show that InfMAE outperforms other supervised and self-supervised
learning methods in three downstream tasks. Our code will be made public at
https://github.com/liufangcen/InfMAE.
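The abstract's information-aware masking idea can be sketched in a few lines. The paper does not give implementation details, so this is a minimal illustration under stated assumptions: per-patch mean gradient magnitude stands in for the (unspecified) information measure, and masked patch indices are sampled with probability proportional to that score, so reconstruction concentrates on informative regions. The function names and parameters here are hypothetical.

```python
import numpy as np

def information_scores(img, patch=16):
    """Per-patch information proxy: mean gradient magnitude.
    (Illustrative stand-in; the paper's actual measure is not specified.)"""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    h, w = img.shape
    # Crop to a whole number of patches, then average within each patch.
    scores = mag[:h - h % patch, :w - w % patch] \
        .reshape(h // patch, patch, w // patch, patch) \
        .mean(axis=(1, 3))
    return scores.ravel()

def information_aware_mask(img, patch=16, mask_ratio=0.75, rng=None):
    """Sample masked patch indices with probability proportional to the
    information score, so information-rich patches are masked (and thus
    reconstructed) more often."""
    rng = np.random.default_rng(rng)
    scores = information_scores(img, patch)
    p = scores + 1e-8          # avoid an all-zero distribution
    p = p / p.sum()
    n_mask = int(round(mask_ratio * scores.size))
    masked = rng.choice(scores.size, size=n_mask, replace=False, p=p)
    mask = np.zeros(scores.size, dtype=bool)
    mask[masked] = True
    return mask                # True = patch is masked for reconstruction
```

With a standard MAE recipe, the `False` entries would select the visible patches fed to the encoder, while the `True` entries mark reconstruction targets for the decoder.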
Related papers
- IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection [55.554484379021524]
The Infrared Small Target Detection (IRSTD) task falls short of satisfactory performance due to a notable domain gap between natural and infrared images.
We propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representations of infrared small objects.
arXiv Detail & Related papers (2024-07-10T10:17:57Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving.
It predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
It is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior [63.64088590653005]
We propose Diff-Mosaic, a data augmentation method based on the diffusion model.
We introduce an enhancement network called Pixel-Prior, which generates highly coordinated and realistic Mosaic images.
In the second stage, we propose an image enhancement strategy named Diff-Prior. This strategy utilizes diffusion priors to model images in the real-world scene.
arXiv Detail & Related papers (2024-06-02T06:23:05Z)
- MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training [57.18758272617101]
MaeFuse is a novel autoencoder model designed for infrared and visible image fusion (IVIF).
Our model utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates omni-feature extraction for both low-level reconstruction and high-level vision tasks.
MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also stands out with impressive performance across various public datasets.
arXiv Detail & Related papers (2024-04-17T02:47:39Z)
- VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing [13.777195433138179]
This study aims to design a visible-infrared fusion network for image dehazing.
In particular, we propose a multi-scale Deep Structure Feature Extraction (DSFE) module to restore more spatial and marginal information.
To validate this, we construct a visible-infrared multimodal dataset called AirSim-VID based on the AirSim simulation platform.
arXiv Detail & Related papers (2024-04-11T14:31:11Z)
- HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection [16.92362922379821]
We propose a deep learning method to improve infrared small object detection performance.
The method includes the parallelized patch-aware attention (PPA) module, dimension-aware selective integration (DASI) module, and multi-dilated channel refiner (MDCR) module.
arXiv Detail & Related papers (2024-03-16T02:45:42Z)
- Fusion of Infrared and Visible Images based on Spatial-Channel Attentional Mechanism [3.388001684915793]
We present AMFusionNet, an innovative approach to infrared and visible image fusion (IVIF).
By assimilating thermal details from infrared images with texture features from visible sources, our method produces images enriched with comprehensive information.
Our method outperforms state-of-the-art algorithms both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-08-25T21:05:11Z)
- Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds [155.388487263872]
We propose a new infrared small-dim target detection method with the transformer.
We adopt the self-attention mechanism of the transformer to learn the interaction information of image features in a larger range.
We also design a feature enhancement module to learn more features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z)
- Domain Adversarial Training for Infrared-colour Person Re-Identification [19.852463786440122]
Person re-identification (re-ID) is a very active area of research in computer vision.
Most methods only address the task of matching between colour images.
In poorly-lit environments, CCTV cameras switch to infrared imaging.
We propose a part-feature extraction network to better focus on subtle, unique signatures on the person.
arXiv Detail & Related papers (2020-03-09T15:17:15Z)
- Infrared and 3D skeleton feature fusion for RGB-D action recognition [0.30458514384586394]
We propose a modular network combining skeleton and infrared data.
A 2D convolutional network (CNN) is used as a pose module to extract features from skeleton data.
A 3D CNN is used as an infrared module to extract visual cues from videos.
arXiv Detail & Related papers (2020-02-28T17:42:53Z)
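The two-stream design described in the last entry (a 2D CNN pose module over skeleton data plus a 3D CNN infrared module over video) can be illustrated roughly as below. Layer sizes, the class count, and fusion by simple concatenation are placeholder assumptions for the sketch, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class SkeletonInfraredFusion(nn.Module):
    """Illustrative two-stream sketch: a 2D CNN over skeleton 'pose
    images' and a 3D CNN over infrared clips, fused by concatenation
    before a linear classifier. All dimensions are placeholders."""

    def __init__(self, n_classes=60):
        super().__init__()
        self.pose2d = nn.Sequential(   # skeleton stream (2D CNN)
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.ir3d = nn.Sequential(     # infrared stream (3D CNN)
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16 + 16, n_classes)

    def forward(self, pose_img, ir_clip):
        # pose_img: (B, 3, H, W); ir_clip: (B, 1, T, H, W)
        z = torch.cat([self.pose2d(pose_img), self.ir3d(ir_clip)], dim=1)
        return self.head(z)
```

Late fusion by concatenation keeps the two modules independent, so either stream could be pretrained or swapped out separately.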
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.