MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization
- URL: http://arxiv.org/abs/2506.17540v1
- Date: Sat, 21 Jun 2025 01:42:25 GMT
- Title: MTSIC: Multi-stage Transformer-based GAN for Spectral Infrared Image Colorization
- Authors: Tingting Liu, Yuan Liu, Jinhui Tang, Liyin Yuan, Chengyu Liu, Chunlai Li, Xiubao Sui, Qian Chen
- Abstract summary: Existing colorization methods rely on single-band images with limited spectral information and insufficient feature extraction capabilities. In this paper, we propose a generative adversarial network (GAN)-based framework designed to integrate spectral information to enhance the colorization of infrared images. Experimental results demonstrate that the proposed method significantly outperforms traditional techniques and effectively enhances the visual quality of infrared images.
- Score: 26.33768545616346
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Thermal infrared (TIR) images, acquired through thermal radiation imaging, are unaffected by variations in lighting conditions and atmospheric haze. However, TIR images inherently lack color and texture information, limiting downstream tasks and potentially causing visual fatigue. Existing colorization methods primarily rely on single-band images with limited spectral information and insufficient feature extraction capabilities, which often result in image distortion and semantic ambiguity. In contrast, multiband infrared imagery provides richer spectral data, facilitating the preservation of finer details and enhancing semantic accuracy. In this paper, we propose a generative adversarial network (GAN)-based framework designed to integrate spectral information to enhance the colorization of infrared images. The framework employs a multi-stage spectral self-attention Transformer network (MTSIC) as the generator. Each spectral feature is treated as a token for self-attention computation, and a multi-head self-attention mechanism forms a spatial-spectral attention residual block (SARB), achieving multi-band feature mapping and reducing semantic confusion. Multiple SARB units are integrated into a Transformer-based single-stage network (STformer), which uses a U-shaped architecture to extract contextual information, combined with multi-scale wavelet blocks (MSWB) to align semantic information in the spatial-frequency dual domain. Multiple STformer modules are cascaded to form MTSIC, progressively optimizing the reconstruction quality. Experimental results demonstrate that the proposed method significantly outperforms traditional techniques and effectively enhances the visual quality of infrared images.
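The abstract's idea of treating each spectral feature as a token for self-attention can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, omits the learned query/key/value projections, and uses a simple residual connection. The function names (`softmax`, `spectral_self_attention`) and the scaling factor are illustrative choices. Each band of a multiband infrared feature map is flattened to a vector of N pixel values, so the attention map is C x C (band-to-band) rather than N x N (pixel-to-pixel).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def spectral_self_attention(bands, scale=None):
    """Single-head self-attention where each spectral band is one token.

    bands: list of C bands, each a flat list of N pixel values.
    Returns (output, weights): output has the same C x N shape as bands,
    weights is the C x C band-to-band attention map.
    Assumption: queries, keys and values are the raw bands (no learned
    projections), which a trained model would normally add.
    """
    c = len(bands)
    n = len(bands[0])
    if scale is None:
        scale = 1.0 / math.sqrt(n)  # assumed scaling, analogous to 1/sqrt(d)

    # Similarity between every pair of bands: a C x C score matrix.
    scores = [
        [scale * sum(q * k for q, k in zip(bands[i], bands[j])) for j in range(c)]
        for i in range(c)
    ]
    weights = [softmax(row) for row in scores]

    # Each output band is an attention-weighted mix of all bands,
    # added back to the input band (residual connection).
    output = [
        [
            bands[i][p] + sum(weights[i][j] * bands[j][p] for j in range(c))
            for p in range(n)
        ]
        for i in range(c)
    ]
    return output, weights


if __name__ == "__main__":
    # Three toy bands of a 2x2 image, flattened to four pixels each.
    toy_bands = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
    ]
    out, attn = spectral_self_attention(toy_bands)
    for row in attn:
        print([round(w, 3) for w in row])
```

Because the attention map scales with the number of bands rather than the number of pixels, its cost grows linearly with image size. That is one plausible motivation for spectral-wise attention on multiband imagery, though the abstract does not state this reasoning.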
Related papers
- Infrared and Visible Image Fusion Based on Implicit Neural Representations [3.8530055385287403]
Infrared and visible light image fusion aims to combine the strengths of both modalities to generate images that are rich in information. This paper proposes an image fusion method based on Implicit Neural Representations (INR), referred to as INRFuse. Experimental results indicate that INRFuse outperforms existing methods in both subjective visual quality and objective evaluation metrics.
arXiv Detail & Related papers (2025-06-20T06:34:19Z)
- DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once [57.15043822199561]
A Darkness-Free network is proposed to handle Visible and infrared image disentanglement and fusion all at Once (DFVO). DFVO employs a cascaded multi-task approach to replace the traditional two-stage cascaded training (enhancement and fusion). Our proposed approach outperforms state-of-the-art alternatives in terms of qualitative and quantitative evaluations.
arXiv Detail & Related papers (2025-05-07T15:59:45Z)
- Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution [54.293362972473595]
Image super-resolution (SR) aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts.
Current approaches to address SR tasks are either dedicated to extracting RGB image features or assuming similar degradation patterns.
We propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.
arXiv Detail & Related papers (2024-11-19T14:24:03Z)
- GAN-HA: A generative adversarial network with a novel heterogeneous dual-discriminator network and a new attention-based fusion strategy for infrared and visible image fusion [0.1160897408844138]
Infrared and visible image fusion (IVIF) aims to preserve thermal radiation information from infrared images while integrating texture details from visible images.
Existing dual-discriminator generative adversarial networks (GANs) often rely on two structurally identical discriminators for learning.
This paper proposes a novel GAN with a heterogeneous dual-discriminator network and an attention-based fusion strategy.
arXiv Detail & Related papers (2024-04-24T17:06:52Z)
- Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model [0.6817102408452475]
In computer vision, visible light images often exhibit low contrast in low-light conditions, presenting a significant challenge.
Recent advancements in deep learning, particularly the deployment of Generative Adversarial Networks (GANs), have facilitated the transformation of visible light images to infrared images.
We propose a novel end-to-end Transformer-based model that efficiently converts visible light images into high-fidelity infrared images.
arXiv Detail & Related papers (2024-04-10T15:02:26Z)
- Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification [79.9402521412239]
We propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective.
Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) and the Phase-Preserving Normalization (PPNorm).
arXiv Detail & Related papers (2024-01-03T17:11:27Z)
- RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement [5.3240763486073055]
We propose a Real-time Spatial and Frequency Domains Modulation Network (RSFDM-Net) for the efficient enhancement of colors and details in underwater images.
Our proposed conditional network is designed with an Adaptive Fourier Gating Mechanism (AFGM) and a Multiscale Convolutional Attention Module (MCAM).
To more precisely correct the color cast and low saturation of the image, we introduce a Three-branch Feature Extraction (TFE) block in the primary net.
arXiv Detail & Related papers (2023-02-23T17:27:05Z)
- PC-GANs: Progressive Compensation Generative Adversarial Networks for Pan-sharpening [50.943080184828524]
We propose a novel two-step model for pan-sharpening that sharpens the MS image through the progressive compensation of the spatial and spectral information.
The whole model is composed of triple GANs, and based on the specific architecture, a joint compensation loss function is designed to enable the triple GANs to be trained simultaneously.
arXiv Detail & Related papers (2022-07-29T03:09:21Z)
- Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration [59.02821429555375]
We present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion.
To better fuse the registered infrared images and visible images, we present a feature Interaction Fusion Module (IFM).
arXiv Detail & Related papers (2022-05-24T07:51:57Z)
- Underwater Image Enhancement via Medium Transmission-Guided Multi-Color Space Embedding [88.46682991985907]
We present an underwater image enhancement network via medium transmission-guided multi-color space embedding, called Ucolor.
Our network can effectively improve the visual quality of underwater images by exploiting multiple color spaces embedding.
arXiv Detail & Related papers (2021-04-27T07:35:30Z)
- SFANet: A Spectrum-aware Feature Augmentation Network for Visible-Infrared Person Re-Identification [12.566284647658053]
We propose a novel spectrum-aware feature augmentation network named SFANet for the cross-modality matching problem.
Learning with grayscale-spectrum images, our model can apparently reduce modality discrepancy and detect inner structure relations.
At the feature level, we improve the conventional two-stream network by balancing the number of specific and sharable convolutional blocks.
arXiv Detail & Related papers (2021-02-24T08:57:32Z)