GrFormer: A Novel Transformer on Grassmann Manifold for Infrared and Visible Image Fusion
- URL: http://arxiv.org/abs/2506.14384v1
- Date: Tue, 17 Jun 2025 10:32:05 GMT
- Title: GrFormer: A Novel Transformer on Grassmann Manifold for Infrared and Visible Image Fusion
- Authors: Huan Kang, Hui Li, Xiao-Jun Wu, Tianyang Xu, Rui Wang, Chunyang Cheng, Josef Kittler
- Abstract summary: We propose a novel attention mechanism based on the Grassmann manifold for infrared and visible image fusion. Our method constructs a low-rank subspace mapping through projection constraints on the Grassmann manifold. This forces the features to decouple into high-frequency details (local low-rank) and low-frequency semantics (global low-rank).
- Score: 33.925249998725896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of image fusion, promising progress has been made by modeling data from different modalities as linear subspaces. However, in practice, the source images are often located in a non-Euclidean space, where Euclidean methods usually cannot encapsulate the intrinsic topological structure. Typically, the inner product performed in Euclidean space measures algebraic similarity rather than semantic similarity, which results in undesired attention output and a decrease in fusion performance. Meanwhile, the balance between low-level details and high-level semantics should be considered in the infrared and visible image fusion task. To address this issue, in this paper, we propose a novel attention mechanism based on the Grassmann manifold for infrared and visible image fusion (GrFormer). Specifically, our method constructs a low-rank subspace mapping through projection constraints on the Grassmann manifold, compressing attention features into subspaces of varying rank levels. This forces the features to decouple into high-frequency details (local low-rank) and low-frequency semantics (global low-rank), thereby achieving multi-scale semantic fusion. Additionally, to effectively integrate the significant information, we develop a cross-modal fusion strategy (CMS) based on a covariance mask to maximise the complementary properties between different modalities and to suppress features with high correlation, which are deemed redundant. The experimental results demonstrate that our network outperforms SOTA methods both qualitatively and quantitatively on multiple image fusion benchmarks. The codes are available at https://github.com/Shaoyun2023.
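Below is a minimal, self-contained PyTorch sketch of the two ideas the abstract describes, not the authors' released implementation: features are mapped to points on the Grassmann manifold by taking an orthonormal basis of a low-rank dominant subspace, subspace similarity is measured with the projection metric instead of a Euclidean inner product, and a covariance-based channel mask suppresses highly correlated (redundant) cross-modal responses. The token/channel sizes, the rank, the correlation threshold, and the final fusion rule are illustrative assumptions; see the linked repository for the actual GrFormer code.

```python
# Hedged sketch: Grassmann-manifold subspace similarity + covariance-mask fusion.
import torch


def grassmann_point(x: torch.Tensor, rank: int) -> torch.Tensor:
    """Map a feature matrix (tokens x channels) to a point on the Grassmann manifold:
    an orthonormal basis spanning its rank-`rank` dominant subspace."""
    # Thin SVD; the top-`rank` left singular vectors span the dominant subspace.
    u, _, _ = torch.linalg.svd(x, full_matrices=False)
    return u[:, :rank]  # (tokens, rank), orthonormal columns


def projection_similarity(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Projection-metric similarity ||Q^T K||_F^2 between two subspaces (range [0, rank]),
    used here in place of the Euclidean inner product of standard attention."""
    return (q.transpose(-2, -1) @ k).pow(2).sum()


def covariance_mask(f_ir: torch.Tensor, f_vis: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Per-channel mask that down-weights channels whose IR/VIS responses are highly
    correlated (treated as redundant) and keeps the complementary ones."""
    ir = (f_ir - f_ir.mean(0)) / (f_ir.std(0) + 1e-6)
    vis = (f_vis - f_vis.mean(0)) / (f_vis.std(0) + 1e-6)
    corr = (ir * vis).mean(0).abs()  # per-channel |correlation| in [0, 1]
    return (corr < tau).float()      # 1 = complementary channel, 0 = redundant channel


if __name__ == "__main__":
    tokens, channels, rank = 64, 32, 8
    f_ir, f_vis = torch.randn(tokens, channels), torch.randn(tokens, channels)

    # Subspace similarity between the dominant subspaces of the two modalities.
    sim = projection_similarity(grassmann_point(f_ir, rank), grassmann_point(f_vis, rank))
    print("projection-metric similarity:", float(sim))

    # Covariance-mask fusion: sum the complementary channels, average the redundant ones.
    m = covariance_mask(f_ir, f_vis)
    fused = m * (f_ir + f_vis) + (1.0 - m) * 0.5 * (f_ir + f_vis)
    print("fused feature shape:", tuple(fused.shape))
```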
Related papers
- Manifold-aware Representation Learning for Degradation-agnostic Image Restoration [135.90908995927194]
Image Restoration (IR) aims to recover high-quality images from degraded inputs affected by various corruptions such as noise, blur, haze, rain, and low-light conditions. We present MIRAGE, a unified framework for all-in-one IR that explicitly decomposes the input feature space into three semantically aligned parallel branches. This modular decomposition significantly improves generalization and efficiency across diverse degradations.
arXiv Detail & Related papers (2025-05-24T12:52:10Z)
- A Diff-Attention Aware State Space Fusion Model for Remote Sensing Classification [5.381099682416992]
Multispectral (MS) and panchromatic (PAN) images describe the same land surface. To separate the shared information from the respective advantages of each modality and to reduce feature redundancy in the fusion stage, this paper introduces a diff-attention aware state space fusion model (DAS2F-Model) for multimodal remote sensing image classification.
arXiv Detail & Related papers (2025-04-23T12:34:32Z)
- A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion [41.34335755315773]
This paper focuses on how to model correlation-driven decomposed features and reason about high-level graph representations. We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy. Experiments demonstrate competitive results compared with state-of-the-art methods on visible/infrared image fusion and medical image fusion tasks.
arXiv Detail & Related papers (2024-06-11T09:32:40Z)
- MMA-UNet: A Multi-Modal Asymmetric UNet Architecture for Infrared and Visible Image Fusion [4.788349093716269]
Multi-modal image fusion (MMIF) maps useful information from various modalities into the same representation space.
Existing fusion algorithms tend to fuse multi-modal images symmetrically, causing a loss of shallow information or a bias towards a single modality.
In this study, we analyzed the spatial distribution differences of information in different modalities and proved that encoding features within the same network is not conducive to achieving simultaneous deep feature space alignment.
arXiv Detail & Related papers (2024-04-27T01:35:21Z)
- DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network connected to Stable Diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
- Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z)
- DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion [144.9653045465908]
We propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM).
Our approach yields promising fusion results in infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2023-03-13T04:06:42Z)
- CoCoNet: Coupled Contrastive Learning Network with Multi-level Feature Ensemble for Multi-modality Image Fusion [68.78897015832113]
We propose a coupled contrastive learning network, dubbed CoCoNet, to realize infrared and visible image fusion. Our method achieves state-of-the-art (SOTA) performance under both subjective and objective evaluation.
arXiv Detail & Related papers (2022-11-20T12:02:07Z)
- Subspace-Based Feature Fusion From Hyperspectral And Multispectral Image For Land Cover Classification [17.705966155216945]
A feature fusion method from hyperspectral (HS) and multispectral (MS) images for pixel-based classification is proposed.
The proposed method first extracts spatial features from the MS image using morphological profiles.
An algorithm combining alternating optimization (AO) and the alternating direction method of multipliers (ADMM) is developed to solve the feature fusion problem efficiently (a brief sketch of the morphological-profile step appears after this list).
arXiv Detail & Related papers (2021-02-22T17:59:18Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields (see the dilated-convolution sketch after this list).
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
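The subspace-based feature fusion entry above mentions extracting spatial features from the MS image with morphological profiles. The following is a simplified, hedged sketch of that general technique, using plain grayscale openings and closings at increasing structuring-element sizes rather than the opening/closing-by-reconstruction variant that paper may use; the input band, the scale set, and the stacking order are illustrative assumptions.

```python
# Simplified morphological profile: stack grayscale openings and closings of a single band
# at increasing structuring-element sizes (openings, original band, closings).
import numpy as np
from scipy import ndimage


def morphological_profile(band: np.ndarray, sizes=(3, 5, 7)) -> np.ndarray:
    """Return a (len(sizes) * 2 + 1, H, W) stack of spatial features for one image band."""
    openings = [ndimage.grey_opening(band, size=s) for s in sizes]
    closings = [ndimage.grey_closing(band, size=s) for s in sizes]
    return np.stack(openings[::-1] + [band] + closings, axis=0)


if __name__ == "__main__":
    band = np.random.rand(64, 64)         # stand-in for one MS band
    profile = morphological_profile(band)
    print(profile.shape)                  # (7, 64, 64)
```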
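The image inpainting entry mentions dense combinations of dilated convolutions to obtain larger receptive fields. Below is a generic, minimal PyTorch sketch of that idea (stacked 3x3 convolutions with growing dilation, padded so spatial size is preserved); the channel width and dilation rates are illustrative assumptions, not that paper's exact generator.

```python
# Generic dilated-convolution stack: 3x3 kernels with growing dilation enlarge the
# receptive field without pooling; padding equals the dilation so H and W are unchanged.
import torch
import torch.nn as nn


class DilatedBlock(nn.Module):
    def __init__(self, channels: int = 64, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.layers = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


if __name__ == "__main__":
    x = torch.randn(1, 64, 128, 128)   # (batch, channels, H, W)
    print(DilatedBlock()(x).shape)     # torch.Size([1, 64, 128, 128])
```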
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.