SPDFusion: An Infrared and Visible Image Fusion Network Based on a Non-Euclidean Representation of Riemannian Manifolds
- URL: http://arxiv.org/abs/2411.10679v1
- Date: Sat, 16 Nov 2024 03:09:49 GMT
- Title: SPDFusion: An Infrared and Visible Image Fusion Network Based on a Non-Euclidean Representation of Riemannian Manifolds
- Authors: Huan Kang, Hui Li, Tianyang Xu, Rui Wang, Xiao-Jun Wu, Josef Kittler
- Abstract summary: We propose a novel SPD (symmetric positive definite) manifold learning framework for multi-modal image fusion.
Our framework exhibits superior performance compared to the current state-of-the-art methods.
- Score: 35.03742076163911
- Abstract: Euclidean representation learning methods have achieved commendable results in image fusion tasks, which can be attributed to their clear advantages in handling linear spaces. However, data collected from realistic scenes usually have a non-Euclidean structure, where the Euclidean metric may be limited in representing the true data relationships, degrading fusion performance. To address this issue, a novel SPD (symmetric positive definite) manifold learning framework, named SPDFusion, is proposed for multi-modal image fusion, extending the image fusion approach from Euclidean space to SPD manifolds. Specifically, we encode images according to Riemannian geometry to exploit their intrinsic statistical correlations, thereby aligning with human visual perception. SPD matrices underpin our network learning, with a cross-modal fusion strategy employed to harness modality-specific dependencies and augment complementary information. Subsequently, an attention module is designed to process the learned weight matrix, facilitating the weighting of spatial global correlation semantics via SPD matrix multiplication. On this basis, we design an end-to-end fusion network based on cross-modal manifold learning. Extensive experiments on public datasets demonstrate that our framework exhibits superior performance compared to current state-of-the-art methods.
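The encoding step the abstract describes rests on a standard construction: second-order (covariance) statistics of deep features form SPD matrices, and fusion can act on them through matrix multiplication. The NumPy sketch below illustrates only this building block under our own naming; it is not the authors' actual pipeline.

```python
# Illustrative sketch (not the SPDFusion implementation): encode a feature
# map as an SPD matrix via a regularized covariance, then use one modality's
# second-order statistics to re-weight the other's features.
import numpy as np

def features_to_spd(feat: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """feat: (C, H, W) feature map -> (C, C) SPD covariance matrix."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                # C channels, N = H*W samples
    x = x - x.mean(axis=1, keepdims=True)     # center each channel
    cov = (x @ x.T) / (h * w - 1)             # sample covariance, (C, C)
    return cov + eps * np.eye(c)              # regularize -> strictly SPD

rng = np.random.default_rng(0)
ir_feat = rng.normal(size=(16, 32, 32))       # stand-in infrared features
vis_feat = rng.normal(size=(16, 32, 32))      # stand-in visible features
spd_ir = features_to_spd(ir_feat)
spd_vis = features_to_spd(vis_feat)

# Cross-modal weighting by SPD matrix multiplication (illustrative only).
weighted_ir = spd_vis @ ir_feat.reshape(16, -1)
print(spd_ir.shape, weighted_ir.shape)        # (16, 16) (16, 1024)
```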
Related papers
- Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion [25.140475569677758]
Multimodal image fusion aims to integrate information from different modalities to obtain a comprehensive image.
Existing methods tend to prioritize natural image fusion, focusing on information complementarity and network training strategies.
This paper dissects the significant differences between the two tasks regarding fusion goals, statistical properties, and data distribution.
arXiv Detail & Related papers (2024-11-15T08:36:24Z)
- MMA-UNet: A Multi-Modal Asymmetric UNet Architecture for Infrared and Visible Image Fusion [4.788349093716269]
Multi-modal image fusion (MMIF) maps useful information from various modalities into the same representation space.
The existing fusion algorithms tend to symmetrically fuse the multi-modal images, causing the loss of shallow information or bias towards a single modality.
In this study, we analyze the spatial distribution differences of information across modalities and show that encoding the features of both within a single shared network is not conducive to aligning their deep feature spaces simultaneously.
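A rough sketch of that asymmetry argument, with modality roles and encoder depths assumed by us rather than taken from the paper, could give the two modalities encoders of different depths before fusing:

```python
# Hypothetical sketch of asymmetric dual encoders (not the MMA-UNet
# architecture): a shallower branch for one modality, a deeper branch for
# the other, fused by simple channel concatenation.
import torch
import torch.nn as nn

def conv_stage(cin: int, cout: int) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class AsymmetricEncoders(nn.Module):
    def __init__(self):
        super().__init__()
        self.ir_enc = nn.Sequential(conv_stage(1, 16), conv_stage(16, 32))
        self.vis_enc = nn.Sequential(conv_stage(1, 16), conv_stage(16, 32),
                                     conv_stage(32, 32))

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.ir_enc(ir), self.vis_enc(vis)], dim=1)

net = AsymmetricEncoders()
out = net(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```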
arXiv Detail & Related papers (2024-04-27T01:35:21Z)
- AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis [98.3959800235485]
Recently, several methods have explored multiple modalities within a single field, aiming to share implicit features across modalities to enhance reconstruction performance.
In this work, we conduct comprehensive analyses on the multimodal implicit field of LiDAR-camera joint synthesis, revealing the underlying issue lies in the misalignment of different sensors.
We introduce AlignMiF, a geometrically aligned multimodal implicit field with two proposed modules: Geometry-Aware Alignment (GAA) and Shared Geometry Initialization (SGI).
arXiv Detail & Related papers (2024-02-27T13:08:47Z)
- Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs [77.54052164713394]
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning.
We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z)
- Riemannian Self-Attention Mechanism for SPD Networks [34.794770395408335]
An SPD manifold self-attention mechanism (SMSA) is proposed in this paper.
An SMSA-based geometric learning module (SMSA-GL) is designed to improve the discrimination of structured representations.
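The mechanics admit a compact sketch: treat each SPD matrix as a token, score pairs by a Log-Euclidean distance, and aggregate in the log domain so the output stays on the manifold. This is our reading of the general idea, not the exact SMSA formulation:

```python
# Hedged sketch of self-attention over SPD matrices: Log-Euclidean distances
# as attention logits, weighted averaging in the matrix-log domain.
import numpy as np

def spd_log(s: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(s)
    return (v * np.log(w)) @ v.T

def spd_exp(sym: np.ndarray) -> np.ndarray:
    """Matrix exponential of a symmetric matrix."""
    w, v = np.linalg.eigh(sym)
    return (v * np.exp(w)) @ v.T

def spd_self_attention(spds: list, temp: float = 1.0) -> list:
    logs = [spd_log(s) for s in spds]
    out = []
    for li in logs:
        # negative squared Log-Euclidean distance as the attention logit
        logits = np.array([-np.sum((li - lj) ** 2) / temp for lj in logs])
        w = np.exp(logits - logits.max())
        w /= w.sum()                                  # softmax over tokens
        mixed = sum(wi * lj for wi, lj in zip(w, logs))
        out.append(spd_exp(mixed))                    # back onto the manifold
    return out

rng = np.random.default_rng(1)
mats = []
for _ in range(3):
    a = rng.normal(size=(4, 4))
    mats.append(a @ a.T + 1e-3 * np.eye(4))           # random SPD tokens
print(spd_self_attention(mats)[0].shape)              # (4, 4)
```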
arXiv Detail & Related papers (2023-11-28T12:34:46Z)
- LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images [98.36300655482196]
We formulate the fusion task mathematically, and establish a connection between its optimal solution and the network architecture that can implement it.
In particular, we adopt a learnable representation approach to the fusion task, in which the construction of the fusion network architecture is guided by the optimisation algorithm that produces the learnable model.
Based on this novel network architecture, an end-to-end lightweight fusion network is constructed to fuse infrared and visible light images.
arXiv Detail & Related papers (2023-04-11T12:11:23Z)
- Adaptive Log-Euclidean Metrics for SPD Matrix Learning [73.12655932115881]
We propose Adaptive Log-Euclidean Metrics (ALEMs), which extend the widely used Log-Euclidean Metric (LEM).
The experimental and theoretical results demonstrate the merit of the proposed metrics in improving the performance of SPD neural networks.
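For reference, the classical LEM that ALEMs generalize compares SPD matrices through their matrix logarithms. The sketch below includes an entry-wise weighting solely to indicate where adaptivity could enter; it is not the paper's parameterization:

```python
# Classical Log-Euclidean Metric: d(A, B) = ||log(A) - log(B)||_F, plus a
# hypothetical weighted variant to illustrate an "adaptive" degree of freedom.
import numpy as np

def spd_log(s: np.ndarray) -> np.ndarray:
    w, v = np.linalg.eigh(s)
    return (v * np.log(w)) @ v.T

def lem_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(spd_log(a) - spd_log(b), ord="fro"))

def weighted_lem_distance(a, b, weights):
    """weights: (d, d) entry-wise weights; recovers plain LEM when all ones."""
    return float(np.linalg.norm(weights * (spd_log(a) - spd_log(b)), ord="fro"))

A = np.diag([1.0, 2.0, 4.0])
B = np.diag([2.0, 2.0, 2.0])
print(round(lem_distance(A, B), 3))  # 0.98 = sqrt(2) * ln(2)
```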
arXiv Detail & Related papers (2023-03-26T18:31:52Z)
- Voxel Field Fusion for 3D Object Detection [140.6941303279114]
We present a conceptually simple framework for cross-modality 3D object detection, named voxel field fusion.
The proposed approach aims to maintain cross-modality consistency by representing and fusing augmented image features as a ray in the voxel field.
The framework is demonstrated to achieve consistent gains in various benchmarks and outperforms previous fusion-based methods on KITTI and nuScenes datasets.
arXiv Detail & Related papers (2022-05-31T16:31:36Z)
- Collaborative Representation for SPD Matrices with Application to Image-Set Classification [12.447073442122468]
Collaborative representation-based classification (CRC) has demonstrated remarkable progress in the past few years.
The existing CRC methods are incapable of directly processing nonlinear variational information.
Recent advances illustrate that effectively modeling this nonlinear variational information and learning invariant representations remains an open challenge.
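A common concrete form of CRC is ridge-regularized coding followed by class-wise residual comparison; the sketch below handles SPD inputs by log-vectorizing them first, which is our assumption rather than the paper's model:

```python
# Hedged CRC sketch: code a query over the whole gallery with a closed-form
# ridge solution, then assign the class whose atoms reconstruct it best.
import numpy as np

def spd_log_vec(s: np.ndarray) -> np.ndarray:
    w, v = np.linalg.eigh(s)
    return ((v * np.log(w)) @ v.T).ravel()        # flattened matrix log

def crc_classify(gallery, labels, query, lam: float = 1e-2):
    X = np.stack([spd_log_vec(s) for s in gallery], axis=1)   # (d*d, n)
    y = spd_log_vec(query)
    # a* = argmin ||y - X a||^2 + lam ||a||^2  (collaborative coding)
    a = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    residuals = {}
    for c in set(labels):
        mask = np.array([lbl == c for lbl in labels])
        residuals[c] = np.linalg.norm(y - X[:, mask] @ a[mask])
    return min(residuals, key=residuals.get)

def make_spd(rng, d: int = 3) -> np.ndarray:
    a = rng.normal(size=(d, d))
    return a @ a.T + 0.1 * np.eye(d)

rng = np.random.default_rng(2)
gallery = [make_spd(rng) for _ in range(6)]
labels = [0, 0, 0, 1, 1, 1]
print(crc_classify(gallery, labels, gallery[0]))  # expected: 0
```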
arXiv Detail & Related papers (2022-01-22T04:56:53Z)
- Deep Optimal Transport for Domain Adaptation on SPD Manifolds [9.552869120136005]
Neuroimaging data possess the mathematical properties of symmetry and positive definiteness.
Applying conventional domain adaptation methods is challenging because these mathematical properties can be disrupted.
We introduce a novel geometric deep learning-based approach to manage discrepancies in both marginal and conditional distributions.
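The flavor of such an approach can be sketched with entropic optimal transport: take pairwise Log-Euclidean distances between SPD matrices as the ground cost and compute a coupling with Sinkhorn iterations. This is a simplified illustration of ours, not the paper's construction:

```python
# Hedged sketch: Sinkhorn coupling between two sets of SPD matrices under a
# Log-Euclidean ground cost, with uniform marginals on both sides.
import numpy as np

def spd_log(s: np.ndarray) -> np.ndarray:
    w, v = np.linalg.eigh(s)
    return (v * np.log(w)) @ v.T

def sinkhorn(cost: np.ndarray, reg: float = 0.1, n_iter: int = 200):
    n, m = cost.shape
    K = np.exp(-cost / reg)
    a, b = np.ones(n) / n, np.ones(m) / m          # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]             # transport plan

def make_spd(rng, d: int = 3) -> np.ndarray:
    x = rng.normal(size=(d, d))
    return x @ x.T + 0.1 * np.eye(d)

rng = np.random.default_rng(3)
src = [make_spd(rng) for _ in range(4)]
tgt = [make_spd(rng) for _ in range(5)]
cost = np.array([[np.sum((spd_log(s) - spd_log(t)) ** 2) for t in tgt] for s in src])
plan = sinkhorn(cost / cost.max())                 # normalize cost for stability
print(plan.shape, round(plan.sum(), 4))            # (4, 5) 1.0
```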
arXiv Detail & Related papers (2022-01-15T03:13:02Z)
- Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy in which we first learn the geometric and contextual similarities between the input point clouds and those back-projected from 2D pixels.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various levels of data integrity.
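A minimal sketch of the similarity-aware weighting idea, with a cosine similarity and a per-point scalar gate assumed by us rather than taken from the SAFNet design:

```python
# Hedged sketch: fuse per-point 2D (back-projected) and 3D features with a
# gate derived from their cosine similarity.
import numpy as np

def similarity_fuse(f2d: np.ndarray, f3d: np.ndarray) -> np.ndarray:
    """f2d, f3d: (N, C) per-point features -> (N, C) fused features."""
    num = np.sum(f2d * f3d, axis=1)
    den = np.linalg.norm(f2d, axis=1) * np.linalg.norm(f3d, axis=1) + 1e-8
    sim = (num / den + 1.0) / 2.0        # cosine similarity mapped to [0, 1]
    gate = sim[:, None]                  # lean on 2D where modalities agree
    return gate * f2d + (1.0 - gate) * f3d

rng = np.random.default_rng(4)
fused = similarity_fuse(rng.normal(size=(100, 32)), rng.normal(size=(100, 32)))
print(fused.shape)                       # (100, 32)
```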
arXiv Detail & Related papers (2021-07-04T09:28:18Z)