Multimodal Fusion SLAM with Fourier Attention
- URL: http://arxiv.org/abs/2506.18204v2
- Date: Tue, 24 Jun 2025 09:24:14 GMT
- Title: Multimodal Fusion SLAM with Fourier Attention
- Authors: Youjie Zhou, Guofeng Mei, Yiming Wang, Yi Wan, Fabio Poiesi,
- Abstract summary: We propose FMF-SLAM, an efficient multimodal fusion SLAM method that utilizes fast Fourier transform (FFT) to enhance algorithmic efficiency. Specifically, we introduce a novel Fourier-based self-attention and cross-attention mechanism to extract features from RGB and depth signals. Our approach is validated using video sequences from TUM, TartanAir, and our real-world datasets, showcasing state-of-the-art performance under noisy, varying lighting, and dark conditions.
- Score: 15.2253217769593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual SLAM is particularly challenging in environments affected by noise, varying lighting conditions, and darkness. Learning-based optical flow algorithms can leverage multiple modalities to address these challenges, but traditional optical flow-based visual SLAM approaches often require significant computational resources. To overcome this limitation, we propose FMF-SLAM, an efficient multimodal fusion SLAM method that utilizes fast Fourier transform (FFT) to enhance algorithmic efficiency. Specifically, we introduce a novel Fourier-based self-attention and cross-attention mechanism to extract features from RGB and depth signals. We further enhance the interaction of multimodal features by incorporating multi-scale knowledge distillation across modalities. We also demonstrate the practical feasibility of FMF-SLAM in real-world scenarios with real-time performance by integrating it into a security robot, fusing it with a global positioning module (GNSS-RTK) and global Bundle Adjustment. Our approach is validated using video sequences from TUM, TartanAir, and our real-world datasets, showcasing state-of-the-art performance under noisy, varying lighting, and dark conditions. Our code and datasets are available at https://github.com/youjie-zhou/FMF-SLAM.git.
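The abstract does not spell out the Fourier attention itself. As a rough intuition only, FFT-based token mixers (in the style of FNet) replace the quadratic attention matrix with an O(n log n) transform over the token axis; the sketch below is illustrative, not the authors' implementation, and the cross-modal variant is an explicitly speculative guess at how RGB and depth spectra might interact.

```python
import numpy as np

def fourier_token_mix(x: np.ndarray) -> np.ndarray:
    """FNet-style mixing: 2D FFT over (tokens, channels), keeping the real part.

    x: (num_tokens, d_model) feature map from one modality (e.g. RGB or depth).
    Runs in O(n log n) versus the O(n^2) of dot-product self-attention.
    """
    return np.fft.fft2(x).real

def fourier_cross_mix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Speculative cross-modal variant (an assumption, not the paper's design):
    modulate one modality's spectrum by the conjugate of the other's,
    then return to the spatial domain."""
    spec = np.fft.fft(a, axis=0) * np.conj(np.fft.fft(b, axis=0))
    return np.fft.ifft(spec, axis=0).real

rgb = np.random.default_rng(0).standard_normal((64, 32))
depth = np.random.default_rng(1).standard_normal((64, 32))
mixed = fourier_token_mix(rgb)     # same (64, 32) shape as the input
fused = fourier_cross_mix(rgb, depth)
```

The appeal for SLAM is that the transform is global (every token mixes with every other) at a cost that stays tractable on embedded robot hardware.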
Related papers
- FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Exploring State Space Model in Wavelet Domain: An Infrared and Visible Image Fusion Network via Wavelet Transform and State Space Model [8.392891463947661]
We propose Wavelet-Mamba, which integrates wavelet transform with the state-space model (SSM). The Wavelet-SSM module incorporates wavelet-based frequency-domain feature extraction and global information extraction through SSM. Our method achieves both visually compelling results and superior performance compared to current state-of-the-art methods.
arXiv Detail & Related papers (2025-03-24T06:25:44Z) - FMNet: Frequency-Assisted Mamba-Like Linear Attention Network for Camouflaged Object Detection [7.246630480680039]
Camouflaged Object Detection (COD) is challenging due to the strong similarity between camouflaged objects and their surroundings. Existing methods mainly rely on spatial local features, failing to capture global information. A Frequency-Assisted Mamba-Like Linear Attention Network (FMNet) is proposed to efficiently capture global features.
arXiv Detail & Related papers (2025-03-14T02:55:19Z) - Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection [8.607385112274882]
Deep learning has significantly improved salient object detection (SOD) by combining both RGB and thermal (RGB-T) images. Existing deep learning-based RGB-T SOD models suffer from two major limitations. We propose a purely Fourier transform-based model, namely the Deep Fourier-Embedded Network (DFENet), for accurate RGB-T SOD.
arXiv Detail & Related papers (2024-11-27T14:55:16Z) - Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement [26.13172849144202]
We propose a novel Wavelet-based Mamba with Fourier Adjustment model called WalMaFa.
WMB is adopted in the Decoder and FFAB is adopted in the Latent-Decoder structure.
Experiments demonstrate that our proposed WalMaFa achieves state-of-the-art performance with fewer computational resources and faster speed.
arXiv Detail & Related papers (2024-10-27T02:48:28Z) - A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures.
We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI.
Our method achieves visually appealing fusion results compared with state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z) - Adaptive Frequency Filters As Efficient Global Token Mixers [100.27957692579892]
We show that adaptive frequency filters can serve as efficient global token mixers.
We take AFF token mixers as primary neural operators to build a lightweight neural network, dubbed AFFNet.
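Reading "adaptive frequency filters as global token mixers" literally, the mechanism can be sketched as: FFT along the token axis, pointwise multiplication by a learnable complex filter (equivalent to circular convolution over tokens), then inverse FFT. The filter shapes and initialization below are illustrative assumptions, not AFFNet's actual design.

```python
import numpy as np

def aff_token_mix(x: np.ndarray, freq_filter: np.ndarray) -> np.ndarray:
    """Global token mixing via an adaptive frequency filter.

    x: (num_tokens, d_model); freq_filter: complex, same shape (learnable
    in a real network). Pointwise multiplication in the frequency domain
    equals circular convolution over tokens, so every token influences
    every other at O(n log n) cost.
    """
    spec = np.fft.fft(x, axis=0)
    return np.fft.ifft(spec * freq_filter, axis=0).real

x = np.random.default_rng(0).standard_normal((16, 8))
identity = np.ones((16, 8), dtype=complex)  # all-pass filter
out = aff_token_mix(x, identity)  # all-pass filter recovers the input
```

Initializing the filter near all-pass (as here) is one plausible way to start training from an identity-like mixer.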
arXiv Detail & Related papers (2023-07-26T07:42:28Z) - Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion [59.19469551774703]
Infrared and visible image fusion aims to integrate comprehensive information from multiple sources to achieve superior performance on various practical tasks.
We propose a dynamic image fusion framework with a multi-modal gated mixture of local-to-global experts.
Our model consists of a Mixture of Local Experts (MoLE) and a Mixture of Global Experts (MoGE) guided by a multi-modal gate.
arXiv Detail & Related papers (2023-02-02T20:06:58Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - Functional Regularization for Reinforcement Learning via Learned Fourier Features [98.90474131452588]
We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis.
We show that it improves the sample efficiency of both state-based and image-based RL.
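Embedding inputs into a Fourier basis is commonly written as mapping each state through sinusoids of projected coordinates; the projection matrix can be randomly initialized and then learned end-to-end. The dimensions and scale below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def fourier_features(x: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Map raw inputs to a Fourier basis: [cos(2*pi*B*x), sin(2*pi*B*x)].

    x: (batch, in_dim) states or flattened observations;
    B: (in_dim, num_features) projection matrix, learnable in practice.
    Output dimension is 2 * num_features, bounded in [-1, 1].
    """
    proj = 2.0 * np.pi * x @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
states = rng.standard_normal((4, 3))
B = rng.normal(scale=1.0, size=(3, 16))
feats = fourier_features(states, B)  # shape (4, 32), values in [-1, 1]
```

The bounded, smooth embedding is what gives the regularization effect the entry describes: downstream value networks see a well-conditioned input regardless of raw state scale.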
arXiv Detail & Related papers (2021-12-06T18:59:52Z) - Beyond Self Attention: A Subquadratic Fourier Wavelet Transformer with Multi Modal Fusion [0.0]
We revisit the use of spectral techniques to replace the attention mechanism in Transformers. We present a comprehensive and novel reformulation of this technique in next-generation transformer models.
arXiv Detail & Related papers (2021-11-25T18:03:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.