Related papers: MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration

URL: http://arxiv.org/abs/2411.01399v1
Date: Sun, 03 Nov 2024 01:30:59 GMT
Title: MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration
Authors: Kaiang Wen, Bin Xie, Bin Duan, Yan Yan
Abstract summary: Traditional learning-based approaches often consider registration networks as black boxes without interpretability. We propose MambaReg, a novel Mamba-based architecture that integrates Mamba's strong capability in capturing long sequences. Our network adeptly captures the correlation between multi-modal images, enabling focused deformation field prediction.
Score: 13.146228081053714
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Precise alignment of multi-modal images with inherent feature discrepancies poses a pivotal challenge in deformable image registration. Traditional learning-based approaches often treat registration networks as black boxes without interpretability. One core insight is that disentangling alignment features and non-alignment features across modalities brings benefits. Meanwhile, it is challenging for the prominent methods for image registration tasks, such as convolutional neural networks, to capture long-range dependencies with their local receptive fields. These methods often fail when the given image pair has a large misalignment, because they do not effectively learn long-range dependencies and correspondences. In this paper, we propose MambaReg, a novel Mamba-based architecture that integrates Mamba's strong capability in capturing long sequences to address these challenges. With several proposed sub-modules, MambaReg can effectively disentangle modality-independent features responsible for registration from modality-dependent, non-aligning features. By selectively attending to the relevant features, our network adeptly captures the correlation between multi-modal images, enabling focused deformation field prediction and precise image alignment. The Mamba-based architecture seamlessly integrates the local feature extraction power of convolutional layers with the long-range dependency modeling capabilities of Mamba. Experiments on public non-rigid RGB-IR image datasets demonstrate the superiority of our method, which outperforms existing approaches in terms of registration accuracy and deformation field smoothness.
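
Illustrative sketch (not from the paper): the abstract refers to predicting a deformation field and to evaluating its smoothness. The NumPy code below shows, under simplifying assumptions, what applying a dense 2-D displacement field to an image and measuring a finite-difference smoothness penalty can look like. The function names warp_bilinear and smoothness_penalty, the (dy, dx) offset convention, and the border clamping are all assumptions made for illustration; MambaReg's actual warping layer and smoothness metric are not described in this listing.

```python
import numpy as np


def warp_bilinear(image, flow):
    """Warp a 2-D image with a dense displacement field (hypothetical helper).

    image: (H, W) array.
    flow:  (H, W, 2) array of (dy, dx) offsets, so that output[y, x] samples
           image at (y + dy, x + dx) using bilinear interpolation.
    Sample positions are clamped to the image border.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Absolute sampling coordinates, clamped so indexing stays in bounds.
    sy = np.clip(ys + flow[..., 0], 0, h - 1)
    sx = np.clip(xs + flow[..., 1], 0, w - 1)
    # Integer corners surrounding each sampling position.
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Fractional parts act as interpolation weights.
    wy = sy - y0
    wx = sx - x0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def smoothness_penalty(flow):
    """Mean squared finite-difference gradient of the field (one common choice)."""
    dy = np.diff(flow, axis=0)  # differences between vertically adjacent offsets
    dx = np.diff(flow, axis=1)  # differences between horizontally adjacent offsets
    return float((dy ** 2).mean() + (dx ** 2).mean())


if __name__ == "__main__":
    moving = np.arange(12, dtype=float).reshape(3, 4)
    shift = np.zeros((3, 4, 2))
    shift[..., 1] = 1.0  # sample one pixel to the right everywhere
    print(warp_bilinear(moving, shift))
    print(smoothness_penalty(shift))
```

A spatially constant field, as in the usage lines above, has zero penalty under this measure; fields that change abruptly between neighbouring pixels are penalised more heavily.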

Related papers

RegistrationMamba: A Mamba-based Registration Framework Integrating Multi-Expert Feature Learning for Cross-Modal Remote Sensing Images [39.5745769925092]
Cross-modal remote sensing image (CRSI) registration is critical for multi-modal image applications. Existing methods mainly adopt convolutional neural networks (CNNs) or Transformer architectures to extract discriminative features for registration. This paper proposes RegistrationMamba, a novel Mamba architecture based on state space models (SSMs) integrating multi-expert feature learning.
arXiv Detail & Related papers (2025-07-06T13:59:51Z)
BSMamba: Brightness and Semantic Modeling for Long-Range Interaction in Low-Light Image Enhancement [3.3392058493559693]
Current low-light image enhancement (LLIE) methods face significant limitations in simultaneously improving brightness while preserving semantic consistency, fine details, and computational efficiency. We propose BSMamba, a novel visual Mamba architecture comprising two specially designed components: Brightness Mamba and Semantic Mamba. BSMamba achieves state-of-the-art performance in LLIE while preserving semantic consistency.
arXiv Detail & Related papers (2025-06-23T07:04:34Z)
DefMamba: Deformable Visual State Space Model [65.50381013020248]
We propose a novel visual foundation model called DefMamba. By combining a deformable scanning (DS) strategy, this model significantly improves its ability to learn image structures and detect changes in object details. Numerous experiments have shown that DefMamba achieves state-of-the-art performance in various visual tasks.
arXiv Detail & Related papers (2025-04-08T08:22:54Z)
MatIR: A Hybrid Mamba-Transformer Image Restoration Model [95.17418386046054]
We propose a Mamba-Transformer hybrid image restoration model called MatIR. MatIR cross-cycles the blocks of the Transformer layer and the Mamba layer to extract features. In the Mamba module, we introduce the Image Inpainting State Space (IRSS) module, which traverses along four scan paths.
arXiv Detail & Related papers (2025-01-30T14:55:40Z)
MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching [54.740256498985026]
Keypoint detection and description methods often struggle with multimodal data. We propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching.
arXiv Detail & Related papers (2025-01-20T06:56:30Z)
Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging [40.80197280147993]
We propose a Mamba-inspired Joint Unfolding Network (MiJUN) to overcome the inherent nonlinear and ill-posed characteristics of HSI reconstruction. We introduce an accelerated unfolding network scheme, which reduces the reliance on initial optimization stages. We refine the scanning strategy with Mamba by integrating the tensor mode-$k$ unfolding into the Mamba network.
arXiv Detail & Related papers (2025-01-02T13:56:23Z)
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation [4.227991281224256]
This paper proposes to fully utilize the complementary advantages of Mamba and Transformer without sacrificing computational efficiency. The selective scanning mechanism of Mamba is employed to focus on spatial modeling, enabling the capture of long-range spatial dependencies. The self-attention mechanism of Transformer is applied to focus on channel modeling, avoiding the heavy computational burden that grows quadratically with the image's spatial dimensions.
arXiv Detail & Related papers (2024-12-20T12:36:34Z)
Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task. MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities. We show that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities.
arXiv Detail & Related papers (2024-10-29T03:49:40Z)
Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
ColorMamba: Towards High-quality NIR-to-RGB Spectral Translation with Mamba [0.12499537119440242]
Translating NIR to the visible spectrum is challenging due to cross-domain complexities. Current models struggle to balance a broad receptive field with computational efficiency, limiting practical use. We propose a simple but effective backbone, dubbed ColorMamba, which first introduces Mamba into spectral translation tasks.
arXiv Detail & Related papers (2024-08-15T11:29:13Z)
LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba [54.85262314960038]
Local Attentional Mamba blocks capture both global contexts and local details with linear complexity. Our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution. Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62% GFLOPs.
arXiv Detail & Related papers (2024-08-05T16:39:39Z)
MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications. Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features. We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution [7.97504951029884]
We propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution. Inspired by Mamba, our approach aims to learn the self-prior multi-scale contextual features under Mamba-UNet networks.
arXiv Detail & Related papers (2024-07-08T14:41:53Z)
RSMamba: Remote Sensing Image Classification with State Space Model [25.32283897448209]
We introduce RSMamba, a novel architecture for remote sensing image classification. RSMamba is based on the State Space Model (SSM) and incorporates an efficient, hardware-aware design known as the Mamba. We propose a dynamic multi-path activation mechanism to augment Mamba's capacity to model non-temporal image data.
arXiv Detail & Related papers (2024-03-28T17:59:49Z)
ReMamber: Referring Image Segmentation with Mamba Twister [51.291487576255435]
ReMamber is a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block. The Mamba Twister explicitly models image-text interaction, and fuses textual and visual features through its unique channel and spatial twisting mechanism.
arXiv Detail & Related papers (2024-03-26T16:27:37Z)
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition [21.761988930589727]
PlainMamba is a simple non-hierarchical state space model (SSM) designed for general visual recognition. We adapt the selective scanning process of Mamba to the visual domain, enhancing its ability to learn features from two-dimensional images. Our architecture is designed to be easy to use and easy to scale, formed by stacking identical PlainMamba blocks.
arXiv Detail & Related papers (2024-03-26T13:35:10Z)
MambaIR: A Simple Baseline for Image Restoration with State-Space Model [46.827053426281715]
We introduce MambaIR, which introduces both local enhancement and channel attention to improve the vanilla Mamba. Our method outperforms SwinIR by up to 0.45dB on image SR, using similar computational cost but with a global receptive field.
arXiv Detail & Related papers (2024-02-23T23:15:54Z)
MAD: Modality Agnostic Distance Measure for Image Registration [14.558286801723293]
Multi-modal image registration is a crucial pre-processing step in many medical applications. We present Modality Agnostic Distance (MAD), a measure that uses random convolutions to learn the inherent geometry of the images. We demonstrate that not only can MAD affinely register multi-modal images successfully, but it also has a larger capture range than traditional measures.
arXiv Detail & Related papers (2023-09-06T09:59:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.