MSD-KMamba: Bidirectional Spatial-Aware Multi-Modal 3D Brain Segmentation via Multi-scale Self-Distilled Fusion Strategy
- URL: http://arxiv.org/abs/2509.23677v1
- Date: Sun, 28 Sep 2025 06:34:01 GMT
- Title: MSD-KMamba: Bidirectional Spatial-Aware Multi-Modal 3D Brain Segmentation via Multi-scale Self-Distilled Fusion Strategy
- Authors: Dayu Tan, Ziwei Zhang, Yansan Su, Xin Peng, Yike Dai, Chunhou Zheng, Weimin Zhong,
- Abstract summary: We propose a novel 3D multi-modal image segmentation framework, MSD-KMamba.<n>It integrates bidirectional spatial perception with multi-scale self-distillation.<n>Our framework consistently outperforms state-of-the-art methods in segmentation accuracy, robustness, and generalization.
- Score: 15.270952880303533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous CNN-Transformer hybrid models rely on high-complexity global attention mechanisms to capture long-range dependencies, which introduces non-linear computational complexity and leads to significant resource consumption. Although knowledge distillation and sparse attention mechanisms can improve efficiency, they often fall short of delivering the high segmentation accuracy necessary for complex tasks. Balancing model performance with computational efficiency remains a critical challenge. In this work, we propose a novel 3D multi-modal image segmentation framework, termed MSD-KMamba, which integrates bidirectional spatial perception with multi-scale self-distillation. The bidirectional spatial aware branch effectively captures long-range spatial context dependencies across brain regions, while also incorporating a powerful nonlinear feature extraction mechanism that further enhances the model's ability to learn complex and heterogeneous patterns. In addition, the proposed multi-scale self-distilled fusion strategy strengthens hierarchical feature representations and improves the transfer of semantic information at different resolution levels. By jointly leveraging the bidirectional spatial perception branch and the multi-scale self-distilled fusion strategy, our framework effectively mitigates the bottleneck of quadratic computational complexity in volumetric segmentation, while simultaneously addressing the limitation of insufficient global perception. Extensive experiments on multiple standard benchmark datasets demonstrate that MSD-KMamba consistently outperforms state-of-the-art methods in segmentation accuracy, robustness, and generalization, while maintaining high computational efficiency and favorable scalability. The source code of MSD-KMamba is publicly available at https://github.com/daimao-zhang/MSD-KMamba.
Related papers
- Shift-Window Meets Dual Attention: A Multi-Model Architecture for Specular Highlight Removal [14.771301170089174]
We propose a multi-model architecture for specular highlight removal (MM-SHR)<n>We employ convolution operations to extract local details in the shallow layers of MM-SHR, and utilize the attention mechanism to capture global features in the deep layers.<n> MM-SHR outperforms state-of-the-art methods in both accuracy and efficiency for specular highlight removal.
arXiv Detail & Related papers (2025-12-04T06:02:37Z) - MM-DETR: An Efficient Multimodal Detection Transformer with Mamba-Driven Dual-Granularity Fusion and Frequency-Aware Modality Adapters [12.063966356953186]
Multimodal remote sensing object detection aims to achieve more accurate and robust perception under challenging conditions.<n>Existing approaches that rely on attention-based or deformable convolution fusion blocks still struggle to balance performance and lightweight design.<n>We propose MM-DETR, a lightweight and efficient framework for multimodal object detection.
arXiv Detail & Related papers (2025-11-29T07:23:01Z) - Discrete-Guided Diffusion for Scalable and Safe Multi-Robot Motion Planning [56.240199425429445]
Multi-Robot Motion Planning (MPMP) involves generating trajectories for multiple robots operating in a shared continuous workspace.<n>While discrete multi-agent finding (MAPF) methods are broadly adopted due to their scalability, their coarse discretization trajectory quality.<n>This paper tackles limitations of two approaches by introducing discrete MAPF solvers with constrained generative diffusion models.
arXiv Detail & Related papers (2025-08-27T17:59:36Z) - RL-U$^2$Net: A Dual-Branch UNet with Reinforcement Learning-Assisted Multimodal Feature Fusion for Accurate 3D Whole-Heart Segmentation [0.624829068285122]
We propose a dual-branch U-Net architecture enhanced by reinforcement learning for feature alignment.<n>The model employs a dual-branch U-shaped network to process CT and MRI patches in parallel, and introduces a novel RL-XAlign module.<n> Experimental results on the publicly available MM-WHS 2017 dataset demonstrate that the proposed RL-U$2$Net outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-08-04T16:12:06Z) - MVNet: Hyperspectral Remote Sensing Image Classification Based on Hybrid Mamba-Transformer Vision Backbone Architecture [12.168520751389622]
Hyperspectral image (HSI) classification faces challenges such as high-dimensional data, limited training samples, and spectral redundancy.<n>This paper proposes a novel MVNet network architecture that integrates 3D-CNN's local feature extraction, Transformer's global modeling, and Mamba's linear sequence modeling capabilities.<n>On IN, UP, and KSC datasets, MVNet outperforms mainstream hyperspectral image classification methods in both classification accuracy and computational efficiency.
arXiv Detail & Related papers (2025-07-06T14:52:26Z) - An Efficient and Mixed Heterogeneous Model for Image Restoration [71.85124734060665]
Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas.<n>We propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion.
arXiv Detail & Related papers (2025-04-15T08:19:12Z) - MAT: Multi-Range Attention Transformer for Efficient Image Super-Resolution [14.265237560766268]
We introduce Multi-Range Attention Transformer (MAT) for image super-resolution (SR) tasks.<n>MAT facilitates both multi-range attention (MA) and sparse multi-range attention (SMA), enabling efficient capture of both regional and sparse global features.<n>We also introduce the MSConvStar module, which augments the model's ability for multi-range representation learning.
arXiv Detail & Related papers (2024-11-26T08:30:31Z) - Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation.<n>We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations.<n>We build HRVMamba, a novel model for efficient high-resolution representation learning.
arXiv Detail & Related papers (2024-10-04T06:19:29Z) - Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences [60.489682735061415]
We propose CHELA, which replaces state space models with short-long convolutions and implements linear attention in a divide-and-conquer manner.
Our experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-06-12T12:12:38Z) - Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [51.89707241449435]
In this paper, we address the challenge of integrating multi-head self-attention into high-resolution representation CNNs efficiently.<n>We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features.<n>We present a series of models via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method that searches for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers.
arXiv Detail & Related papers (2024-03-15T15:47:54Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet)
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.