Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation
- URL: http://arxiv.org/abs/2512.10353v1
- Date: Thu, 11 Dec 2025 07:09:32 GMT
- Title: Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation
- Authors: Yiheng Lyu, Lian Xu, Mohammed Bennamoun, Farid Boussaid, Coen Arrow, Girish Dwivedi
- Abstract summary: TranSamba is a hybrid Transformer-Mamba architecture designed to capture 3D context for weakly supervised medical segmentation. TranSamba achieves effective volumetric modeling with time complexity that scales linearly with the input volume depth.
- Score: 24.49842564073947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised semantic segmentation offers a label-efficient solution to train segmentation models for volumetric medical imaging. However, existing approaches often rely on 2D encoders that neglect the inherent volumetric nature of the data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context for weakly supervised volumetric medical segmentation. TranSamba augments a standard Vision Transformer backbone with Cross-Plane Mamba blocks, which leverage the linear complexity of state space models for efficient information exchange across neighboring slices. The information exchange enhances the pairwise self-attention within slices computed by the Transformer blocks, directly contributing to the attention maps for object localization. TranSamba achieves effective volumetric modeling with time complexity that scales linearly with the input volume depth and maintains constant memory usage for batch processing. Extensive experiments on three datasets demonstrate that TranSamba establishes new state-of-the-art performance, consistently outperforming existing methods across diverse modalities and pathologies. Our source code and trained models are openly accessible at: https://github.com/YihengLyu/TranSamba.
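The linear-in-depth scaling described in the abstract can be illustrated with a toy recurrence along the slice axis. The sketch below is not the authors' Cross-Plane Mamba block: it assumes a simplified, non-selective diagonal state space recurrence with fixed parameters, and the function name `cross_slice_scan` and the parameters `decay`, `in_gain`, and `out_gain` are hypothetical. It only shows why passing information between neighboring slices this way takes one step per slice while holding a state whose size does not depend on the number of slices.

```python
import numpy as np


def cross_slice_scan(x, decay, in_gain, out_gain):
    """Toy diagonal linear recurrence along the depth (slice) axis.

    x: array of shape (D, N, C) = (slices, tokens per slice, channels).
    decay, in_gain, out_gain: arrays of shape (C,). These are hypothetical
    fixed parameters; a real Mamba block makes them input-dependent.
    """
    depth = x.shape[0]
    # The carried state has shape (N, C), independent of the depth D.
    state = np.zeros(x.shape[1:], dtype=x.dtype)
    out = np.empty_like(x)
    for d in range(depth):  # one update per slice: time is linear in D
        state = decay * state + in_gain * x[d]
        out[d] = out_gain * state
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((4, 16, 8))  # 4 slices, 16 tokens, 8 channels
    mixed = cross_slice_scan(
        tokens,
        decay=np.full(8, 0.9),
        in_gain=np.ones(8),
        out_gain=np.ones(8),
    )
    print(mixed.shape)
```

In this sketch each output slice depends only on the current and earlier slices. A hybrid design in the spirit of the abstract would interleave such a cross-slice step with ordinary within-slice self-attention, so the attention cost stays per slice rather than growing quadratically with the full volume.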
Related papers
- HybridTM: Combining Transformer and Mamba for 3D Semantic Segmentation [7.663855540620183]
We propose HybridTM, the first hybrid architecture that integrates Transformer and Mamba for 3D semantic segmentation. In addition, we propose the Inner Layer Hybrid Strategy, which combines attention and Mamba at a finer granularity. Our HybridTM achieves state-of-the-art performance on ScanNet, ScanNet200, and nuScenes benchmarks.
arXiv Detail & Related papers (2025-07-24T16:48:50Z)
- TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [66.80624029365448]
We propose a cross-architecture knowledge transfer paradigm, TransMamba, that facilitates the reuse of Transformer pre-trained knowledge. We propose a two-stage framework to accelerate the training of Mamba-based models, ensuring their effectiveness across both uni-modal and multi-modal tasks.
arXiv Detail & Related papers (2025-02-21T01:22:01Z)
- SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation [0.13654846342364302]
We present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features.
SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features.
We benchmark SegFormer3D against the current SOTA models on three widely used datasets.
arXiv Detail & Related papers (2024-04-15T22:12:05Z)
- Is Mamba Capable of In-Context Learning? [63.682741783013306]
State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL).
This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Dynamic Linear Transformer for 3D Biomedical Image Segmentation [2.440109381823186]
Transformer-based neural networks have achieved promising performance on many biomedical image segmentation tasks.
The main challenge for 3D transformer-based segmentation methods is the quadratic complexity introduced by the self-attention mechanism.
We propose a novel transformer architecture for 3D medical image segmentation using an encoder-decoder style architecture with linear complexity.
arXiv Detail & Related papers (2022-06-01T21:15:01Z)
- A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark [45.543140413399506]
MedFormer is a data-scalable Transformer designed for generalizable 3D medical image segmentation.
Our approach incorporates three key elements: a desirable inductive bias, hierarchical modeling with linear-complexity attention, and multi-scale feature fusion.
arXiv Detail & Related papers (2022-02-28T22:59:42Z)
- A Volumetric Transformer for Accurate 3D Tumor Segmentation [25.961484035609672]
This paper presents a Transformer architecture for medical image segmentation.
The Transformer has a U-shaped volumetric encoder-decoder design that processes the input voxels in their entirety.
We show that our model transfers better representations across datasets and is robust against data corruptions.
arXiv Detail & Related papers (2021-11-26T02:49:51Z)
- nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which combines the merits of both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
- Generalize Ultrasound Image Segmentation via Instant and Plug & Play Style Transfer [65.71330448991166]
Deep segmentation models generalize to images with unknown appearance.
Retraining models leads to high latency and complex pipelines.
We propose a novel method for robust segmentation under unknown appearance shifts.
arXiv Detail & Related papers (2021-01-11T05:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.