Related papers: TM-UNet: Token-Memory Enhanced Sequential Modeling for Efficient Medical Image Segmentation

TM-UNet: Token-Memory Enhanced Sequential Modeling for Efficient Medical Image Segmentation

URL: http://arxiv.org/abs/2511.12270v1
Date: Sat, 15 Nov 2025 15:49:30 GMT
Title: TM-UNet: Token-Memory Enhanced Sequential Modeling for Efficient Medical Image Segmentation
Authors: Yaxuan Jiao, Qing Xu, Yuxiang Luo, Xiangjian He, Zhen Chen, Wenting Duan,
Abstract summary: TM-UNet is a novel lightweight framework that integrates token sequence modeling with an efficient memory mechanism for efficient medical segmentation.<n>Our MSTM block acts as a dynamic knowledge store that captures long-range dependencies with linear complexity.<n>Extensive experiments demonstrate that TM-UNet outperforms state-of-the-art methods across diverse medical segmentation tasks with substantially reduced computation cost.
Score: 8.178014138307288
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Medical image segmentation is essential for clinical diagnosis and treatment planning. Although transformer-based methods have achieved remarkable results, their high computational cost hinders clinical deployment. To address this issue, we propose TM-UNet, a novel lightweight framework that integrates token sequence modeling with an efficient memory mechanism for efficient medical segmentation. Specifically, we introduce a multi-scale token-memory (MSTM) block that transforms 2D spatial features into token sequences through strategic spatial scanning, leveraging matrix memory cells to selectively retain and propagate discriminative contextual information across tokens. This novel token-memory mechanism acts as a dynamic knowledge store that captures long-range dependencies with linear complexity, enabling efficient global reasoning without redundant computation. Our MSTM block further incorporates exponential gating to identify token effectiveness and multi-scale contextual extraction via parallel pooling operations, enabling hierarchical representation learning without computational overhead. Extensive experiments demonstrate that TM-UNet outperforms state-of-the-art methods across diverse medical segmentation tasks with substantially reduced computation cost. The code is available at https://github.com/xq141839/TM-UNet.

Related papers

MetaFormer-driven Encoding Network for Robust Medical Semantic Segmentation [0.0]
This paper proposes MFEnNet, an efficient medical image segmentation framework that incorporates MetaFormer in the encoding phase of the U-Net backbone.<n>To mitigate the substantial computational cost associated with self-attention, the proposed framework replaces conventional transformer modules with pooling transformer blocks.<n> Comprehensive experiments on different medical segmentation benchmarks demonstrate that the proposed MFEnNet approach attains competitive accuracy while significantly lowering computational cost compared to state-of-the-art models.
arXiv Detail & Related papers (2026-01-01T13:45:50Z)
impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy.<n>It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches.<n>Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z)
EfficientGFormer: Multimodal Brain Tumor Segmentation via Pruned Graph-Augmented Transformer [0.0]
EfficientGFormer is a novel architecture that integrates pretrained foundation models with graph-based reasoning.<n> Experiments on the MSD Task01 and BraTS 2021 datasets demonstrate that EfficientGFormer achieves state-of-the-art accuracy with significantly reduced memory and inference time.
arXiv Detail & Related papers (2025-08-02T18:52:59Z)
Prompt-based Dynamic Token Pruning for Efficient Segmentation of Medical Images [1.4146420810689422]
This research proposes a Prompt-driven Adaptive Token pruning method to selectively reduce the processing of irrelevant tokens in the segmentation pipeline.<n>The experimental results show a reduction of $sim$ 35-55% tokens; thus reducing the computational costs relative to baselines.
arXiv Detail & Related papers (2025-06-19T14:45:46Z)
InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation [15.666926528144202]
We propose an efficient framework for the segmentation task, named InceptionMamba, which encodes multi-stage rich features.<n>We exploit semantic cues to capture both low-frequency and high-frequency regions to enrich the multi-stage features.<n>Our model achieves state-of-the-art performance on two challenging microscopic segmentation datasets.
arXiv Detail & Related papers (2025-06-13T20:25:12Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks. By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead. We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images. We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy. Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features. Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data. The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale. Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.