Related papers: MaskMed: Decoupled Mask and Class Prediction for Medical Image Segmentation

URL: http://arxiv.org/abs/2511.15603v1
Date: Wed, 19 Nov 2025 16:49:02 GMT
Title: MaskMed: Decoupled Mask and Class Prediction for Medical Image Segmentation
Authors: Bin Xie, Gady Agam
Abstract summary: We propose a unified decoupled segmentation head that separates multi-class prediction into class-agnostic mask prediction and class label prediction using shared object queries. Our proposed method, named MaskMed, achieves state-of-the-art performance, surpassing nnUNet by +2.0% Dice on AMOS 2022 and +6.9% Dice on BTCV.
Score: 10.150775949368223
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Medical image segmentation typically adopts a point-wise convolutional segmentation head to predict dense labels, where each output channel is heuristically tied to a specific class. This rigid design limits both feature sharing and semantic generalization. In this work, we propose a unified decoupled segmentation head that separates multi-class prediction into class-agnostic mask prediction and class label prediction using shared object queries. Furthermore, we introduce a Full-Scale Aware Deformable Transformer module that enables low-resolution encoder features to attend across full-resolution encoder features via deformable attention, achieving memory-efficient and spatially aligned full-scale fusion. Our proposed method, named MaskMed, achieves state-of-the-art performance, surpassing nnUNet by +2.0% Dice on AMOS 2022 and +6.9% Dice on BTCV.
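The decoupling described in the abstract can be illustrated with a small sketch. It assumes a MaskFormer-style inference rule (each query's class probabilities weight its class-agnostic mask, and the weighted masks are summed per class); the abstract does not state how MaskMed combines the two predictions, so this rule, the trailing "no object" class, and the function name combine_masks_and_classes are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def combine_masks_and_classes(class_logits, mask_logits):
    """Fuse per-query class and mask predictions into a label map.

    class_logits: (Q, C + 1) array; the last column is an assumed
                  "no object" class that is dropped before fusion.
    mask_logits:  (Q, H, W) array of class-agnostic mask logits.
    Returns an (H, W) integer array of predicted class indices.
    """
    class_probs = softmax(class_logits, axis=-1)[:, :-1]  # (Q, C)
    mask_probs = sigmoid(mask_logits)                      # (Q, H, W)
    # Per-class score map: sum over queries of p(class) * p(mask).
    scores = np.einsum("qc,qhw->chw", class_probs, mask_probs)
    return scores.argmax(axis=0)


# Toy input: query 0 predicts class 0 on the left column,
# query 1 predicts class 1 on the right column.
class_logits = np.array([[5.0, 0.0, 0.0],
                         [0.0, 5.0, 0.0]])
mask_logits = np.array([[[4.0, -4.0], [4.0, -4.0]],
                        [[-4.0, 4.0], [-4.0, 4.0]]])
labels = combine_masks_and_classes(class_logits, mask_logits)
print(labels)
```

Because each query carries its own class distribution, no output channel is tied to a fixed class, which is the contrast with a point-wise convolutional head that the abstract draws.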

Related papers

Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis [1.6916040234975798]
Traditional deep learning methods in medical imaging often focus solely on segmentation or classification. We propose a simple yet effective UNet-based MTL model, where features extracted by the encoder are used to predict classification labels, while the decoder produces the segmentation mask. Experimental results across multiple medical datasets confirm the superior performance of our model in both segmentation and classification tasks.
arXiv Detail & Related papers (2024-11-30T04:20:05Z)
Light-weight Retinal Layer Segmentation with Global Reasoning [14.558920359236572]
We propose LightReSeg for retinal layer segmentation which can be applied to OCT images. Our approach achieves a better segmentation performance compared to the current state-of-the-art method TransUnet.
arXiv Detail & Related papers (2024-04-25T05:42:41Z)
A Mutual Inclusion Mechanism for Precise Boundary Segmentation in Medical Images [2.9137615132901704]
We present a novel deep learning-based approach, MIPC-Net, for precise boundary segmentation in medical images. We introduce the MIPC module, which enhances the focus on channel information when extracting position features. We also propose the GL-MIPC-Residue, a global residual connection that enhances the integration of the encoder and decoder.
arXiv Detail & Related papers (2024-04-12T02:14:35Z)
Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption. We leverage a large multimodal model to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and a multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z)
Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis. We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training [17.9004421784014]
We propose a novel contrastive learning framework named MACL for medical image segmentation pre-training. Specifically, we design an asymmetric contrastive learning structure to pre-train the encoder and decoder simultaneously. Experiments on 8 medical image datasets indicate that our MACL framework outperforms 11 existing contrastive learning strategies.
arXiv Detail & Related papers (2023-09-21T08:22:44Z)
DFormer: Diffusion-guided Transformer for Universal Image Segmentation [86.73405604947459]
The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model. At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly generated masks. Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on the MS COCO val 2017 set.
arXiv Detail & Related papers (2023-06-06T06:33:32Z)
Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation [84.58210297703714]
We propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation. We design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting. Experiments on three medical image segmentation datasets for different tasks demonstrate the outstanding performance of our method.
arXiv Detail & Related papers (2023-01-12T08:19:46Z)
Per-Pixel Classification is Not All You Need for Semantic Segmentation [184.2905747595058]
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks. We propose MaskFormer, a simple mask classification model which predicts a set of binary masks. Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
arXiv Detail & Related papers (2021-07-13T17:59:50Z)
Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation. We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation. It outperforms the state of the art on the challenging ADE20K dataset and performs on par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.