Related papers: Multimodal Prototype Alignment for Semi-supervised Pathology Image Segmentation

URL: http://arxiv.org/abs/2508.19574v1
Date: Wed, 27 Aug 2025 05:15:13 GMT
Title: Multimodal Prototype Alignment for Semi-supervised Pathology Image Segmentation
Authors: Mingxi Fu, Fanglei Fu, Xitong Ling, Huaitian Yuan, Tian Guan, Yonghong He, Lianghui Zhu,
Abstract summary: MPAMatch is a novel segmentation framework that performs pixel-level contrastive learning under a multimodal prototype-guided supervision paradigm. The core innovation of MPAMatch lies in the dual contrastive learning scheme between image prototypes and pixel labels, and between text prototypes and pixel labels. In addition, we reconstruct the classic segmentation architecture (TransUNet) by replacing its ViT backbone with a pathology-pretrained foundation model (Uni).
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pathological image segmentation faces numerous challenges, particularly due to ambiguous semantic boundaries and the high cost of pixel-level annotations. Although recent semi-supervised methods based on consistency regularization (e.g., UniMatch) have made notable progress, they mainly rely on perturbation-based consistency within the image modality, making it difficult to capture high-level semantic priors, especially in structurally complex pathology images. To address these limitations, we propose MPAMatch, a novel segmentation framework that performs pixel-level contrastive learning under a multimodal prototype-guided supervision paradigm. The core innovation of MPAMatch lies in the dual contrastive learning scheme between image prototypes and pixel labels, and between text prototypes and pixel labels, providing supervision at both structural and semantic levels. This coarse-to-fine supervisory strategy not only enhances discriminative capability on unlabeled samples but also introduces text-prototype supervision into segmentation for the first time, significantly improving semantic boundary modeling. In addition, we reconstruct the classic segmentation architecture (TransUNet) by replacing its ViT backbone with a pathology-pretrained foundation model (Uni), enabling more effective extraction of pathology-relevant features. Extensive experiments on GLAS, EBHI-SEG-GLAND, EBHI-SEG-CANCER, and KPI show MPAMatch's superiority over state-of-the-art methods, validating its dual advantages in structural and semantic modeling.
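As a rough illustration of what pixel-to-prototype contrastive supervision can look like, the NumPy sketch below builds image prototypes as per-class mean pixel features and applies an InfoNCE-style loss against both those image prototypes and a set of text prototypes. This is not MPAMatch's published loss: the function names, the temperature of 0.1, the equal weighting of the two terms, and the assumption that text prototypes (for example, class-name embeddings from a text encoder) are already projected into the pixel-feature dimension D are all illustrative choices.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so that dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def image_prototypes(features, labels, num_classes):
    """Per-class mean of pixel features, normalized to unit length.

    features: (N, D) pixel embeddings; labels: (N,) integer class ids.
    Assumes every class id in range(num_classes) occurs at least once.
    """
    protos = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(protos)


def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """InfoNCE-style loss: pull each pixel toward its class prototype, away from the rest."""
    logits = l2_normalize(features) @ prototypes.T / temperature   # (N, C)
    logits = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def dual_prototype_loss(features, labels, text_protos, text_weight=1.0, temperature=0.1):
    """Image-prototype term plus a weighted text-prototype term (hypothetical weighting)."""
    img = image_prototypes(features, labels, text_protos.shape[0])
    txt = l2_normalize(text_protos)
    return (prototype_contrastive_loss(features, labels, img, temperature)
            + text_weight * prototype_contrastive_loss(features, labels, txt, temperature))


# Usage with random stand-in data: 3 classes, 16-dimensional features.
rng = np.random.default_rng(0)
text_protos = rng.standard_normal((3, 16))   # stand-in for projected class-name embeddings
pixel_feats = rng.standard_normal((100, 16))
pixel_labels = np.arange(100) % 3            # every class present
print(dual_prototype_loss(pixel_feats, pixel_labels, text_protos))
```

In a semi-supervised setup, `labels` for unlabeled images would presumably come from pseudo-labels; how MPAMatch produces them is not described in this abstract.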

Related papers

A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv (2025-12-15T10:22:43Z)
DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image Segmentation
We propose a prototype-driven framework that leverages vision-language alignment to improve region discovery under weak supervision. Our method integrates CoOp-style learnable prompt tuning to generate text-based prototypes and combines them with learnable image prototypes, forming a dual-modal prototype bank.
arXiv (2025-12-11T06:03:28Z)
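The DualProtoSeg summary describes a dual-modal prototype bank built from text-based and learnable image prototypes. One plausible way to use such a bank at inference time is sketched below in NumPy: every prototype carries a class id, each pixel is scored per class by its best-matching prototype, and the highest-scoring class wins. The max-over-prototypes rule, the array shapes, and the function names are assumptions for illustration, not the paper's implementation; the CoOp-style prompt tuning that produces the text prototypes is not shown.

```python
import numpy as np


def normalize(x, eps=1e-8):
    """Unit-normalize the last axis so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def build_prototype_bank(text_protos, image_protos):
    """Stack text and image prototypes into one bank, with a class id per row.

    text_protos:  (C, D), one text-derived prototype per class (assumed already in D dims).
    image_protos: (C, K, D), K learnable image prototypes per class.
    """
    num_classes, k, dim = image_protos.shape
    bank = np.concatenate([text_protos, image_protos.reshape(num_classes * k, dim)], axis=0)
    class_ids = np.concatenate([np.arange(num_classes), np.repeat(np.arange(num_classes), k)])
    return normalize(bank), class_ids


def classify_pixels(pixel_feats, bank, class_ids, num_classes):
    """Score each class by its best-matching prototype, then take the argmax per pixel."""
    sims = normalize(pixel_feats) @ bank.T                      # (N, P) cosine similarities
    scores = np.stack([sims[:, class_ids == c].max(axis=1) for c in range(num_classes)], axis=1)
    return scores.argmax(axis=1), scores


# Toy usage: 2 classes, 4-dimensional features, 2 image prototypes per class.
text = np.eye(2, 4)
image = np.array([[[1.0, 0.1, 0, 0], [0.9, 0, 0.2, 0]],
                  [[0, 1.0, 0, 0.1], [0.1, 0.9, 0, 0]]])
bank, ids = build_prototype_bank(text, image)
labels, _ = classify_pixels(np.array([[1.0, 0, 0, 0], [0.2, 0.8, 0, 0]]), bank, ids, 2)
print(labels)  # → [0 1]
```

Taking the maximum over a class's prototypes lets one class be represented by several distinct appearances; averaging the similarities instead would be an equally reasonable design choice.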
LPD: Learnable Prototypes with Diversity Regularization for Weakly Supervised Histopathology Segmentation
Weakly supervised semantic segmentation (WSSS) in histopathology is hindered by inter-class homogeneity, intra-class heterogeneity, and CAM-induced region shrinkage. We propose a cluster-free, one-stage learnable-prototype framework with diversity regularization to enhance morphological intra-class heterogeneity coverage. Our approach achieves state-of-the-art (SOTA) performance on BCSS-WSSS, outperforming prior methods in mIoU and mDice.
arXiv (2025-12-05T17:59:16Z)
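The LPD summary mentions diversity regularization over learnable prototypes. A common way to express such a term, assumed here rather than taken from the paper, is to penalize the average pairwise cosine similarity among the prototypes of one class, so that minimizing it spreads them over different appearance modes. The function names are hypothetical.

```python
import numpy as np


def diversity_penalty(prototypes, eps=1e-8):
    """Mean off-diagonal cosine similarity among one class's K prototypes.

    prototypes: (K, D) with K >= 2. Values near 1 mean the prototypes have collapsed
    onto one direction; lower values mean they point in more different directions.
    """
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    sim = p @ p.T                                   # (K, K) pairwise cosine similarities
    off_diagonal = ~np.eye(len(p), dtype=bool)      # ignore each prototype's self-similarity
    return float(sim[off_diagonal].mean())


def total_diversity_penalty(class_prototypes):
    """Average the per-class penalty over a (C, K, D) prototype tensor."""
    return float(np.mean([diversity_penalty(p) for p in class_prototypes]))


print(diversity_penalty(np.ones((3, 4))))   # collapsed prototypes, close to 1.0
print(diversity_penalty(np.eye(3, 4)))      # mutually orthogonal prototypes, 0.0
```

In training, a term like this would be added to the segmentation loss with a small weight; that weight is a tuning choice not given in the summary.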
PathSegDiff: Pathology Segmentation using Diffusion model representations
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained feature extractors. Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images. Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv (2025-04-09T14:58:21Z)
Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation
Weakly supervised image segmentation with image-level labels has drawn attention due to the high cost of pixel-level annotations. Traditional methods using Class Activation Maps (CAMs) often highlight only the most discriminative regions.
arXiv (2025-03-15T09:55:31Z)
A Multimodal Approach Combining Structural and Cross-domain Textual Guidance for Weakly Supervised OCT Segmentation
We propose a novel Weakly Supervised Semantic Segmentation (WSSS) approach that integrates structural guidance with text-driven strategies to generate high-quality pseudo labels. Our method achieves state-of-the-art performance, highlighting its potential to improve diagnostic accuracy and efficiency in medical imaging.
arXiv (2024-11-19T16:20:27Z)
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation. Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process. Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv (2024-09-08T15:02:25Z)
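PMT builds on the mean-teacher idea, in which a teacher model's weights track an exponential moving average (EMA) of the student's weights. The sketch below shows only that generic EMA update on parameters stored as a dict of NumPy arrays; the progressive, temporal-consistency parts specific to PMT are not modeled, and the decay `alpha=0.99` is an illustrative default.

```python
import numpy as np


def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher step: teacher <- alpha * teacher + (1 - alpha) * student.

    teacher, student: dicts mapping parameter names to NumPy arrays of matching shapes.
    Returns a new dict and leaves both inputs unchanged.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name] for name in teacher}


# Toy usage: the teacher drifts toward a fixed student over repeated updates.
teacher = {"w": np.zeros(2)}
student = {"w": np.ones(2)}
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.9)
print(teacher["w"])  # approximately 0.271 per entry (1 - 0.9**3)
```

Because the teacher averages many past student states, its predictions are smoother than the student's, which is why mean-teacher methods use it to produce pseudo labels.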
Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation
Medical image segmentation plays a crucial role in computer-aided diagnosis. We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv (2023-12-26T12:56:31Z)
Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation. We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks. We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv (2023-02-03T13:50:25Z)
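ARCO's summary credits variance-reduction techniques for pixel/voxel-level segmentation. Stratified sampling is one classic such technique; the sketch below draws a fixed number of pixel indices from each class so that rare classes are always represented when selecting anchors for a pixel-level contrastive loss. Whether ARCO samples exactly this way is an assumption, and `per_class` and the function name are hypothetical.

```python
import numpy as np


def stratified_pixel_sample(labels, per_class, rng):
    """Return up to `per_class` pixel indices drawn without replacement from each class.

    labels: flat (N,) array of class ids. Every class present in `labels` is a stratum,
    so rare classes are always represented, unlike uniform sampling over all pixels.
    """
    picks = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        picks.append(rng.choice(idx, size=min(per_class, idx.size), replace=False))
    return np.concatenate(picks)


# Toy usage: 95 background pixels and 5 foreground pixels.
rng = np.random.default_rng(0)
labels = np.array([0] * 95 + [1] * 5)
sample = stratified_pixel_sample(labels, per_class=4, rng=rng)
print(np.bincount(labels[sample]))  # → [4 4]
```

Uniform sampling of 8 pixels from the same toy image would contain no foreground pixel more often than not, so per-batch loss estimates for the rare class would fluctuate heavily.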
RCPS: Rectified Contrastive Pseudo Supervision for Semi-Supervised Medical Image Segmentation
We propose a novel semi-supervised segmentation method named Rectified Contrastive Pseudo Supervision (RCPS). RCPS combines rectified pseudo supervision and voxel-level contrastive learning to improve the effectiveness of semi-supervised segmentation. Experimental results show that the proposed method yields better segmentation performance than state-of-the-art methods in semi-supervised medical image segmentation.
arXiv (2023-01-13T12:03:58Z)
A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation. In this paper, we target zero-shot semantic segmentation by building it on an off-the-shelf pre-trained vision-language model, i.e., CLIP. Our experimental results show that this simple framework surpasses previous state-of-the-art methods by a large margin.
arXiv (2021-12-29T18:56:18Z)
Exploring Feature Representation Learning for Semi-supervised Medical Image Segmentation
We present a two-stage framework for semi-supervised medical image segmentation. The key insight is to explore feature representation learning with labeled and unlabeled (i.e., pseudo-labeled) images. A stage-adaptive contrastive learning method is proposed, containing a boundary-aware contrastive loss. We also present an aleatoric uncertainty-aware method, AUA, to generate higher-quality pseudo labels.
arXiv (2021-11-22T05:06:12Z)
Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation
We present a novel unsupervised domain adaptation framework named Synergistic Image and Feature Alignment (SIFA). SIFA conducts synergistic alignment of domains from both image and feature perspectives. Experimental results on two different tasks demonstrate that SIFA is effective in improving segmentation performance on unlabeled target images.
arXiv (2020-02-06T13:49:47Z)
