Related papers: ConStruct: Structural Distillation of Foundation Models for Prototype-Based Weakly Supervised Histopathology Segmentation

URL: http://arxiv.org/abs/2512.10316v1
Date: Thu, 11 Dec 2025 06:08:29 GMT
Title: ConStruct: Structural Distillation of Foundation Models for Prototype-Based Weakly Supervised Histopathology Segmentation
Authors: Khang Le, Ha Thach, Anh M. Vu, Trang T. K. Vo, Han H. Huynh, David Yang, Minh H. N. Le, Thanh-Huy Nguyen, Akash Awasthi, Chandra Mohan, Zhu Han, Hien Van Nguyen
Abstract summary: Weakly supervised semantic segmentation (WSSS) in histopathology relies heavily on classification backbones. We propose a prototype learning framework that integrates morphology-aware representations from CONCH, multi-scale structural cues from SegFormer, and text-guided semantic alignment. Our approach produces high-quality pseudo masks without pixel-level annotations, improves localization completeness, and enhances semantic consistency across tissue types.
Score: 16.733170895296343
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Weakly supervised semantic segmentation (WSSS) in histopathology relies heavily on classification backbones, yet these models often localize only the most discriminative regions and struggle to capture the full spatial extent of tissue structures. Vision-language models such as CONCH offer rich semantic alignment and morphology-aware representations, while modern segmentation backbones like SegFormer preserve fine-grained spatial cues. However, combining these complementary strengths remains challenging, especially under weak supervision and without dense annotations. We propose a prototype learning framework for WSSS in histopathological images that integrates morphology-aware representations from CONCH, multi-scale structural cues from SegFormer, and text-guided semantic alignment to produce prototypes that are simultaneously semantically discriminative and spatially coherent. To effectively leverage these heterogeneous sources, we introduce text-guided prototype initialization that incorporates pathology descriptions to generate more complete and semantically accurate pseudo-masks. A structural distillation mechanism transfers spatial knowledge from SegFormer to preserve fine-grained morphological patterns and local tissue boundaries during prototype learning. Our approach produces high-quality pseudo masks without pixel-level annotations, improves localization completeness, and enhances semantic consistency across tissue types. Experiments on BCSS-WSSS datasets demonstrate that our prototype learning framework outperforms existing WSSS methods while remaining computationally efficient through frozen foundation model backbones and lightweight trainable adapters.
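The abstract describes prototypes that turn image-level labels into pseudo-masks. The following is a minimal NumPy sketch of that general idea only, not the authors' implementation: it assumes patch features and class prototypes already live in a shared embedding space, scores each patch by cosine similarity to its best-matching prototype per class, and suppresses classes absent from the image-level label. The function name `prototype_pseudo_mask`, the array shapes, and the max-over-prototypes rule are all illustrative assumptions; the CONCH and SegFormer feature extractors, text-guided initialization, and structural distillation are not modeled here.

```python
import numpy as np


def prototype_pseudo_mask(features, prototypes, image_labels):
    """Hypothetical helper: assign each patch to its most similar class prototype.

    features:     (H, W, D) patch embeddings (assumed to come from some frozen backbone)
    prototypes:   (C, K, D) K prototypes for each of C classes
    image_labels: (C,) binary image-level labels (the weak supervision)
    Returns an (H, W) integer pseudo-mask of class indices.
    """
    h, w, d = features.shape
    c, k, _ = prototypes.shape

    # L2-normalize so that the dot product equals cosine similarity.
    f = features.reshape(-1, d)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    p = prototypes.reshape(-1, d)
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)

    # Similarity of every patch to every prototype: (H*W, C*K).
    sim = f @ p.T

    # Per class, keep the best-matching prototype: (H*W, C).
    class_scores = sim.reshape(h * w, c, k).max(axis=2)

    # Classes not present in the image-level label cannot be predicted.
    present = np.asarray(image_labels).astype(bool)
    class_scores = np.where(present[None, :], class_scores, -np.inf)

    return class_scores.argmax(axis=1).reshape(h, w)


# Toy usage: 2 classes, 1 prototype each, 2-D embeddings.
toy_features = np.array([[[1.0, 0.0], [0.0, 1.0]],
                         [[0.9, 0.1], [0.2, 0.8]]])
toy_prototypes = np.array([[[1.0, 0.0]],
                           [[0.0, 1.0]]])
toy_mask = prototype_pseudo_mask(toy_features, toy_prototypes, np.array([1, 1]))
```

Taking the maximum over several prototypes per class is one simple way to let a class cover heterogeneous tissue appearances; other aggregation rules are equally plausible.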

Related papers

DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image Segmentation [19.307501518696622]
We propose a prototype-driven framework that leverages vision-language alignment to improve region discovery under weak supervision. Our method integrates CoOp-style learnable prompt tuning to generate text-based prototypes and combines them with learnable image prototypes, forming a dual-modal prototype bank.
arXiv Detail & Related papers (2025-12-11T06:03:28Z)
LPD: Learnable Prototypes with Diversity Regularization for Weakly Supervised Histopathology Segmentation [17.25487101903999]
Weakly supervised semantic segmentation (WSSS) in histopathology is hindered by inter-class homogeneity, intra-class heterogeneity, and CAM-induced region shrinkage. We propose a cluster-free, one-stage learnable-prototype framework with diversity regularization to enhance morphological intra-class heterogeneity coverage. Our approach achieves state-of-the-art (SOTA) performance on BCSS-WSSS, outperforming prior methods in mIoU and mDice.
arXiv Detail & Related papers (2025-12-05T17:59:16Z)
When Genes Speak: A Semantic-Guided Framework for Spatially Resolved Transcriptomics Data Clustering [26.67465778995387]
SemST is a semantic-guided deep learning framework for spatial transcriptomics data clustering. The FSM module learns spot-specific affine transformations that empower the semantic embeddings to perform an element-wise calibration of the spatial features. Experiments on public spatial transcriptomics datasets show that SemST achieves state-of-the-art clustering performance.
arXiv Detail & Related papers (2025-11-14T15:03:41Z)
Multimodal Prototype Alignment for Semi-supervised Pathology Image Segmentation [9.790130257265217]
MPAMatch is a novel segmentation framework that performs pixel-level contrastive learning under a multimodal prototype-guided supervision paradigm. The core innovation of MPAMatch lies in the dual contrastive learning scheme between image prototypes and pixel labels, and between text prototypes and pixel labels. In addition, we reconstruct the classic segmentation architecture (TransUNet) by replacing its ViT backbone with a pathology-pretrained foundation model (Uni).
arXiv Detail & Related papers (2025-08-27T05:15:13Z)
PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained feature extractors. Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images. Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv Detail & Related papers (2025-04-09T14:58:21Z)
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [57.044719143401664]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention. Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z)
Medical Image Registration Meets Vision Foundation Model: Prototype Learning and Contour Awareness [11.671950446844356]
Existing deformable registration methods rely solely on intensity-based similarity metrics, lacking explicit anatomical knowledge. We propose a novel SAM-assisted registration framework incorporating prototype learning and contour awareness. Our framework significantly outperforms existing methods across multiple datasets.
arXiv Detail & Related papers (2025-02-17T04:54:47Z)
Understanding Token-level Topological Structures in Transformer-based Time Series Forecasting [52.364260925700485]
Transformer-based methods have achieved state-of-the-art performance in time series forecasting (TSF). It remains unclear whether existing Transformers fully leverage the intrinsic topological structure among tokens throughout intermediate layers. We propose the Topology Enhancement Method (TEM), a novel Transformer-based TSF method that explicitly and adaptively preserves token-level topology.
arXiv Detail & Related papers (2024-04-16T07:21:39Z)
A Learning-based Framework for Topology-Preserving Segmentation using Quasiconformal Mappings [3.4798343542796593]
We propose a deformation-based model that can extract objects in an image while maintaining their topological properties. This network generates segmentation masks that have the same topology as the template mask, even when trained with limited data.
arXiv Detail & Related papers (2022-10-07T03:13:35Z)
GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes. A promising solution is to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes. We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
Structure-Aware Feature Generation for Zero-Shot Learning [108.76968151682621]
We introduce a novel structure-aware feature generation scheme, termed SA-GAN, to account for the topological structure in learning both the latent space and the generative networks. Our method significantly enhances the generalization capability on unseen classes and consequently improves the classification performance.
arXiv Detail & Related papers (2021-08-16T11:52:08Z)
Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images. In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner. We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.