Enhancing Zero-Shot Brain Tumor Subtype Classification via Fine-Grained Patch-Text Alignment
- URL: http://arxiv.org/abs/2508.01602v2
- Date: Wed, 06 Aug 2025 12:04:27 GMT
- Title: Enhancing Zero-Shot Brain Tumor Subtype Classification via Fine-Grained Patch-Text Alignment
- Authors: Lubin Gan, Jing Zhang, Linhao Qu, Yijun Wang, Siying Wu, Xiaoyan Sun,
- Abstract summary: Fine-Grained Patch Alignment Network (FG-PAN) is a novel zero-shot framework tailored for digital pathology.<n>By aligning refined visual features with LLM-generated fine-grained descriptions, FG-PAN effectively increases class separability in both visual and semantic spaces.
- Score: 12.345993355965232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fine-grained classification of brain tumor subtypes from histopathological whole slide images is highly challenging due to subtle morphological variations and the scarcity of annotated data. Although vision-language models have enabled promising zero-shot classification, their ability to capture fine-grained pathological features remains limited, resulting in suboptimal subtype discrimination. To address these challenges, we propose the Fine-Grained Patch Alignment Network (FG-PAN), a novel zero-shot framework tailored for digital pathology. FG-PAN consists of two key modules: (1) a local feature refinement module that enhances patch-level visual features by modeling spatial relationships among representative patches, and (2) a fine-grained text description generation module that leverages large language models to produce pathology-aware, class-specific semantic prototypes. By aligning refined visual features with LLM-generated fine-grained descriptions, FG-PAN effectively increases class separability in both visual and semantic spaces. Extensive experiments on multiple public pathology datasets, including EBRAINS and TCGA, demonstrate that FG-PAN achieves state-of-the-art performance and robust generalization in zero-shot brain tumor subtype classification.
Related papers
- GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification [4.922864692096282]
Multiple Instance Learning (MIL) is the leading approach for whole slide image (WSI) classification.<n>Recent work has introduced vision-language models (VLMs) into MIL pipelines to incorporate medical knowledge.<n>We propose a vision-language MIL framework with two key contributions.
arXiv Detail & Related papers (2025-08-02T09:59:39Z) - Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models [11.774375458215193]
Generalizable Tumor aims to train a single model for zero-shot tumor segmentation across diverse anatomical regions.<n>DiffuGTS creates anomaly-aware open-vocabulary attention maps based on text prompts.<n>Experiments on four datasets and seven tumor categories demonstrate the superior performance of our method.
arXiv Detail & Related papers (2025-05-05T16:05:37Z) - Histomorphology-driven multi-instance learning for breast cancer WSI classification [37.70113409383555]
Current whole slide image (WSI) classification methods struggle to effectively incorporate histomorphology information.<n>We propose a novel framework that explicitly incorporates histomorphology into WSI classification.
arXiv Detail & Related papers (2025-03-23T08:25:29Z) - Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images [7.048241543461529]
We propose a novel framework called Multi-Resolution Prompt-guided Hybrid Embedding (MR-PHE) to address these challenges in zero-shot histopathology image classification.<n>We introduce a hybrid embedding strategy that integrates global image embeddings with weighted patch embeddings.<n>A similarity-based patch weighting mechanism assigns attention-like weights to patches based on their relevance to class embeddings.
arXiv Detail & Related papers (2025-03-13T12:18:37Z) - MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [52.106879463828044]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease.<n>We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention.<n>Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z) - Discriminative Image Generation with Diffusion Models for Zero-Shot Learning [53.44301001173801]
We present DIG-ZSL, a novel Discriminative Image Generation framework for Zero-Shot Learning.<n>We learn a discriminative class token (DCT) for each unseen class under the guidance of a pre-trained category discrimination model (CDM)<n>In this paper, the extensive experiments and visualizations on four datasets show that our DIG-ZSL: (1) generates diverse and high-quality images, (2) outperforms previous state-of-the-art nonhuman-annotated semantic prototype-based methods by a large margin, and (3) achieves comparable or better performance than baselines that leverage human-annot
arXiv Detail & Related papers (2024-12-23T02:18:54Z) - HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis [19.04633470168871]
Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy.
In this paper, we propose a novel Hierarchical Adaptive Taxonomy (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights.
Our approach entails (1) the innovative HATs technique which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function that spans across regions, functional units, and cells, (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities, and (3) the
arXiv Detail & Related papers (2024-06-30T05:35:26Z) - A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images.
We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z) - GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot
Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take the advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z) - Dual Progressive Prototype Network for Generalized Zero-Shot Learning [77.0715029826957]
Generalized Zero-Shot Learning (GZSL) aims to recognize new categories with auxiliary semantic information,e.g., category attributes.
Our approach, named Dual Progressive Prototype Network (DPPN), constructs two types of prototypes that record visual patterns for attributes and categories.
Experiments on four benchmarks prove that DPPN effectively alleviates the domain shift problem in GZSL.
arXiv Detail & Related papers (2021-11-03T08:43:29Z) - G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for
Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.