A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis
- URL: http://arxiv.org/abs/2412.13126v1
- Date: Tue, 17 Dec 2024 17:45:21 GMT
- Authors: Xiao Zhou, Luoyi Sun, Dexuan He, Wenbin Guan, Ruifen Wang, Lifeng Wang, Xin Sun, Kun Sun, Ya Zhang, Yanfeng Wang, Weidi Xie
- Abstract summary: We propose a knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups.
KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks.
- Score: 58.85247337449624
- Abstract: Deep learning has enabled the development of highly robust foundation models for various pathological tasks across diverse diseases and patient cohorts. Among these models, vision-language pre-training, which leverages large-scale paired data to align pathology image and text embedding spaces, provides a novel zero-shot paradigm for downstream tasks. However, existing models have been primarily data-driven and lack the incorporation of domain-specific knowledge, which limits their performance in cancer diagnosis, especially for rare tumor subtypes. To address this limitation, we establish a Knowledge-enhanced Pathology (KEEP) foundation model that harnesses disease knowledge to facilitate vision-language pre-training. Specifically, we first construct a disease knowledge graph (KG) that covers 11,454 human diseases with 139,143 disease attributes, including synonyms, definitions, and hypernym relations. We then systematically reorganize the millions of publicly available noisy pathology image-text pairs into 143K well-structured semantic groups linked through the hierarchical relations of the disease KG. To derive more nuanced image and text representations, we propose a novel knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups instead of unstructured image-text pairs. Validated on 18 diverse benchmarks with more than 14,000 whole slide images (WSIs), KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks. Notably, for cancer detection, KEEP demonstrates an average sensitivity of 89.8% at a specificity of 95.0% across 7 cancer types. For cancer subtyping, KEEP achieves a median balanced accuracy of 0.456 in subtyping 30 rare brain cancers, indicating strong generalizability for diagnosing rare tumors.
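The abstract does not spell out the alignment objective; one rough, hedged reading of group-wise alignment is a contrastive loss in which every image-text pair linked to the same disease-KG node counts as a positive for every other pair in its group, rather than only the diagonal pair as in vanilla CLIP. All names in the sketch below are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def group_contrastive_loss(img_emb, txt_emb, group_ids, temperature=0.07):
    """Contrastive alignment where pairs in the same semantic group
    (e.g., linked to the same disease-KG node) are mutual positives."""
    img = F.normalize(img_emb, dim=-1)            # (N, D)
    txt = F.normalize(txt_emb, dim=-1)            # (N, D)
    logits = img @ txt.t() / temperature          # (N, N) similarities
    pos = (group_ids.unsqueeze(0) == group_ids.unsqueeze(1)).float()  # (N, N)
    # image-to-text: average log-likelihood over all positives per row
    log_p_i2t = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss_i2t = -(log_p_i2t * pos).sum(1) / pos.sum(1)
    # symmetric text-to-image term (pos is symmetric)
    log_p_t2i = logits.t() - torch.logsumexp(logits.t(), dim=1, keepdim=True)
    loss_t2i = -(log_p_t2i * pos).sum(1) / pos.sum(1)
    return 0.5 * (loss_i2t.mean() + loss_t2i.mean())
```

As a sanity check, when every `group_ids` entry is distinct this reduces to the standard CLIP loss.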
Related papers
- MRANet: A Modified Residual Attention Networks for Lung and Colon Cancer Classification [0.0]
Lung and colon cancers are predominant contributors to cancer mortality.
With imaging technology applied to cancer detection, learning models have shown promise in automating cancer classification.
We propose a novel approach based on a modified residual attention network architecture (a generic sketch of such a block follows this entry).
The model achieves accuracies of 99.30%, 96.63%, and 97.56% for the two-, three-, and five-class settings, respectively.
arXiv Detail & Related papers (2024-12-23T16:31:45Z)
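The summary does not describe the modification itself; purely as a generic reference point, a residual attention block in the spirit of Wang et al. (2017) computes output = (1 + soft_mask) * trunk. Layer choices and shapes below are assumptions.

```python
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    """Generic residual attention: output = (1 + soft_mask) * trunk.
    Assumes even spatial dims so the pool/upsample round-trip matches."""
    def __init__(self, channels):
        super().__init__()
        self.trunk = nn.Sequential(               # feature branch
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )
        self.mask = nn.Sequential(                # soft attention branch
            nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),                         # attention weights in (0, 1)
        )

    def forward(self, x):
        # the "1 +" keeps the trunk signal even where the mask is near zero
        return (1 + self.mask(x)) * self.trunk(x)
```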
- Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports [68.39938936308023]
We propose a novel text-guided learning method to achieve highly accurate cancer detection results.
Our approach leverages clinical knowledge from a large-scale pre-trained vision-language model (VLM) to enhance generalization ability (an illustrative zero-shot scoring sketch follows this entry).
arXiv Detail & Related papers (2024-05-23T07:03:38Z)
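As a hedged illustration of how a pre-trained VLM can supply text-guided supervision, the snippet below scores a pathology image embedding against class prompts CLIP-style. The prompt strings and the encoder/tokenizer interface are assumptions, not the paper's actual setup.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_detect(image_emb, text_encoder, tokenizer, temperature=0.07):
    """Score an image embedding against text prompts for each class.
    `text_encoder`/`tokenizer` stand in for the matching text tower of
    whatever pre-trained VLM is used (interface assumed)."""
    prompts = [
        "a pathology image of benign tissue",     # illustrative prompts only
        "a pathology image of malignant tumor",
    ]
    txt = F.normalize(text_encoder(tokenizer(prompts)), dim=-1)  # (2, D)
    img = F.normalize(image_emb, dim=-1)                         # (D,)
    probs = F.softmax(img @ txt.t() / temperature, dim=-1)       # (2,)
    return {"benign": probs[0].item(), "malignant": probs[1].item()}
```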
- Knowledge-enhanced Visual-Language Pretraining for Computational Pathology [68.6831438330526]
We consider the problem of visual representation learning for computational pathology by exploiting large-scale image-text pairs gathered from public resources.
We curate a pathology knowledge tree that consists of 50,470 informative attributes for 4,718 diseases requiring pathology diagnosis from 32 human tissues.
arXiv Detail & Related papers (2024-04-15T17:11:25Z)
- Virchow: A Million-Slide Digital Pathology Foundation Model [34.38679208931425]
We present Virchow, a foundation model for computational pathology.
Virchow is a vision transformer model with 632 million parameters trained on 1.5 million hematoxylin and eosin stained whole slide images.
arXiv Detail & Related papers (2023-09-14T15:09:35Z)
- CAMIL: Context-Aware Multiple Instance Learning for Cancer Detection and Subtyping in Whole Slide Images [3.1118773046912382]
We propose the Context-Aware Multiple Instance Learning (CAMIL) architecture for cancer diagnosis.
CAMIL incorporates neighbor-constrained attention to consider dependencies among tiles within a whole slide image (WSI) and integrates contextual constraints as prior knowledge (a minimal attention-masking sketch follows this entry).
We evaluate CAMIL on subtyping non-small cell lung cancer (TCGA-NSCLC) and detecting lymph node metastasis (CAMELYON16 and CAMELYON17), achieving test AUCs of 97.5%, 95.9%, and 88.1%, respectively.
arXiv Detail & Related papers (2023-05-09T10:06:37Z)
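The summary names neighbor-constrained attention without giving its form; a minimal reading, sketched below under that assumption, is self-attention over tile embeddings masked so each tile attends only to its spatial neighbours.

```python
import torch
import torch.nn.functional as F

def neighbor_constrained_attention(tiles, adjacency, w_q, w_k, w_v):
    """Self-attention over WSI tiles restricted to spatial neighbours.
    tiles: (T, D) tile embeddings; adjacency: (T, T) bool with self-loops;
    w_q/w_k/w_v: (D, D) projection matrices."""
    q, k, v = tiles @ w_q, tiles @ w_k, tiles @ w_v
    scores = q @ k.t() / (k.shape[-1] ** 0.5)               # (T, T) logits
    scores = scores.masked_fill(~adjacency, float("-inf"))  # prune non-neighbours
    return F.softmax(scores, dim=-1) @ v                    # (T, D) contextualized

# toy usage: 4 tiles in a row, each adjacent to itself and immediate neighbours
T, D = 4, 8
idx = torch.arange(T)
adj = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= 1
out = neighbor_constrained_attention(
    torch.randn(T, D), adj, *(torch.randn(D, D) for _ in range(3))
)
```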
- WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma [51.50991881342181]
This challenge includes 10,091 patch-level annotations and over 130 million labeled pixels.
The first-place team achieved an mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919).
arXiv Detail & Related papers (2022-04-13T15:27:05Z)
- Pan-Cancer Integrative Histology-Genomic Analysis via Interpretable Multimodal Deep Learning [4.764927152701701]
We integrate whole slide pathology images, RNA-seq abundance, copy number variation, and mutation data from 5,720 patients across 14 major cancer types.
Our interpretable, weakly-supervised, multimodal deep learning algorithm fuses these heterogeneous modalities for predicting outcomes (an assumption-laden fusion sketch follows this entry).
We analyze morphologic and molecular markers responsible for prognostic predictions across all cancer types.
arXiv Detail & Related papers (2021-08-04T20:40:05Z)
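The fusion mechanism is not detailed in the summary. One common pattern, shown below purely as a sketch, is attention-pooling per-patch WSI features and concatenating the pooled vector with an embedded molecular profile before a risk head; every dimension and layer choice here is invented for illustration.

```python
import torch
import torch.nn as nn

class LateFusionSurvival(nn.Module):
    """Illustrative late fusion of WSI patch features with a molecular profile."""
    def __init__(self, patch_dim=1024, mol_dim=200, hidden=256):
        super().__init__()
        self.attn = nn.Linear(patch_dim, 1)                  # patch attention scores
        self.mol = nn.Sequential(nn.Linear(mol_dim, hidden), nn.ReLU())
        self.head = nn.Linear(patch_dim + hidden, 1)         # scalar risk score

    def forward(self, patches, molecular):
        # patches: (N, patch_dim) per-patch features; molecular: (mol_dim,)
        w = torch.softmax(self.attn(patches), dim=0)         # (N, 1) weights
        slide = (w * patches).sum(dim=0)                     # pooled slide vector
        fused = torch.cat([slide, self.mol(molecular)], dim=-1)
        return self.head(fused)                              # higher = higher risk
```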
- Wide & Deep neural network model for patch aggregation in CNN-based prostate cancer detection systems [51.19354417900591]
Prostate cancer (PCa) is one of the leading causes of death among men, with almost 1.41 million new cases and around 375,000 deaths in 2020.
To perform an automatic diagnosis, prostate tissue samples are first digitized into gigapixel-resolution whole-slide images.
Small subimages called patches are extracted and classified, yielding patch-level predictions that are then aggregated into a slide-level diagnosis (a generic aggregation sketch follows this entry).
arXiv Detail & Related papers (2021-05-20T18:13:58Z)
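The Wide & Deep aggregation head is sketched generically below: patch-level probabilities are summarized into a fixed-length feature vector, then passed through parallel linear (wide) and MLP (deep) paths whose outputs are summed. The histogram featurization and all dimensions are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

def slide_features(patch_probs, n_bins=16):
    """Summarize per-patch cancer probabilities as a normalized histogram."""
    return torch.histc(patch_probs, bins=n_bins, min=0.0, max=1.0) / patch_probs.numel()

class WideDeepAggregator(nn.Module):
    """Generic Wide & Deep head over slide-level summary features."""
    def __init__(self, n_features=16, hidden=64):
        super().__init__()
        self.wide = nn.Linear(n_features, 1)                 # memorization path
        self.deep = nn.Sequential(                           # generalization path
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats):
        return torch.sigmoid(self.wide(feats) + self.deep(feats))

# usage with fake patch-level predictions from any patch classifier
probs = torch.rand(5000)
slide_score = WideDeepAggregator()(slide_features(probs))
```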
- In-Line Image Transformations for Imbalanced, Multiclass Computer Vision Classification of Lung Chest X-Rays [91.3755431537592]
This study leverages the existing literature to apply image transformations that compensate for the scarcity of COVID-19 lung chest X-ray (LCXR) data (an illustrative setup follows this entry).
Deep learning techniques such as convolutional neural networks (CNNs) are able to select features that distinguish between healthy and disease states.
This study uses a simple CNN architecture for high-performance multiclass LCXR classification at 94% accuracy.
arXiv Detail & Related papers (2021-04-06T02:01:43Z)
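"In-line" transformations here means augmentations applied on the fly as each sample is drawn, so oversampled minority-class images never repeat identically. The sketch below combines torchvision transforms with a weighted sampler; the specific transforms and weighting scheme are assumptions, not the paper's recipe.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import transforms

# applied each time a sample is drawn (pass as the dataset's transform),
# so oversampled rare-class images are never byte-identical repeats
augment = transforms.Compose([
    transforms.RandomRotation(10),                       # small rotations
    transforms.RandomAffine(0, translate=(0.05, 0.05)),  # small shifts
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

def balanced_loader(dataset, labels, batch_size=32):
    """Oversample minority classes; `labels` is the per-sample class list."""
    counts = torch.bincount(torch.tensor(labels))
    weights = 1.0 / counts[torch.tensor(labels)].float()  # rarer class -> larger weight
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```

Horizontal flipping is deliberately omitted, since chest X-rays are laterally asymmetric.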
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.