Text-guided Foundation Model Adaptation for Pathological Image
Classification
- URL: http://arxiv.org/abs/2307.14901v1
- Date: Thu, 27 Jul 2023 14:44:56 GMT
- Title: Text-guided Foundation Model Adaptation for Pathological Image
Classification
- Authors: Yunkun Zhang, Jin Gao, Mu Zhou, Xiaosong Wang, Yu Qiao, Shaoting
Zhang, Dequan Wang
- Abstract summary: We propose to connect image and text Embeddings (CITE) to enhance pathological image classification.
CITE injects text insights gained from language models pre-trained with a broad range of biomedical texts, leading to adapt foundation models towards pathological image understanding.
- Score: 40.45252665455015
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The recent surge of foundation models in computer vision and natural language
processing opens up perspectives in utilizing multi-modal clinical data to
train large models with strong generalizability. Yet pathological image
datasets often lack biomedical text annotation and enrichment. Guiding
data-efficient image diagnosis from the use of biomedical text knowledge
becomes a substantial interest. In this paper, we propose to Connect Image and
Text Embeddings (CITE) to enhance pathological image classification. CITE
injects text insights gained from language models pre-trained with a broad
range of biomedical texts, leading to adapt foundation models towards
pathological image understanding. Through extensive experiments on the
PatchGastric stomach tumor pathological image dataset, we demonstrate that CITE
achieves leading performance compared with various baselines especially when
training data is scarce. CITE offers insights into leveraging in-domain text
knowledge to reinforce data-efficient pathological image classification. Code
is available at https://github.com/Yunkun-Zhang/CITE.
Related papers
- A Multimodal Approach Combining Structural and Cross-domain Textual Guidance for Weakly Supervised OCT Segmentation [12.948027961485536]
We propose a novel Weakly Supervised Semantic (WSSS) approach that integrates structural guidance with text-driven strategies to generate high-quality pseudo labels.
Our method achieves state-of-the-art performance, highlighting its potential to improve diagnostic accuracy and efficiency in medical imaging.
arXiv Detail & Related papers (2024-11-19T16:20:27Z) - HistoSPACE: Histology-Inspired Spatial Transcriptome Prediction And Characterization Engine [0.0]
HistoSPACE model explore the diversity of histological images available with ST data to extract molecular insights from tissue image.
Model demonstrates significant efficiency compared to contemporary algorithms, revealing a correlation of 0.56 in leave-one-out cross-validation.
arXiv Detail & Related papers (2024-08-07T07:12:52Z) - Knowledge-enhanced Visual-Language Pretraining for Computational Pathology [68.6831438330526]
We consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources.
We curate a pathology knowledge tree that consists of 50,470 informative attributes for 4,718 diseases requiring pathology diagnosis from 32 human tissues.
arXiv Detail & Related papers (2024-04-15T17:11:25Z) - Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks [4.1942958779358674]
This paper utilizes recent vision-language models to produce diverse and realistic synthetic echocardiography image data.
We show that the rich contextual information present in the synthesized data potentially enhances the accuracy and interpretability of downstream tasks.
arXiv Detail & Related papers (2024-03-28T23:26:45Z) - Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework [43.453943987647015]
Medical vision language pre-training has emerged as a frontier of research, enabling zero-shot pathological recognition.
Due to the complex semantics of biomedical texts, current methods struggle to align medical images with key pathological findings in unstructured reports.
This is achieved by consulting a large language model and medical experts.
Ours improves the accuracy of recent methods by up to 8.56% and 17.26% for seen and unseen categories, respectively.
arXiv Detail & Related papers (2024-03-12T13:18:22Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder
and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z) - Towards a Visual-Language Foundation Model for Computational Pathology [5.72536252929528]
We introduce CONtrastive learning from Captions for Histopathology (CONCH)
CONCH is a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and task-agnostic pretraining.
It is evaluated on a suite of 13 diverse benchmarks, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval.
arXiv Detail & Related papers (2023-07-24T16:13:43Z) - Semantic segmentation of multispectral photoacoustic images using deep
learning [53.65837038435433]
Photoacoustic imaging has the potential to revolutionise healthcare.
Clinical translation of the technology requires conversion of the high-dimensional acquired data into clinically relevant and interpretable information.
We present a deep learning-based approach to semantic segmentation of multispectral photoacoustic images.
arXiv Detail & Related papers (2021-05-20T09:33:55Z) - Pathological Retinal Region Segmentation From OCT Images Using Geometric
Relation Based Augmentation [84.7571086566595]
We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape.
The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset having images captured from different acquisition procedures.
arXiv Detail & Related papers (2020-03-31T11:50:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.