Text-guided Foundation Model Adaptation for Pathological Image Classification
- URL: http://arxiv.org/abs/2307.14901v1
- Date: Thu, 27 Jul 2023 14:44:56 GMT
- Title: Text-guided Foundation Model Adaptation for Pathological Image Classification
- Authors: Yunkun Zhang, Jin Gao, Mu Zhou, Xiaosong Wang, Yu Qiao, Shaoting Zhang, Dequan Wang
- Abstract summary: We propose to Connect Image and Text Embeddings (CITE) to enhance pathological image classification.
CITE injects text insights gained from language models pre-trained on a broad range of biomedical texts, adapting foundation models toward pathological image understanding.
- Score: 40.45252665455015
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The recent surge of foundation models in computer vision and natural language
processing opens up perspectives in utilizing multi-modal clinical data to
train large models with strong generalizability. Yet pathological image
datasets often lack biomedical text annotation and enrichment. Using
biomedical text knowledge to guide data-efficient image diagnosis has therefore
become a topic of substantial interest. In this paper, we propose to Connect Image and
Text Embeddings (CITE) to enhance pathological image classification. CITE
injects text insights gained from language models pre-trained on a broad
range of biomedical texts, adapting foundation models toward
pathological image understanding. Through extensive experiments on the
PatchGastric stomach tumor pathological image dataset, we demonstrate that CITE
achieves leading performance compared with various baselines, especially when
training data is scarce. CITE offers insights into leveraging in-domain text
knowledge to reinforce data-efficient pathological image classification. Code
is available at https://github.com/Yunkun-Zhang/CITE.
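The abstract does not spell out how image and text embeddings are connected. As a rough illustration only, the sketch below assumes one common way to do it: project the image embedding into the text embedding space, then score each class by cosine similarity with the embedding of its class name. The function names, the projection matrix, the temperature, and the class labels are all hypothetical; this is not the authors' implementation, which is available in the repository linked above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def project(matrix, vec):
    """Apply a linear map, given as a list of rows, to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def classify(image_embedding, class_text_embeddings, projection, temperature=0.1):
    """Pick the class whose text embedding is most similar to the projected image.

    Assumption: the text embeddings are fixed vectors produced elsewhere
    (for example by a pre-trained biomedical language model), and only the
    projection would be learned. Both are plain lists here.
    """
    projected = l2_normalize(project(projection, image_embedding))

    # Temperature-scaled cosine similarity between the image and each class name.
    logits = {}
    for label, text_embedding in class_text_embeddings.items():
        text_unit = l2_normalize(text_embedding)
        similarity = sum(p * t for p, t in zip(projected, text_unit))
        logits[label] = similarity / temperature

    # Numerically stable softmax over the class logits.
    max_logit = max(logits.values())
    exps = {label: math.exp(value - max_logit) for label, value in logits.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}

    return max(probs, key=probs.get), probs


# Toy example: a 3-dimensional image embedding mapped into a 2-dimensional text space.
toy_projection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_text_embeddings = {"tumor": [1.0, 0.0], "normal": [0.0, 1.0]}
predicted_label, class_probs = classify([0.9, 0.1, 0.5], toy_text_embeddings, toy_projection)
```

In this toy setup the text embeddings act as fixed class prototypes, so the small projection is the only part that would need training. That is one plausible reading of why text guidance could help when labeled images are scarce, not a claim about the paper's design.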
Related papers
- Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance against existing methods in literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
- Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks [4.1942958779358674]
This paper utilizes recent vision-language models to produce diverse and realistic synthetic echocardiography image data.
We show that the rich contextual information present in the synthesized data potentially enhances the accuracy and interpretability of downstream tasks.
arXiv Detail & Related papers (2024-03-28T23:26:45Z)
- Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework [43.453943987647015]
Medical vision language pre-training has emerged as a frontier of research, enabling zero-shot pathological recognition.
Due to the complex semantics of biomedical texts, current methods struggle to align medical images with key pathological findings in unstructured reports.
This is achieved by consulting a large language model and medical experts.
Our method improves the accuracy of recent methods by up to 8.56% and 17.26% for seen and unseen categories, respectively.
arXiv Detail & Related papers (2024-03-12T13:18:22Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
- Towards a Visual-Language Foundation Model for Computational Pathology [5.72536252929528]
We introduce CONtrastive learning from Captions for Histopathology (CONCH).
CONCH is a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and task-agnostic pretraining.
It is evaluated on a suite of 13 diverse benchmarks, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval.
arXiv Detail & Related papers (2023-07-24T16:13:43Z)
- Semantic segmentation of multispectral photoacoustic images using deep learning [53.65837038435433]
Photoacoustic imaging has the potential to revolutionise healthcare.
Clinical translation of the technology requires conversion of the high-dimensional acquired data into clinically relevant and interpretable information.
We present a deep learning-based approach to semantic segmentation of multispectral photoacoustic images.
arXiv Detail & Related papers (2021-05-20T09:33:55Z)
- Learning domain-agnostic visual representation for computational pathology using medically-irrelevant style transfer augmentation [4.538771844947821]
STRAP (Style TRansfer Augmentation for histoPathology) is a form of data augmentation based on random style transfer from artistic paintings.
Style transfer replaces the low-level texture content of images with the uninformative style of randomly selected artistic paintings.
We demonstrate that STRAP leads to state-of-the-art performance, particularly in the presence of domain shifts.
arXiv Detail & Related papers (2021-02-02T18:50:16Z)
- Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation [84.7571086566595]
We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape.
The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset having images captured from different acquisition procedures.
arXiv Detail & Related papers (2020-03-31T11:50:43Z)