Adapting Pretrained Vision-Language Foundational Models to Medical
Imaging Domains
- URL: http://arxiv.org/abs/2210.04133v1
- Date: Sun, 9 Oct 2022 01:43:08 GMT
- Title: Adapting Pretrained Vision-Language Foundational Models to Medical
Imaging Domains
- Authors: Pierre Chambon, Christian Bluethgen, Curtis P. Langlotz, Akshay
Chaudhari
- Abstract summary: Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets.
We explore the sub-components of the Stable Diffusion pipeline to fine-tune the model to generate medical images.
Our best-performing model improves upon the stable diffusion baseline and can be conditioned to insert a realistic-looking abnormality on a synthetic radiology image.
- Score: 3.8137985834223502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal foundation models are typically trained on millions of pairs of
natural images and text captions, frequently obtained through web-crawling
approaches. Although such models exhibit excellent generative capabilities, they
do not typically generalize well to specific domains such as medical images,
which have fundamentally shifted distributions compared to natural images.
Building generative models for medical images that faithfully depict clinical
context may help alleviate the paucity of healthcare datasets. Thus, in this
study, we seek to expand the representational capabilities of large pretrained
foundation models to medical concepts, specifically by leveraging the Stable
Diffusion model to generate domain-specific images found in medical imaging. We
explore the sub-components of the Stable Diffusion
pipeline (the variational autoencoder, the U-Net and the text-encoder) to
fine-tune the model to generate medical images. We benchmark the efficacy of
these efforts using quantitative image quality metrics and qualitative
radiologist-driven evaluations that assess whether generated images accurately
represent the clinical content of the conditional text prompts. Our
best-performing model improves upon the Stable Diffusion baseline and can be
conditioned to insert a realistic-looking abnormality into a synthetic
radiology image, while maintaining 95% accuracy on a classifier trained to
detect the abnormality.
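The fine-tuning recipe the abstract outlines maps naturally onto the Hugging Face diffusers API. Below is a minimal sketch of one such configuration, assuming the VAE and text encoder are frozen and only the U-Net is trained with the standard denoising objective on image-caption pairs; the model ID, learning rate, and training-step structure are illustrative assumptions, not the authors' exact setup.
```python
# Hedged sketch: fine-tuning only the U-Net of a Stable Diffusion pipeline,
# one of the sub-component configurations the abstract describes.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "CompVis/stable-diffusion-v1-4"  # assumed base checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Freeze the VAE and text encoder; only the U-Net receives gradients.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(pixel_values, captions):
    # Encode images into the latent space and add noise at a random timestep.
    latents = vae.encode(pixel_values).latent_dist.sample() * 0.18215
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps, (latents.shape[0],),
        device=latents.device,
    )
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)

    # Condition on the frozen text embedding of the radiology prompt.
    tokens = tokenizer(
        captions, padding="max_length", truncation=True,
        max_length=tokenizer.model_max_length, return_tensors="pt",
    )
    encoder_hidden_states = text_encoder(tokens.input_ids)[0]

    # Standard denoising objective: predict the noise that was added.
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    loss = F.mse_loss(noise_pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```
The same loop extends to the other configurations the paper compares, e.g. unfreezing the text encoder so that domain-specific radiology vocabulary can reshape the conditioning space.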
Related papers
- Multi-Conditioned Denoising Diffusion Probabilistic Model (mDDPM) for Medical Image Synthesis [22.0080610434872]
We propose a controlled generation framework for synthetic images with annotations.
We show that our approach can produce annotated lung CT images that can faithfully represent anatomy.
Our experiments demonstrate that controlled generative frameworks of this nature can surpass nearly every state-of-the-art image generative model.
arXiv Detail & Related papers (2024-09-07T01:19:02Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels; a minimal adapter sketch appears after this list.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics [0.0]
Visual attribution in medical imaging seeks to make evident the diagnostically relevant components of a medical image.
We present a novel generative visual attribution technique that leverages latent diffusion models in combination with domain-specific large language models.
The resulting system also exhibits a range of latent capabilities including zero-shot localized disease induction.
arXiv Detail & Related papers (2024-01-02T19:51:49Z)
- On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further develop a new Bayesian uncertainty estimation for frozen models and use it as an indicator to characterize the model's performance on out-of-distribution data; a minimal uncertainty sketch appears after this list.
arXiv Detail & Related papers (2023-11-18T14:52:10Z)
- Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability [5.865936619867771]
We unravel a consequential trade-off in generative diffusion models between image fidelity, as gauged by conventional metrics, and model interpretability.
We present a set of design principles for the development of truly interpretable generative models.
arXiv Detail & Related papers (2023-03-31T09:11:26Z)
- RoentGen: Vision-Language Foundation Model for Chest X-ray Generation [7.618389245539657]
We develop a strategy to overcome the large natural-medical distributional shift by adapting a pre-trained latent diffusion model on a corpus of publicly available chest x-rays.
We investigate the model's ability to generate high-fidelity, diverse synthetic CXR images conditioned on text prompts.
We present evidence that the resulting model (RoentGen) is able to create visually convincing, diverse synthetic CXR images.
arXiv Detail & Related papers (2022-11-23T06:58:09Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
- Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization [59.5104563755095]
We introduce a simple but effective approach to improve the generalization capability of deep neural networks in the field of medical imaging classification.
Motivated by the observation that the domain variability of medical images is to some extent compact, we propose to learn a representative feature space through variational encoding.
arXiv Detail & Related papers (2020-09-27T12:30:30Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits unlabeled data by encouraging prediction consistency for a given input under perturbations; a minimal consistency-loss sketch appears after this list.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
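For the CLIP-based anomaly-detection entry above, the sketch below shows one plausible form of a residual adapter: a small bottleneck MLP applied to frozen intermediate features, one per tapped level. The dimensions, activation, and placement are assumptions for illustration, not that paper's exact architecture.
```python
# Hypothetical residual adapter for frozen CLIP visual features; only the
# adapters would be trained, the backbone stays frozen.
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the pretrained features and lets
        # the adapter learn a small domain-specific correction on top.
        return x + self.up(self.act(self.down(x)))

# One adapter per tapped level of the frozen encoder (dims are assumed).
level_dims = [768, 768, 768, 768]
adapters = nn.ModuleList(ResidualAdapter(d) for d in level_dims)
```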
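For the out-of-distribution robustness entry, a generic stand-in for Bayesian uncertainty over a frozen model is Monte Carlo dropout: keep dropout stochastic at test time and read disagreement across forward passes as uncertainty. This is an assumption-labeled sketch, not that paper's estimator.
```python
# MC-dropout-style uncertainty sketch; the weights stay frozen, only the
# dropout masks vary across the stochastic forward passes.
import torch

@torch.no_grad()
def mc_uncertainty(model, x, n_samples: int = 20):
    model.train()  # keeps dropout layers stochastic at inference
    probs = torch.stack(
        [torch.softmax(model(x), dim=1) for _ in range(n_samples)]
    )
    mean_pred = probs.mean(dim=0)           # averaged class probabilities
    uncertainty = probs.var(dim=0).mean(1)  # per-image predictive variance
    return mean_pred, uncertainty
```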
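Finally, for the relation-driven semi-supervised entry, the core perturbation-consistency idea can be sketched as penalizing disagreement between predictions on two augmented views of the same unlabeled image. That paper's relation-driven loss is more involved; this is a plain consistency term for illustration, and `augment` is an assumed stochastic augmentation callable, not a named API.
```python
# Minimal perturbation-consistency loss on unlabeled images.
import torch
import torch.nn.functional as F

def consistency_loss(model, unlabeled, augment):
    p1 = torch.softmax(model(augment(unlabeled)), dim=1)      # view 1
    with torch.no_grad():
        p2 = torch.softmax(model(augment(unlabeled)), dim=1)  # view 2 (target)
    return F.mse_loss(p1, p2)
```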