Adapting Pretrained Vision-Language Foundational Models to Medical
Imaging Domains
- URL: http://arxiv.org/abs/2210.04133v1
- Date: Sun, 9 Oct 2022 01:43:08 GMT
- Title: Adapting Pretrained Vision-Language Foundational Models to Medical
Imaging Domains
- Authors: Pierre Chambon, Christian Bluethgen, Curtis P. Langlotz, Akshay
Chaudhari
- Abstract summary: Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets.
We explore the sub-components of the Stable Diffusion pipeline to fine-tune the model to generate medical images.
Our best-performing model improves upon the stable diffusion baseline and can be conditioned to insert a realistic-looking abnormality on a synthetic radiology image.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal foundation models are typically trained on millions of pairs of
natural images and text captions, frequently obtained through web-crawling
approaches. Although such models exhibit excellent generative capabilities, they
do not typically generalize well to specific domains such as medical images
that have fundamentally shifted distributions compared to natural images.
Building generative models for medical images that faithfully depict clinical
context may help alleviate the paucity of healthcare datasets. Thus, in this
study, we seek to extend the representational capabilities of large pretrained
foundation models to medical concepts, specifically by leveraging the Stable
Diffusion model to generate domain-specific images found
in medical imaging. We explore the sub-components of the Stable Diffusion
pipeline (the variational autoencoder, the U-Net and the text-encoder) to
fine-tune the model to generate medical images. We benchmark the efficacy of
these efforts using quantitative image quality metrics and qualitative
radiologist-driven evaluations of how accurately the generated images represent
the clinical content of conditional text prompts. Our best-performing model
improves upon the Stable Diffusion baseline and can be conditioned to insert a
realistic-looking
abnormality on a synthetic radiology image, while maintaining a 95% accuracy on
a classifier trained to detect the abnormality.
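The fine-tuning of the U-Net described in the abstract ultimately optimizes the standard latent-diffusion denoising objective: noise a latent at a random timestep and train the network to predict that noise. A minimal NumPy sketch of this objective, using a toy latent in place of a real VAE encoding and a dummy prediction in place of the U-Net (all shapes and the cosine schedule choice are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

def cosine_alpha_bar(t, T):
    """Cumulative noise schedule alpha_bar_t (cosine schedule of Nichol & Dhariwal)."""
    f = lambda s: np.cos((s / T + 0.008) / 1.008 * np.pi / 2) ** 2
    return f(t) / f(0)

def noise_latent(x0, t, T, rng):
    """Forward diffusion: sample x_t ~ q(x_t | x_0) in latent space."""
    eps = rng.standard_normal(x0.shape)
    ab = cosine_alpha_bar(t, T)
    xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    return xt, eps

def denoising_loss(eps_pred, eps):
    """MSE between the (U-Net's) noise prediction and the true noise."""
    return float(np.mean((eps_pred - eps) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))   # toy "latent" standing in for a VAE encoding
xt, eps = noise_latent(x0, t=500, T=1000, rng=rng)
loss = denoising_loss(eps_pred=np.zeros_like(eps), eps=eps)  # dummy prediction
```

In the actual pipeline the prediction comes from the text-conditioned U-Net, and fine-tuning consists of minimizing this loss on medical image-text pairs while choosing which sub-components (VAE, U-Net, text encoder) to unfreeze.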
Related papers
- Trustworthy image-to-image translation: evaluating uncertainty calibration in unpaired training scenarios
Mammographic screening is an effective method for detecting breast cancer, facilitating early diagnosis.
Deep neural networks have been shown effective in some studies, but their tendency to overfit leaves considerable risk for poor generalisation and misdiagnosis.
Data augmentation schemes based on unpaired neural style transfer models have been proposed that improve generalisability.
We evaluate their performance when trained on image patches parsed from three open access mammography datasets and one non-medical image dataset.
arXiv Detail & Related papers (2025-01-29T11:09:50Z)
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis
Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models.
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.
Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
- Multi-Conditioned Denoising Diffusion Probabilistic Model (mDDPM) for Medical Image Synthesis
We propose a controlled generation framework for synthetic images with annotations.
We show that our approach can produce annotated lung CT images that can faithfully represent anatomy.
Our experiments demonstrate that controlled generative frameworks of this nature can surpass nearly every state-of-the-art image generative model.
arXiv Detail & Related papers (2024-09-07T01:19:02Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
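The residual adapters this entry describes are, in essence, small bottleneck MLPs added to a frozen encoder's features through a skip connection. A hypothetical NumPy sketch of one such adapter (the bottleneck design, zero initialization, and all dimensions are illustrative assumptions, not the paper's exact architecture):

```python
import numpy as np

class ResidualAdapter:
    """Bottleneck MLP whose output is added back onto the frozen features.
    With a zero-initialized up-projection the adapter starts as an identity,
    so fine-tuning begins exactly from the pretrained model's behavior."""
    def __init__(self, dim, bottleneck, rng):
        self.w_down = rng.standard_normal((dim, bottleneck)) * 0.02
        self.w_up = np.zeros((bottleneck, dim))  # zero-init: adapter == identity

    def __call__(self, x):
        h = np.maximum(x @ self.w_down, 0.0)  # ReLU bottleneck
        return x + h @ self.w_up              # residual connection

rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 512))  # toy stand-in for frozen CLIP features
adapter = ResidualAdapter(dim=512, bottleneck=64, rng=rng)
out = adapter(feats)
```

Attaching one adapter per encoder stage, as the entry suggests, lets training adjust features at multiple levels while the CLIP backbone stays frozen.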
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used it as an indicator to characterize the model's performance on out-of-distribution data.
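Uncertainty indicators of this kind are commonly computed from the spread of multiple stochastic predictions, e.g. as the entropy of the mean predictive distribution. A hypothetical sketch of that generic recipe (not this paper's specific estimator):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predictive_entropy(logit_samples):
    """Entropy of the mean predictive distribution over Monte Carlo samples.
    logit_samples: array of shape (n_samples, n_classes)."""
    p = softmax(logit_samples).mean(axis=0)
    return float(-(p * np.log(p + 1e-12)).sum())

# Samples that agree yield low entropy; samples that disagree yield high entropy.
confident = predictive_entropy(np.array([[8.0, 0.0], [8.0, 0.0]]))
uncertain = predictive_entropy(np.array([[8.0, 0.0], [0.0, 8.0]]))
```

High entropy on an input then serves as a flag that the model may be operating out of distribution.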
arXiv Detail & Related papers (2023-11-18T14:52:10Z)
- Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability
We unravel a consequential trade-off between image fidelity as gauged by conventional metrics and model interpretability in generative diffusion models.
We present a set of design principles for the development of truly interpretable generative models.
arXiv Detail & Related papers (2023-03-31T09:11:26Z)
- RoentGen: Vision-Language Foundation Model for Chest X-ray Generation
We develop a strategy to overcome the large natural-medical distributional shift by adapting a pre-trained latent diffusion model on a corpus of publicly available chest x-rays.
We investigate the model's ability to generate high-fidelity, diverse synthetic CXR conditioned on text prompts.
We present evidence that the resulting model (RoentGen) is able to create visually convincing, diverse synthetic CXR images.
arXiv Detail & Related papers (2022-11-23T06:58:09Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
- Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization
We introduce a simple but effective approach to improve the generalization capability of deep neural networks in the field of medical imaging classification.
Motivated by the observation that the domain variability of the medical images is to some extent compact, we propose to learn a representative feature space through variational encoding.
arXiv Detail & Related papers (2020-09-27T12:30:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.