Adapting Pretrained Vision-Language Foundational Models to Medical
Imaging Domains
- URL: http://arxiv.org/abs/2210.04133v1
- Date: Sun, 9 Oct 2022 01:43:08 GMT
- Title: Adapting Pretrained Vision-Language Foundational Models to Medical
Imaging Domains
- Authors: Pierre Chambon, Christian Bluethgen, Curtis P. Langlotz, Akshay
Chaudhari
- Abstract summary: Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets.
We explore the sub-components of the Stable Diffusion pipeline to fine-tune the model to generate medical images.
Our best-performing model improves upon the stable diffusion baseline and can be conditioned to insert a realistic-looking abnormality on a synthetic radiology image.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal foundation models are typically trained on millions of pairs of
natural images and text captions, frequently obtained through web-crawling
approaches. Although such models exhibit excellent generative capabilities, they
do not typically generalize well to specific domains such as medical images
that have fundamentally shifted distributions compared to natural images.
Building generative models for medical images that faithfully depict clinical
context may help alleviate the paucity of healthcare datasets. Thus, in this
study, we seek to extend the representational capabilities of large pretrained
foundation models to medical concepts, specifically by leveraging the Stable
Diffusion model to generate domain-specific images found
in medical imaging. We explore the sub-components of the Stable Diffusion
pipeline (the variational autoencoder, the U-Net and the text-encoder) to
fine-tune the model to generate medical images. We benchmark the efficacy of
these efforts using quantitative image quality metrics and qualitative
radiologist-driven evaluations of how accurately the generated images represent
the clinical content of conditional text prompts. Our best-performing model
improves upon the Stable Diffusion baseline and can be conditioned to insert a
realistic-looking
abnormality on a synthetic radiology image, while maintaining a 95% accuracy on
a classifier trained to detect the abnormality.
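The fine-tuning of the U-Net described in the abstract ultimately optimizes the standard latent-diffusion denoising objective: noise a latent at a random timestep and train the network to predict that noise. A minimal NumPy sketch of this objective, using a toy latent in place of a real VAE encoding and a dummy prediction in place of the U-Net (all shapes and the cosine schedule choice are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

def cosine_alpha_bar(t, T):
    """Cumulative noise schedule alpha_bar_t (cosine schedule of Nichol & Dhariwal)."""
    f = lambda s: np.cos((s / T + 0.008) / 1.008 * np.pi / 2) ** 2
    return f(t) / f(0)

def noise_latent(x0, t, T, rng):
    """Forward diffusion: sample x_t ~ q(x_t | x_0) in latent space."""
    eps = rng.standard_normal(x0.shape)
    ab = cosine_alpha_bar(t, T)
    xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    return xt, eps

def denoising_loss(eps_pred, eps):
    """MSE between the (U-Net's) noise prediction and the true noise."""
    return float(np.mean((eps_pred - eps) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))   # toy "latent" standing in for a VAE encoding
xt, eps = noise_latent(x0, t=500, T=1000, rng=rng)
loss = denoising_loss(eps_pred=np.zeros_like(eps), eps=eps)  # dummy prediction
```

In the actual pipeline the prediction comes from the text-conditioned U-Net, and fine-tuning consists of minimizing this loss on medical image-text pairs while choosing which sub-components (VAE, U-Net, text encoder) to unfreeze.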
Related papers
- Trustworthy image-to-image translation: evaluating uncertainty calibration in unpaired training scenarios
Mammographic screening is an effective method for detecting breast cancer, facilitating early diagnosis.
Deep neural networks have been shown effective in some studies, but their tendency to overfit leaves considerable risk for poor generalisation and misdiagnosis.
Data augmentation schemes based on unpaired neural style transfer models have been proposed that improve generalisability.
We evaluate their performance when trained on image patches parsed from three open access mammography datasets and one non-medical image dataset.
arXiv Detail & Related papers (2025-01-29T11:09:50Z)
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis
Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models.
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.
Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
- Multi-Conditioned Denoising Diffusion Probabilistic Model (mDDPM) for Medical Image Synthesis
We propose a controlled generation framework for synthetic images with annotations.
We show that our approach can produce annotated lung CT images that can faithfully represent anatomy.
Our experiments demonstrate that controlled generative frameworks of this nature can surpass nearly every state-of-the-art image generative model.
arXiv Detail & Related papers (2024-09-07T01:19:02Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
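The residual adapters this entry describes are, in essence, small bottleneck MLPs added to a frozen encoder's features through a skip connection. A hypothetical NumPy sketch of one such adapter (the bottleneck design, zero initialization, and all dimensions are illustrative assumptions, not the paper's exact architecture):

```python
import numpy as np

class ResidualAdapter:
    """Bottleneck MLP whose output is added back onto the frozen features.
    With a zero-initialized up-projection the adapter starts as an identity,
    so fine-tuning begins exactly from the pretrained model's behavior."""
    def __init__(self, dim, bottleneck, rng):
        self.w_down = rng.standard_normal((dim, bottleneck)) * 0.02
        self.w_up = np.zeros((bottleneck, dim))  # zero-init: adapter == identity

    def __call__(self, x):
        h = np.maximum(x @ self.w_down, 0.0)  # ReLU bottleneck
        return x + h @ self.w_up              # residual connection

rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 512))  # toy stand-in for frozen CLIP features
adapter = ResidualAdapter(dim=512, bottleneck=64, rng=rng)
out = adapter(feats)
```

Attaching one adapter per encoder stage, as the entry suggests, lets training adjust features at multiple levels while the CLIP backbone stays frozen.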
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used it as an indicator to characterize the model's performance on out-of-distribution data.
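Uncertainty indicators of this kind are commonly computed from the spread of multiple stochastic predictions, e.g. as the entropy of the mean predictive distribution. A hypothetical sketch of that generic recipe (not this paper's specific estimator):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predictive_entropy(logit_samples):
    """Entropy of the mean predictive distribution over Monte Carlo samples.
    logit_samples: array of shape (n_samples, n_classes)."""
    p = softmax(logit_samples).mean(axis=0)
    return float(-(p * np.log(p + 1e-12)).sum())

# Samples that agree yield low entropy; samples that disagree yield high entropy.
confident = predictive_entropy(np.array([[8.0, 0.0], [8.0, 0.0]]))
uncertain = predictive_entropy(np.array([[8.0, 0.0], [0.0, 8.0]]))
```

High entropy on an input then serves as a flag that the model may be operating out of distribution.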
arXiv Detail & Related papers (2023-11-18T14:52:10Z)
- Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability
We unravel a consequential trade-off between image fidelity as gauged by conventional metrics and model interpretability in generative diffusion models.
We present a set of design principles for the development of truly interpretable generative models.
arXiv Detail & Related papers (2023-03-31T09:11:26Z)
- RoentGen: Vision-Language Foundation Model for Chest X-ray Generation
We develop a strategy to overcome the large natural-medical distributional shift by adapting a pre-trained latent diffusion model on a corpus of publicly available chest x-rays.
We investigate the model's ability to generate high-fidelity, diverse synthetic CXR conditioned on text prompts.
We present evidence that the resulting model (RoentGen) is able to create visually convincing, diverse synthetic CXR images.
arXiv Detail & Related papers (2022-11-23T06:58:09Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
- Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization
We introduce a simple but effective approach to improve the generalization capability of deep neural networks in the field of medical imaging classification.
Motivated by the observation that the domain variability of the medical images is to some extent compact, we propose to learn a representative feature space through variational encoding.
arXiv Detail & Related papers (2020-09-27T12:30:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.