RoentGen: Vision-Language Foundation Model for Chest X-ray Generation
- URL: http://arxiv.org/abs/2211.12737v1
- Date: Wed, 23 Nov 2022 06:58:09 GMT
- Title: RoentGen: Vision-Language Foundation Model for Chest X-ray Generation
- Authors: Pierre Chambon, Christian Bluethgen, Jean-Benoit Delbrouck, Rogier Van der Sluijs, Małgorzata Połacin, Juan Manuel Zambrano Chaves, Tanishq Mathew Abraham, Shivanshu Purohit, Curtis P. Langlotz, Akshay Chaudhari
- Abstract summary: We develop a strategy to overcome the large natural-medical distributional shift by adapting a pre-trained latent diffusion model on a corpus of publicly available chest x-rays.
We investigate the model's ability to generate high-fidelity, diverse synthetic CXR conditioned on text prompts.
We present evidence that the resulting model (RoentGen) is able to create visually convincing, diverse synthetic CXR images.
- Score: 7.618389245539657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal models trained on large natural image-text pair datasets have
exhibited astounding abilities in generating high-quality images. Medical
imaging data is fundamentally different from natural images, and the language
used to succinctly capture relevant details in medical data uses a different,
narrow but semantically rich, domain-specific vocabulary. Not surprisingly,
multimodal models trained on natural image-text pairs do not tend to
generalize well to the medical domain. Developing generative imaging models
faithfully representing medical concepts while providing compositional
diversity could mitigate the existing paucity of high-quality, annotated
medical imaging datasets. In this work, we develop a strategy to overcome the
large natural-medical distributional shift by adapting a pre-trained latent
diffusion model on a corpus of publicly available chest x-rays (CXR) and their
corresponding radiology (text) reports. We investigate the model's ability to
generate high-fidelity, diverse synthetic CXR conditioned on text prompts. We
assess the model outputs quantitatively using image quality metrics, and
evaluate image quality and text-image alignment by human domain experts. We
present evidence that the resulting model (RoentGen) is able to create visually
convincing, diverse synthetic CXR images, and that the output can be controlled
to a new extent by using free-form text prompts including radiology-specific
language. Fine-tuning this model on a fixed training set and using it as a data
augmentation method, we measure a 5% improvement of a classifier trained
jointly on synthetic and real images, and a 3% improvement when trained on a
larger but purely synthetic training set. Finally, we observe that this
fine-tuning distills in-domain knowledge in the text-encoder and can improve
its representation capabilities of certain diseases like pneumothorax by 25%.
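The adaptation the abstract describes trains a denoiser on noisy image latents conditioned on report text. The core of that training setup is the standard DDPM forward (noising) process; the sketch below illustrates it with toy latents. The schedule values, latent shapes, and variable names are illustrative assumptions, not RoentGen's actual configuration.

```python
import numpy as np

# Hedged sketch of the forward (noising) process behind latent diffusion
# fine-tuning. All shapes and schedule values are toy assumptions.

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule; returns cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

rng = np.random.default_rng(0)
alpha_bar = linear_beta_schedule()
x0 = rng.standard_normal((8, 4, 8, 8))  # toy batch of VAE image latents
xt, eps = q_sample(x0, t=800, alpha_bar=alpha_bar, rng=rng)

# A fine-tuned UNet eps_theta(x_t, t, text_embedding) would be trained to
# minimize ||eps_theta - eps||^2, with text_embedding coming from the
# (also fine-tuned) text encoder applied to the radiology report.
```

As `t` grows, `alpha_bar[t]` shrinks toward zero, so `x_t` carries progressively less signal; the denoiser learns to invert this, and the report-derived text embedding is what lets free-form radiology language steer generation at sampling time.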
Related papers
- Multi-Conditioned Denoising Diffusion Probabilistic Model (mDDPM) for Medical Image Synthesis [22.0080610434872]
We propose a controlled generation framework for synthetic images with annotations.
We show that our approach can produce annotated lung CT images that can faithfully represent anatomy.
Our experiments demonstrate that controlled generative frameworks of this nature can surpass nearly every state-of-the-art image generative model.
arXiv Detail & Related papers (2024-09-07T01:19:02Z)
- A Domain Translation Framework with an Adversarial Denoising Diffusion Model to Generate Synthetic Datasets of Echocardiography Images [0.5999777817331317]
We introduce a framework to create echocardiography images suitable for use in clinical research.
For several domain translation operations, the results verified that such a generative model was able to synthesize high-quality image samples.
arXiv Detail & Related papers (2024-03-07T15:58:03Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify cross-modal generative and discriminative pretraining into a single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both image generation and image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability [5.865936619867771]
We unravel a consequential trade-off between image fidelity, as gauged by conventional metrics, and model interpretability in generative diffusion models.
We present a set of design principles for the development of truly interpretable generative models.
arXiv Detail & Related papers (2023-03-31T09:11:26Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains [3.8137985834223502]
Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets.
We explore the sub-components of the Stable Diffusion pipeline to fine-tune the model to generate medical images.
Our best-performing model improves upon the Stable Diffusion baseline and can be conditioned to insert a realistic-looking abnormality on a synthetic radiology image.
arXiv Detail & Related papers (2022-10-09T01:43:08Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.