Medical diffusion on a budget: textual inversion for medical image generation
- URL: http://arxiv.org/abs/2303.13430v1
- Date: Thu, 23 Mar 2023 16:50:19 GMT
- Title: Medical diffusion on a budget: textual inversion for medical image generation
- Authors: Bram de Wilde, Anindo Saha, Richard P.G. ten Broek, Henkjan Huisman
- Abstract summary: Diffusion-based models for text-to-image generation have gained immense popularity.
Training them from scratch still requires access to large datasets and significant computational resources.
This work demonstrates that pre-trained Stable Diffusion models can be adapted to various medical imaging modalities by training text embeddings with textual inversion.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based models for text-to-image generation have gained immense
popularity due to recent advancements in efficiency, accessibility, and
quality. Although it is becoming increasingly feasible to perform inference
with these systems using consumer-grade GPUs, training them from scratch still
requires access to large datasets and significant computational resources. In
the case of medical image generation, the availability of large, publicly
accessible datasets that include text reports is limited due to legal and
ethical concerns. While training a diffusion model on a private dataset may
address this issue, it is not always feasible for institutions lacking the
necessary computational resources. This work demonstrates that pre-trained
Stable Diffusion models, originally trained on natural images, can be adapted
to various medical imaging modalities by training text embeddings with textual
inversion. In this study, we conducted experiments using medical datasets
comprising only 100 samples from three medical modalities. Embeddings were
trained in a matter of hours, while still retaining diagnostic relevance in
image generation. Experiments were designed to achieve several objectives.
Firstly, we fine-tuned the training and inference processes of textual
inversion, revealing that larger embeddings and more examples are required.
Secondly, we validated our approach by demonstrating a 2% increase in
diagnostic accuracy (AUC) for detecting prostate cancer on MRI, a
challenging multi-modal imaging task, from 0.78 to 0.80. Thirdly, we
performed simulations by interpolating between healthy and diseased states,
combining multiple pathologies, and inpainting to show embedding flexibility
and control of disease appearance. Finally, the embeddings trained in this
study are small (less than 1 MB), which facilitates easy sharing of medical
data with reduced privacy concerns.
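
As the abstract outlines, textual inversion adapts a frozen Stable Diffusion model by optimizing only the embedding of a new placeholder token against the usual denoising objective. The sketch below illustrates that recipe with Hugging Face diffusers; the model ID, placeholder token `<prostate-mri>`, initializer word, prompt, and learning rate are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler, StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
tokenizer, text_encoder = pipe.tokenizer, pipe.text_encoder
vae, unet = pipe.vae, pipe.unet
noise_scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

# Add a placeholder token and initialize its embedding from a related word.
placeholder = "<prostate-mri>"  # hypothetical token name
tokenizer.add_tokens([placeholder])
text_encoder.resize_token_embeddings(len(tokenizer))
token_id = tokenizer.convert_tokens_to_ids(placeholder)
embeds = text_encoder.get_input_embeddings()
with torch.no_grad():
    embeds.weight[token_id] = embeds.weight[tokenizer.convert_tokens_to_ids("scan")].clone()

# Freeze everything; only the embedding table receives gradients, and of
# that table, only the new token's row is allowed to update.
for module in (vae, unet, text_encoder):
    module.requires_grad_(False)
embeds.weight.requires_grad_(True)
optimizer = torch.optim.AdamW([embeds.weight], lr=5e-4)  # illustrative LR

def training_step(pixel_values):
    """One textual-inversion step on a batch of images scaled to [-1, 1]."""
    latents = vae.encode(pixel_values.to(device)).latent_dist.sample()
    latents = latents * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, t)
    ids = tokenizer([f"an MRI slice of {placeholder}"] * latents.shape[0],
                    padding="max_length", truncation=True,
                    max_length=tokenizer.model_max_length,
                    return_tensors="pt").input_ids.to(device)
    pred = unet(noisy_latents, t,
                encoder_hidden_states=text_encoder(ids)[0]).sample
    loss = F.mse_loss(pred, noise)  # standard denoising objective
    loss.backward()
    # Zero the gradient of every row except the placeholder's, so the rest
    # of the vocabulary stays untouched.
    keep = torch.arange(embeds.weight.shape[0], device=device) != token_id
    embeds.weight.grad[keep] = 0.0
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Because only one embedding row updates, training of this kind fits on a single consumer GPU, which is consistent with the abstract's claim that embeddings were trained in a matter of hours.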
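The interpolation and sharing experiments rest on the same fact: a learned embedding is a single vector per token (768 floats for SD v1.x), so two trained states can be blended linearly and the result shipped as a tiny file. Below is a minimal sketch reusing the objects above; both token names are hypothetical.

```python
import torch

# Assumes `tokenizer` and `text_encoder` from the sketch above, with two
# separately trained embeddings; both token names are hypothetical.
emb = text_encoder.get_input_embeddings().weight
healthy = emb[tokenizer.convert_tokens_to_ids("<healthy-prostate>")].clone()
diseased_id = tokenizer.convert_tokens_to_ids("<prostate-lesion>")
diseased = emb[diseased_id].clone()

# Linear interpolation between states: alpha dials disease appearance
# (0.0 = healthy, 1.0 = fully diseased).
alpha = 0.5
with torch.no_grad():
    emb[diseased_id] = (1 - alpha) * healthy + alpha * diseased
# Prompting with "<prostate-lesion>" now renders the blended state.

# One 768-float vector is a few kilobytes on disk, consistent with the
# abstract's sub-1 MB embeddings.
torch.save({"<prostate-lesion>": emb[diseased_id].detach().cpu()}, "embedding.bin")
```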
Related papers
- Multi-Conditioned Denoising Diffusion Probabilistic Model (mDDPM) for Medical Image Synthesis [22.0080610434872]
We propose a controlled generation framework for synthetic images with annotations.
We show that our approach can produce annotated lung CT images that can faithfully represent anatomy.
Our experiments demonstrate that such controlled generative frameworks can surpass nearly every state-of-the-art image generative model.
arXiv Detail & Related papers (2024-09-07T01:19:02Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model [4.057796755073023]
We develop controllable diffusion models for medical image synthesis, called EMIT-Diff.
We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data.
In our approach, we ensure that the synthesized samples adhere to medically relevant constraints.
arXiv Detail & Related papers (2023-10-19T16:18:02Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- When Accuracy Meets Privacy: Two-Stage Federated Transfer Learning Framework in Classification of Medical Images on Limited Data: A COVID-19 Case Study [77.34726150561087]
The COVID-19 pandemic spread rapidly and caused a shortage of global medical resources.
CNNs have been widely used and validated for analyzing medical images.
arXiv Detail & Related papers (2022-03-24T02:09:41Z)
- MMLN: Leveraging Domain Knowledge for Multimodal Diagnosis [10.133715767542386]
We propose a knowledge-driven and data-driven framework for lung disease diagnosis.
We formulate diagnosis rules according to authoritative clinical medicine guidelines and learn the weights of rules from text data.
A multimodal fusion of text and image data is designed to infer the marginal probability of lung disease.
arXiv Detail & Related papers (2022-02-09T04:12:30Z)
- Self-supervised Learning from 100 Million Medical Images [13.958840691105992]
We propose a method for self-supervised learning of rich image features based on contrastive learning and online feature clustering.
We leverage large training datasets of over 100,000,000 medical images of various modalities, including radiography, computed tomography (CT), magnetic resonance (MR) imaging and ultrasonography.
We highlight a number of advantages of this strategy on challenging image assessment problems in radiography, CT and MR.
arXiv Detail & Related papers (2022-01-04T18:27:04Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.