EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision
- URL: http://arxiv.org/abs/2007.05597v2
- Date: Fri, 15 Jan 2021 19:07:26 GMT
- Title: EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision
- Authors: Siddharth Biswal, Peiye Zhuang, Ayis Pyrros, Nasir Siddiqui, Sanmi
Koyejo, Jimeng Sun
- Abstract summary: We propose an End-to-end MultImodal X-ray genERative model (EMIXER) for jointly synthesizing x-ray images and corresponding free-text reports.
EMIXER is a conditional generative adversarial model that 1) generates an image based on a label, 2) encodes the image into a hidden embedding, 3) produces the corresponding text via a hierarchical decoder from the image embedding, and 4) assesses both the image and the corresponding text with a joint discriminator.
We show that EMIXER-generated synthetic datasets can augment X-ray image classification and report generation models, yielding 5.94% and 6.9% improvements over models trained only on real data.
- Score: 39.07263052525579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have enabled the automated synthesis of high-quality
data for diverse applications. However, the most effective generative models
are specialized to data from a single domain (e.g., images or text). Real-world
applications such as healthcare require multi-modal data from multiple domains
(e.g., both images and corresponding text), which are difficult to acquire due
to limited availability and privacy concerns and are much harder to synthesize.
To tackle this joint synthesis challenge, we propose an End-to-end MultImodal
X-ray genERative model (EMIXER) for jointly synthesizing x-ray images and
corresponding free-text reports, all conditional on diagnosis labels. EMIXER is
a conditional generative adversarial model that 1) generates an image based on
a label, 2) encodes the image into a hidden embedding, 3) produces the
corresponding text via a hierarchical decoder from the image embedding, and 4)
assesses both the image and the corresponding text with a joint discriminator.
EMIXER also enables self-supervision to leverage vast amounts of unlabeled data.
Extensive experiments with real X-ray report data illustrate how data
augmentation using synthesized multimodal samples can improve the performance
of a variety of supervised tasks, including COVID-19 X-ray classification with
very limited samples. The quality of the generated images and reports is also
confirmed by radiologists. We quantitatively show that EMIXER-generated
synthetic datasets can augment X-ray image classification and report generation
models, yielding 5.94% and 6.9% improvements over models trained only on real
data samples. Taken together, our results highlight the promise of
state-of-the-art generative models for advancing clinical machine learning.
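To make the four-stage pipeline concrete, below is a minimal PyTorch sketch of how the pieces fit together: a label-conditioned image generator, an image encoder, a hierarchical report decoder, and a joint image-text discriminator. All class names, dimensions, and layer choices are illustrative assumptions for a toy setting, not the authors' implementation.
```python
# Minimal sketch of the four EMIXER stages described in the abstract.
# All shapes, names, and layers are illustrative assumptions, not the
# authors' released implementation.
import torch
import torch.nn as nn

LABELS, IMG, EMB, VOCAB, SENTS, WORDS = 14, 64, 128, 1000, 3, 8

class LabelToImage(nn.Module):            # stage 1: diagnosis label -> image
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LABELS, 256), nn.ReLU(),
                                 nn.Linear(256, IMG * IMG), nn.Tanh())
    def forward(self, y):
        return self.net(y).view(-1, 1, IMG, IMG)

class ImageEncoder(nn.Module):            # stage 2: image -> hidden embedding
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(IMG * IMG, EMB), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class HierarchicalDecoder(nn.Module):     # stage 3: embedding -> report tokens
    def __init__(self):
        super().__init__()
        self.sent_rnn = nn.GRU(EMB, EMB, batch_first=True)  # sentence topics
        self.word_rnn = nn.GRU(EMB, EMB, batch_first=True)  # words per topic
        self.vocab = nn.Linear(EMB, VOCAB)
    def forward(self, z):
        topics, _ = self.sent_rnn(z.unsqueeze(1).repeat(1, SENTS, 1))
        words, _ = self.word_rnn(topics.repeat_interleave(WORDS, dim=1))
        return self.vocab(words)          # (batch, SENTS * WORDS, VOCAB)

class JointDiscriminator(nn.Module):      # stage 4: score (image, text) pairs
    def __init__(self):
        super().__init__()
        self.img = ImageEncoder()
        self.txt = nn.Linear(VOCAB, EMB)
        self.score = nn.Linear(2 * EMB, 1)
    def forward(self, x, logits):
        t = self.txt(logits).mean(dim=1)  # pool word features over positions
        return self.score(torch.cat([self.img(x), t], dim=-1))

if __name__ == "__main__":
    y = torch.rand(2, LABELS)                             # diagnosis labels
    x = LabelToImage()(y)                                 # stage 1
    logits = HierarchicalDecoder()(ImageEncoder()(x))     # stages 2-3
    print(JointDiscriminator()(x, logits).shape)          # stage 4 -> torch.Size([2, 1])
```
In an adversarial setup, the discriminator would be trained to separate real from synthesized image-report pairs while the generator chain is trained to fool it.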
Related papers
- MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities [59.61465292965639]
This paper investigates a new paradigm for leveraging generative models in medical applications.
We propose a diffusion-based data engine, termed MRGen, which enables generation conditioned on text prompts and masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models [49.439311430360284]
We introduce a novel data synthesis method inspired by contrastive learning and image difference captioning.
Our key idea involves challenging the model to discern both matching and distinct elements.
We leverage this generated dataset to fine-tune state-of-the-art (SOTA) MLLMs.
arXiv Detail & Related papers (2024-08-08T17:10:16Z)
- A Domain Translation Framework with an Adversarial Denoising Diffusion Model to Generate Synthetic Datasets of Echocardiography Images [0.5999777817331317]
We introduce a framework to create echocardiography images suitable for use in clinical research.
For several domain translation operations, the results verified that the generative model was able to synthesize high-quality image samples.
arXiv Detail & Related papers (2024-03-07T15:58:03Z)
- DDPM based X-ray Image Synthesizer [0.0]
We propose a Denoising Diffusion Probabilistic Model (DDPM) combined with a UNet architecture for X-ray image synthesis.
Our methodology employs over 3000 pneumonia X-ray images obtained from Kaggle for training.
Results demonstrate the effectiveness of our approach, as the model successfully generated realistic images with low Mean Squared Error (MSE).
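For context, a DDPM trains its network to predict the noise that was added to a clean image at a randomly drawn timestep; the MSE mentioned above corresponds to this noise-prediction objective. A minimal sketch with a linear schedule follows; `eps_model` is a placeholder for the paper's UNet and all hyperparameters are illustrative.
```python
# Standard DDPM forward-noising step and noise-prediction MSE loss.
# `eps_model` stands in for the paper's UNet; hyperparameters are toy values.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal fraction

def ddpm_loss(eps_model, x0):
    t = torch.randint(0, T, (x0.shape[0],))      # random timestep per image
    eps = torch.randn_like(x0)                   # the noise to be predicted
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps # q(x_t | x_0)
    return F.mse_loss(eps_model(x_t, t), eps)

if __name__ == "__main__":
    toy = lambda x, t: torch.zeros_like(x)       # placeholder for a UNet
    print(ddpm_loss(toy, torch.randn(4, 1, 64, 64)))
```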
arXiv Detail & Related papers (2024-01-03T04:35:58Z)
- DiffBoost: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model [3.890243179348094]
Large-scale, highly varied, high-quality data are crucial for developing robust and successful deep-learning models for medical applications.
This paper develops controllable diffusion models for medical image synthesis, called DiffBoost.
We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data.
arXiv Detail & Related papers (2023-10-19T16:18:02Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- An Attentive-based Generative Model for Medical Image Synthesis [18.94900480135376]
We propose an attention-based dual contrast generative model, called ADC-cycleGAN, which can synthesize medical images from unpaired data with multiple slices.
The model integrates a dual contrast loss term with the CycleGAN loss to ensure that the synthesized images are distinguishable from the source domain.
Experimental results demonstrate that the proposed ADC-cycleGAN model produces samples comparable to those of other state-of-the-art generative models.
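The entry does not spell out the exact dual contrast term, but the CycleGAN loss it augments includes the standard cycle-consistency term, sketched below with hypothetical generators `G` (domain A to B) and `F_` (B to A).
```python
# Standard CycleGAN cycle-consistency term that ADC-cycleGAN builds on;
# G: A -> B and F_: B -> A are hypothetical generator networks.
import torch
import torch.nn.functional as F

def cycle_loss(G, F_, a, b, lam=10.0):
    # translating to the other domain and back should recover the input
    return lam * (F.l1_loss(F_(G(a)), a) + F.l1_loss(G(F_(b)), b))

if __name__ == "__main__":
    G = F_ = lambda x: x                          # identity stand-ins
    a, b = torch.randn(2, 1, 8, 8), torch.randn(2, 1, 8, 8)
    print(cycle_loss(G, F_, a, b))                # tensor(0.) for identities
```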
arXiv Detail & Related papers (2023-06-02T14:17:37Z)
- Mask-conditioned latent diffusion for generating gastrointestinal polyp images [2.027538200191349]
This study proposes a conditional DPM framework to generate synthetic GI polyp images conditioned on given segmentation masks.
Our system can generate an unlimited number of high-fidelity synthetic polyp images with the corresponding ground truth masks of polyps.
Results show that the best micro image-wise IoU of 0.7751 was achieved by DeepLabv3+ when the training data consisted of both real and synthetic data.
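For reference, an IoU score such as the 0.7751 above measures the overlap between a predicted mask and its ground truth; a minimal per-mask sketch follows (the paper's exact micro image-wise aggregation may differ).
```python
# Intersection-over-union between binary masks; purely illustrative.
import numpy as np

def iou(pred, gt, eps=1e-7):
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

if __name__ == "__main__":
    pred = np.array([[1, 1], [0, 0]], dtype=bool)
    gt = np.array([[1, 0], [0, 0]], dtype=bool)
    print(iou(pred, gt))                          # 0.5: 1 overlapping of 2 marked pixels
```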
arXiv Detail & Related papers (2023-04-11T14:11:17Z)
- Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE leverages the Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to exploit correlations across subjects/patients and sub-modalities.
We show the applicability of MGP-VAE on brain tumor segmentation where one, two, or three of the four sub-modalities may be missing.
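A rough illustration of the GP-prior idea: latents for the four MRI sub-modalities are drawn jointly from a correlated Gaussian, so an observed sub-modality constrains a missing one. The scalar latents and RBF kernel below are toy assumptions, not the actual MGP-VAE.
```python
# Toy correlated prior over four sub-modality latents (e.g. T1, T1c, T2,
# FLAIR); the kernel and scalar latents are illustrative, not MGP-VAE.
import torch

idx = torch.arange(4, dtype=torch.float32)                 # sub-modality indices
K = torch.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2)   # RBF kernel
prior = torch.distributions.MultivariateNormal(
    torch.zeros(4), K + 1e-4 * torch.eye(4))               # jitter for stability
z = prior.sample((16,))                                    # (16, 4) correlated latents
print(z.shape)
```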
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
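One generic reading of K-nearest-neighbor smoothing, sketched below, averages each sample's predicted label distribution with those of its nearest neighbors in feature space; the paper's exact KNNS formulation may differ.
```python
# Generic KNN smoothing of predicted label distributions; the paper's
# exact formulation is not given here, so this is only an illustration.
import numpy as np

def knn_smooth(feats, probs, k=5):
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]            # k closest samples (incl. self)
    return probs[idx].mean(axis=1)                # averaged class probabilities

if __name__ == "__main__":
    feats = np.random.rand(100, 16)               # hypothetical image embeddings
    probs = np.random.dirichlet(np.ones(14), 100) # hypothetical disease probabilities
    print(knn_smooth(feats, probs).shape)         # (100, 14)
```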
arXiv Detail & Related papers (2021-02-26T02:29:30Z)