EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision
- URL: http://arxiv.org/abs/2007.05597v2
- Date: Fri, 15 Jan 2021 19:07:26 GMT
- Title: EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision
- Authors: Siddharth Biswal, Peiye Zhuang, Ayis Pyrros, Nasir Siddiqui, Sanmi
Koyejo, Jimeng Sun
- Abstract summary: We propose an End-to-end MultImodal X-ray genERative model (EMIXER) for jointly synthesizing x-ray images and corresponding free-text reports.
EMIXER is a conditional generative adversarial model that works by 1) generating an image based on a label, 2) encoding the image to a hidden embedding, 3) producing the corresponding text via a hierarchical decoder from the image embedding, and 4) assessing both the image and the corresponding text with a joint discriminator.
We show that EMIXER-generated synthetic datasets can augment X-ray image classification and report generation models to achieve 5.94% and 6.9% improvements, respectively, over models trained only on real data.
- Score: 39.07263052525579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have enabled the automated synthesis of high-quality
data for diverse applications. However, the most effective generative models
are specialized to data from a single domain (e.g., images or text). Real-world
applications such as healthcare require multi-modal data from multiple domains
(e.g., both images and corresponding text), which are difficult to acquire due
to limited availability and privacy concerns and are much harder to synthesize.
To tackle this joint synthesis challenge, we propose an End-to-end MultImodal
X-ray genERative model (EMIXER) for jointly synthesizing x-ray images and
corresponding free-text reports, all conditional on diagnosis labels. EMIXER is
a conditional generative adversarial model that works by 1) generating an image
based on a label, 2) encoding the image to a hidden embedding, 3) producing the
corresponding text via a hierarchical decoder from the image embedding, and 4)
assessing both the image and the corresponding text with a joint discriminator
(a minimal sketch of this pipeline follows the abstract).
EMIXER also enables self-supervision to leverage vast amounts of unlabeled data.
Extensive experiments with real X-ray report data illustrate how data
augmentation using synthesized multimodal samples can improve the performance
of a variety of supervised tasks including COVID-19 X-ray classification with
very limited samples. The quality of the generated images and reports is also
confirmed by radiologists. We quantitatively show that EMIXER-generated
synthetic datasets can augment X-ray image classification and report generation
models to achieve 5.94% and 6.9% improvements, respectively, over models
trained only on real data samples. Taken together, our results highlight the
promise of state-of-the-art generative models to advance clinical machine learning.
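The four numbered components above map naturally onto a small set of modules. Below is a minimal PyTorch-style sketch of that pipeline, offered only to make the data flow concrete: every class name (ImageGenerator, HierarchicalReportDecoder, etc.), layer choice, and size is an illustrative assumption, not the authors' architecture.

```python
# Minimal, illustrative PyTorch sketch of the EMIXER pipeline described in the
# abstract. All names, layers, and sizes are assumptions for illustration.
import torch
import torch.nn as nn

class ImageGenerator(nn.Module):
    """1) Generate an X-ray image conditioned on a diagnosis label."""
    def __init__(self, n_labels=14, z_dim=128, img_size=64):
        super().__init__()
        self.embed = nn.Embedding(n_labels, z_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * z_dim, 512), nn.ReLU(),
            nn.Linear(512, img_size * img_size), nn.Tanh(),
        )
        self.img_size = img_size

    def forward(self, labels, z):
        h = torch.cat([self.embed(labels), z], dim=1)
        return self.net(h).view(-1, 1, self.img_size, self.img_size)

class ImageEncoder(nn.Module):
    """2) Encode the image into a hidden embedding."""
    def __init__(self, img_size=64, emb_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(img_size * img_size, emb_dim), nn.ReLU()
        )

    def forward(self, img):
        return self.net(img)

class HierarchicalReportDecoder(nn.Module):
    """3) Decode the embedding into report text: a sentence-level RNN sets a
    topic per sentence, and a word-level RNN emits token logits per topic."""
    def __init__(self, emb_dim=256, vocab=5000, max_sents=3, max_words=10):
        super().__init__()
        self.sent_rnn = nn.GRUCell(emb_dim, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, emb_dim, batch_first=True)
        self.out = nn.Linear(emb_dim, vocab)
        self.max_sents, self.max_words = max_sents, max_words

    def forward(self, img_emb):
        h, sents = torch.zeros_like(img_emb), []
        for _ in range(self.max_sents):
            h = self.sent_rnn(img_emb, h)                       # sentence topic
            w_in = h.unsqueeze(1).expand(-1, self.max_words, -1)
            w, _ = self.word_rnn(w_in.contiguous())
            sents.append(self.out(w))                           # (B, words, vocab)
        return torch.stack(sents, dim=1)                        # (B, sents, words, vocab)

class JointDiscriminator(nn.Module):
    """4) Score an (image, report) pair together as real or synthetic."""
    def __init__(self, img_size=64, emb_dim=256, vocab=5000):
        super().__init__()
        self.img_head = nn.Sequential(nn.Flatten(), nn.Linear(img_size * img_size, emb_dim))
        self.txt_head = nn.Linear(vocab, emb_dim)
        self.score = nn.Linear(2 * emb_dim, 1)

    def forward(self, img, report_logits):
        t_img = self.img_head(img)
        t_txt = self.txt_head(report_logits.mean(dim=(1, 2)))   # pool sentences/words
        return self.score(torch.cat([t_img, t_txt], dim=1))

# End-to-end pass: label -> image -> embedding -> report -> joint score.
gen, enc, dec, disc = ImageGenerator(), ImageEncoder(), HierarchicalReportDecoder(), JointDiscriminator()
labels, z = torch.randint(0, 14, (4,)), torch.randn(4, 128)
img = gen(labels, z)
report_logits = dec(enc(img))
print(disc(img, report_logits).shape)  # torch.Size([4, 1])
```

In a full GAN setup, the joint discriminator's score would drive adversarial losses for both the image generator and the report decoder, which is what allows the image-report pair to be trained end to end.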
Related papers
- A Domain Translation Framework with an Adversarial Denoising Diffusion Model to Generate Synthetic Datasets of Echocardiography Images [0.5999777817331317]
We introduce a framework to create echocardiography images suitable for use in clinical research.
For several domain translation operations, the results verified that such a generative model was able to synthesize high-quality image samples.
arXiv Detail & Related papers (2024-03-07T15:58:03Z)
- DDPM based X-ray Image Synthesizer [0.0]
We propose a Denoising Diffusion Probabilistic Model (DDPM) combined with a UNet architecture for X-ray image synthesis.
Our methodology employs over 3000 pneumonia X-ray images obtained from Kaggle for training.
Results demonstrate the effectiveness of our approach, as the model successfully generates realistic images with low Mean Squared Error (MSE).
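As a concrete reference point, here is a minimal sketch of the standard DDPM training objective this entry's summary names: noise an image to a random timestep, then train a network to predict that noise. The tiny stand-in for the UNet, the noise schedule, and the image size are assumptions, and timestep conditioning is omitted for brevity.

```python
# Illustrative sketch of a standard DDPM training step; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative product of (1 - beta_t)

unet = nn.Sequential(  # placeholder for a real UNet noise predictor
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1)
)

def ddpm_loss(x0):
    """Forward-noise x0 to a random timestep t, predict the noise, MSE loss."""
    t = torch.randint(0, T, (x0.size(0),))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps  # closed-form q(x_t | x_0)
    return F.mse_loss(unet(x_t), eps)               # timestep conditioning omitted

loss = ddpm_loss(torch.randn(8, 1, 64, 64))         # batch of 8 grayscale X-rays
loss.backward()
```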
arXiv Detail & Related papers (2024-01-03T04:35:58Z)
- EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model [4.057796755073023]
We develop controllable diffusion models for medical image synthesis, called EMIT-Diff.
We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data.
In our approach, we ensure that the synthesized samples adhere to medically relevant constraints.
arXiv Detail & Related papers (2023-10-19T16:18:02Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- An Attentive-based Generative Model for Medical Image Synthesis [18.94900480135376]
We propose an attention-based dual contrast generative model, called ADC-cycleGAN, which can synthesize medical images from unpaired data with multiple slices.
The model integrates a dual contrast loss term with the CycleGAN loss to ensure that the synthesized images are distinguishable from the source domain.
Experimental results demonstrate that the proposed ADC-cycleGAN model produces comparable samples to other state-of-the-art generative models.
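A minimal sketch of the loss composition this summary describes: the usual CycleGAN terms plus a weighted auxiliary contrast term. The function below is a generic illustration with placeholder weights (`lam`, `mu`), not the paper's exact dual-contrast formulation.

```python
# Generic sketch of adding an auxiliary contrast term to the CycleGAN
# objective; weights and the contrast term itself are placeholders.
import torch

def adc_cyclegan_loss(adv_loss: torch.Tensor,
                      cycle_loss: torch.Tensor,
                      dual_contrast_loss: torch.Tensor,
                      lam: float = 1.0, mu: float = 10.0) -> torch.Tensor:
    """Adversarial + cycle-consistency losses, plus a weighted contrast term
    that pushes synthesized images away from the source domain's appearance."""
    return adv_loss + mu * cycle_loss + lam * dual_contrast_loss
```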
arXiv Detail & Related papers (2023-06-02T14:17:37Z)
- Mask-conditioned latent diffusion for generating gastrointestinal polyp images [2.027538200191349]
This study proposes a conditional DPM framework to generate synthetic GI polyp images conditioned on given segmentation masks.
Our system can generate an unlimited number of high-fidelity synthetic polyp images with the corresponding ground truth masks of polyps.
Results show that the best micro-imagewise IOU of 0.7751 was achieved from DeepLabv3+ when the training data consists of both real data and synthetic data.
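For reference, a small sketch of one plausible reading of the image-wise IoU metric quoted here: per-image intersection-over-union on binary masks, averaged over the set. The paper's exact "micro" aggregation may differ, and the threshold and shapes are assumptions.

```python
# One plausible reading of image-wise IoU for binary polyp masks; the exact
# aggregation used in the paper may differ.
import torch

def imagewise_iou(pred, target, thresh=0.5, eps=1e-7):
    """pred, target: (N, H, W) float tensors; target is a binary mask."""
    p = (pred > thresh).float()
    inter = (p * target).sum(dim=(1, 2))
    union = ((p + target) > 0).float().sum(dim=(1, 2))
    return ((inter + eps) / (union + eps)).mean().item()

print(imagewise_iou(torch.rand(4, 256, 256),
                    (torch.rand(4, 256, 256) > 0.5).float()))
```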
arXiv Detail & Related papers (2023-04-11T14:11:17Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the power and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)
- Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE leverages a Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to exploit correlations across subjects/patients and sub-modalities.
We show the applicability of MGP-VAE on brain tumor segmentation, where one, two, or three of the four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
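The summary does not detail KNNS, but the general idea of K-nearest-neighbor smoothing can be sketched as averaging each sample's predicted scores with those of its nearest neighbors in feature space. The snippet below is that generic idea under assumed shapes, not the paper's exact method.

```python
# Generic K-nearest-neighbor smoothing of model outputs; shapes and k are
# assumptions for illustration only.
import torch

def knn_smooth(features, logits, k=5):
    """features: (N, D) embeddings; logits: (N, C) per-disease scores."""
    d = torch.cdist(features, features)          # (N, N) pairwise distances
    idx = d.topk(k + 1, largest=False).indices   # each sample + k nearest neighbors
    return logits[idx].mean(dim=1)               # average scores over the group

smoothed = knn_smooth(torch.randn(32, 128), torch.randn(32, 14))
print(smoothed.shape)  # torch.Size([32, 14])
```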
arXiv Detail & Related papers (2021-02-26T02:29:30Z)