BiomedJourney: Counterfactual Biomedical Image Generation by
Instruction-Learning from Multimodal Patient Journeys
- URL: http://arxiv.org/abs/2310.10765v3
- Date: Sat, 21 Oct 2023 02:59:47 GMT
- Title: BiomedJourney: Counterfactual Biomedical Image Generation by
Instruction-Learning from Multimodal Patient Journeys
- Authors: Yu Gu, Jianwei Yang, Naoto Usuyama, Chunyuan Li, Sheng Zhang, Matthew
P. Lungren, Jianfeng Gao, Hoifung Poon
- Abstract summary: We present BiomedJourney, a novel method for counterfactual biomedical image generation by instruction-learning.
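Given a patient with two biomedical images taken at different time points, we use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression.
The resulting (prior image, progression description, new image) triples are then used to train a latent diffusion model for counterfactual biomedical image generation.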
- Score: 99.7082441544384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rapid progress has been made in instruction-learning for image editing with
natural-language instruction, as exemplified by InstructPix2Pix. In
biomedicine, such methods can be applied to counterfactual image generation,
which helps differentiate causal structure from spurious correlation and
facilitates robust image interpretation for disease progression modeling.
However, generic image-editing models are ill-suited for the biomedical domain,
and counterfactual biomedical image generation is largely underexplored. In
this paper, we present BiomedJourney, a novel method for counterfactual
biomedical image generation by instruction-learning from multimodal patient
journeys. Given a patient with two biomedical images taken at different time
points, we use GPT-4 to process the corresponding imaging reports and generate
a natural language description of disease progression. The resulting triples
(prior image, progression description, new image) are then used to train a
latent diffusion model for counterfactual biomedical image generation. Given
the relative scarcity of image time series data, we introduce a two-stage
curriculum that first pretrains the denoising network using the much more
abundant single image-report pairs (with dummy prior image), and then continues
training using the counterfactual triples. Experiments using the standard
MIMIC-CXR dataset demonstrate the promise of our method. In a comprehensive
battery of tests on counterfactual medical image generation, BiomedJourney
substantially outperforms prior state-of-the-art methods in instruction-based image
editing and medical image generation such as InstructPix2Pix and RoentGen. To
facilitate future study in counterfactual medical image generation, we plan to
release our instruction-learning code and pretrained models.
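The data side of the method described in the abstract (building triples from patient journeys, then a two-stage curriculum that starts with single image-report pairs and a dummy prior) can be sketched as below. This is a minimal illustration under stated assumptions, not the released BiomedJourney code: the `Study`/`Example` schema, pairing each patient's *consecutive* studies, using `None` to stand in for the dummy prior image, and the `describe` callback standing in for the GPT-4 step are all hypothetical. No diffusion model is trained here; the sketch only assembles the data for the two stages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class Study:
    """One imaging study in a patient journey (hypothetical schema)."""
    patient_id: str
    time: int      # acquisition order; larger means later
    image: str     # image path or identifier
    report: str    # free-text imaging report


@dataclass
class Example:
    """One training example for an instruction-conditioned generator."""
    prior_image: Optional[str]  # None plays the role of the dummy prior image
    instruction: str            # progression description (stage 2) or report (stage 1)
    target_image: str


def build_triples(studies: Sequence[Study],
                  describe: Callable[[str, str], str]) -> List[Example]:
    """Pair consecutive studies of each patient into
    (prior image, progression description, new image) triples.

    `describe(prior_report, new_report)` is a hypothetical placeholder for
    the GPT-4 step that summarizes how the findings changed.
    """
    journeys: Dict[str, List[Study]] = {}
    for study in studies:
        journeys.setdefault(study.patient_id, []).append(study)
    triples = []
    for journey in journeys.values():
        journey.sort(key=lambda s: s.time)  # order each patient's studies in time
        for prior, new in zip(journey, journey[1:]):
            triples.append(Example(prior.image, describe(prior.report, new.report), new.image))
    return triples


def build_single_pairs(studies: Sequence[Study]) -> List[Example]:
    """Stage-1 data: every image-report pair, with a dummy (None) prior."""
    return [Example(None, s.report, s.image) for s in studies]


def curriculum(studies: Sequence[Study],
               describe: Callable[[str, str], str]) -> List[Tuple[str, List[Example]]]:
    """Two-stage schedule: pretrain on single pairs, then continue on triples."""
    return [
        ("stage1_single_pairs", build_single_pairs(studies)),
        ("stage2_triples", build_triples(studies, describe)),
    ]


# Usage with toy data and a toy stand-in for the GPT-4 description step.
studies = [
    Study("p1", 2, "p1_t2.png", "Small left effusion."),
    Study("p1", 1, "p1_t1.png", "No effusion."),
    Study("p2", 1, "p2_t1.png", "Clear lungs."),
]
toy_describe = lambda old, new: f"Was: {old} Now: {new}"
for name, data in curriculum(studies, toy_describe):
    print(name, len(data))
# stage1_single_pairs 3
# stage2_triples 1
```

The single-pair stage uses every study, while only patients with at least two studies contribute triples, which mirrors the abstract's motivation: image time series are scarcer than single image-report pairs.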
Related papers
- MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging [16.325123491357203]
We propose a multimodal pre-training framework that jointly incorporates genomics and medical images for downstream tasks.
We align medical images and genes using a self-supervised contrastive learning approach that combines Mamba as a genetic encoder with a Vision Transformer (ViT) as a medical image encoder.
Date: 2024-06-02T06:20:45Z
- Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training [99.2891802841936]
We introduce the Med-ST framework for fine-grained spatial and temporal modeling.
For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views.
For temporal modeling, we propose a novel cross-modal bidirectional cycle consistency objective based on forward mapping classification (FMC) and reverse mapping regression (RMR).
Date: 2024-05-30T03:15:09Z
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions about biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
Date: 2023-06-01T16:50:07Z
- BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs [48.376109878173956]
We present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets.
PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles.
Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing.
Date: 2023-03-02T02:20:04Z
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
Date: 2023-01-11T16:35:33Z
- MMV_Im2Im: An Open Source Microscopy Machine Vision Toolbox for Image-to-Image Transformation [0.571097144710995]
We introduce a new open-source Python package, MMV_Im2Im, for image-to-image transformation in bioimaging applications.
The overall package is designed with a generic image-to-image transformation framework.
We demonstrate the effectiveness of MMV_Im2Im in more than ten different biomedical problems.
Date: 2022-09-06T13:42:17Z
- Metadata-enhanced contrastive learning from retinal optical coherence tomography images [7.932410831191909]
We extend conventional contrastive frameworks with a novel metadata-enhanced strategy.
Our approach employs widely available patient metadata to approximate the true set of inter-image contrastive relationships.
Our approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks.
Date: 2022-08-04T08:53:15Z
- A Survey on Training Challenges in Generative Adversarial Networks for Biomedical Image Analysis [0.6308539010172307]
Generative Adversarial Networks (GANs) have been widely utilized to address data limitations through the generation of synthetic biomedical images.
GANs can experience several technical challenges that impede the generation of suitable synthetic imagery.
This work presents a review and taxonomy based on solutions to the training problems of GANs in the biomedical imaging domain.
Date: 2022-01-19T15:23:46Z
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
Date: 2021-01-12T23:02:26Z
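Two of the related entries (MGI and the metadata-enhanced OCT work) rely on contrastive objectives, and the latter uses patient metadata to decide which images count as related. A generic sketch of a multi-positive InfoNCE loss with a metadata-derived positive mask is shown below. It is an illustration of the general technique, not either paper's implementation: the rule "two distinct images from the same patient are positives", the function names `cosine`, `positive_mask`, `multi_positive_infonce`, and the temperature of 0.1 are all hypothetical choices. When each anchor has exactly one positive, the loss reduces to the standard InfoNCE form.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def positive_mask(patient_ids: Sequence[str]) -> List[List[bool]]:
    """Hypothetical metadata rule: two distinct images are positives
    when they come from the same patient."""
    n = len(patient_ids)
    return [[i != j and patient_ids[i] == patient_ids[j] for j in range(n)]
            for i in range(n)]


def multi_positive_infonce(embeddings: Sequence[Sequence[float]],
                           mask: Sequence[Sequence[bool]],
                           temperature: float = 0.1) -> float:
    """InfoNCE where each anchor may have several positives.

    For anchor i with positive set P(i), average over j in P(i) of
    -log(exp(s_ij / t) / sum_{k != i} exp(s_ik / t)), then average over
    the anchors that have at least one positive.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if mask[i][j]]
        if not positives:
            continue  # anchors with no positive contribute nothing
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over every non-anchor sample.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += sum(log_denom - logits[j] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Usage: same-patient embeddings close together should give a lower loss.
mask = positive_mask(["a", "a", "b", "b"])
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(multi_positive_infonce(aligned, mask) < multi_positive_infonce(misaligned, mask))  # True
```

Widening the positive set through metadata changes only the mask; the loss itself is unchanged, which is one way to read the entry's phrase "approximate the true set of inter-image contrastive relationships".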