PRISM: High-Resolution & Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion
- URL: http://arxiv.org/abs/2503.00196v1
- Date: Fri, 28 Feb 2025 21:32:08 GMT
- Title: PRISM: High-Resolution & Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion
- Authors: Amar Kumar, Anita Kriz, Mohammad Havaei, Tal Arbel
- Abstract summary: Development of reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, data imbalances, and limited text annotations in datasets. We present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals.
- Score: 5.504796147401483
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing reliable and generalizable deep learning systems for medical imaging faces significant obstacles due to spurious correlations, data imbalances, and limited text annotations in datasets. Addressing these challenges requires architectures robust to the unique complexities posed by medical imaging data. The rapid advancements in vision-language foundation models within the natural image domain prompt the question of how they can be adapted for medical imaging tasks. In this work, we present PRISM, a framework that leverages foundation models to generate high-resolution, language-guided medical image counterfactuals using Stable Diffusion. Our approach demonstrates unprecedented precision in selectively modifying spurious correlations (the medical devices) and disease features, enabling the removal and addition of specific attributes while preserving other image characteristics. Through extensive evaluation, we show how PRISM advances counterfactual generation and enables the development of more robust downstream classifiers for clinically deployable solutions. To facilitate broader adoption and research, we make our code publicly available at https://github.com/Amarkr1/PRISM.
Related papers
- Causal Disentanglement for Robust Long-tail Medical Image Generation [80.15257897500578]
We propose a novel medical image generation framework, which generates independent pathological and structural features.
We leverage a diffusion model guided by pathological findings to model pathological features, enabling the generation of diverse counterfactual images.
arXiv Detail & Related papers (2025-04-20T01:54:18Z)
- Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation [0.8397730500554048]
We present the first investigation of the power of pre-trained vision-language foundation models, once fine-tuned on medical image datasets, to perform latent disentanglement.
We demonstrate that language-guided Stable Diffusion inherently learns to factorize key attributes for image generation.
We devise a framework to identify, isolate, and manipulate key attributes through latent space trajectory of generative models, facilitating precise control over medical image synthesis.
arXiv Detail & Related papers (2025-03-30T23:15:52Z)
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models.
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.
Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
- Efficient MedSAMs: Segment Anything in Medical Images on Laptop [69.28565867103542]
We organized the first international competition dedicated to promptable medical image segmentation.
The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline.
The best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption.
arXiv Detail & Related papers (2024-12-20T17:33:35Z)
- MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions [0.13108652488669734]
The integration of neural-network-based systems into clinical practice is limited by challenges related to domain generalization and robustness.
We create and open-source MedMNIST-C, a benchmark dataset based on the MedMNIST+ collection covering 12 datasets and 9 imaging modalities.
arXiv Detail & Related papers (2024-06-25T13:20:39Z)
- Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models [26.146579369491718]
This study builds upon the state-of-the-art vision-language pre-training and fine-tuning approach, BLIP-2, to customize general large-scale foundation models.
Validation on the dataset of ImageCLEFmedical 2023 demonstrates our model's prowess, achieving the best-averaged results against several state-of-the-art methods.
arXiv Detail & Related papers (2023-12-07T01:01:45Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains [3.8137985834223502]
Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets.
We explore the sub-components of the Stable Diffusion pipeline to fine-tune the model to generate medical images.
Our best-performing model improves upon the Stable Diffusion baseline and can be conditioned to insert a realistic-looking abnormality on a synthetic radiology image.
arXiv Detail & Related papers (2022-10-09T01:43:08Z)
- Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)