Related papers: Prompt-Guided Latent Diffusion with Predictive Class Conditioning for 3D Prostate MRI Generation

URL: http://arxiv.org/abs/2506.10230v2
Date: Tue, 01 Jul 2025 16:27:24 GMT
Title: Prompt-Guided Latent Diffusion with Predictive Class Conditioning for 3D Prostate MRI Generation
Authors: Emerson P. Grabke, Masoom A. Haider, Babak Taati
Abstract summary: Latent diffusion models (LDM) could alleviate data scarcity challenges affecting machine learning development for medical imaging. We propose a novel LDM conditioning approach to address these limitations. Our method achieves a 3D FID score of 0.025 on a size-limited 3D prostate MRI dataset.
Score: 1.6508709227918446
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Objective: Latent diffusion models (LDM) could alleviate data scarcity challenges affecting machine learning development for medical imaging. However, medical LDM strategies typically rely on short-prompt text encoders, non-medical LDMs, or large data volumes. These strategies can limit performance and scientific accessibility. We propose a novel LDM conditioning approach to address these limitations. Methods: We propose Class-Conditioned Efficient Large Language model Adapter (CCELLA), a novel dual-head conditioning approach that simultaneously conditions the LDM U-Net with free-text clinical reports and radiology classification. We also propose a data-efficient LDM framework centered around CCELLA and a proposed joint loss function. We first evaluate our method on 3D prostate MRI against state-of-the-art. We then augment a downstream classifier model training dataset with synthetic images from our method. Results: Our method achieves a 3D FID score of 0.025 on a size-limited 3D prostate MRI dataset, significantly outperforming a recent foundation model with FID 0.071. When training a classifier for prostate cancer prediction, adding synthetic images generated by our method during training improves classifier accuracy from 69% to 74%. Training a classifier solely on our method's synthetic images achieved comparable performance to training on real images alone. Conclusion: We show that our method improved both synthetic image quality and downstream classifier performance using limited data and minimal human annotation. Significance: The proposed CCELLA-centric framework enables radiology report and class-conditioned LDM training for high-quality medical image synthesis given limited data volume and human data annotation, improving LDM performance and scientific accessibility. Code from this study will be available at https://github.com/grabkeem/CCELLA

Related papers

CoCoLIT: ControlNet-Conditioned Latent Image Translation for MRI to Amyloid PET Synthesis [2.333160549379721]
High dimensionality and structural complexity of 3D neuroimaging data pose challenges for MRI-to-PET translation. We present CoCoLIT, a diffusion-based latent generative framework that incorporates three main innovations. We evaluate CoCoLIT's performance on publicly available datasets and find that our model significantly outperforms state-of-the-art methods on both image-based and amyloid-related metrics.
arXiv Detail & Related papers (2025-08-02T09:58:30Z)
Mitigating Multi-Sequence 3D Prostate MRI Data Scarcity through Domain Adaptation using Locally-Trained Latent Diffusion Models for Prostate Cancer Detection [1.6508709227918446]
Latent diffusion models (LDMs) could mitigate data scarcity challenges affecting machine learning development for medical image interpretation. We propose CCELLA++ to address these limitations and improve clinical utility.
arXiv Detail & Related papers (2025-07-08T20:38:10Z)
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging [41.446379453352534]
Latent Diffusion Autoencoder (LDAE) is a novel encoder-decoder diffusion-based framework for efficient and meaningful unsupervised learning in medical imaging. This study focuses on Alzheimer disease (AD) using brain MR from the ADNI database as a case study.
arXiv Detail & Related papers (2025-04-11T15:37:46Z)
ContextMRI: Enhancing Compressed Sensing MRI through Metadata Conditioning [51.26601171361753]
We propose ContextMRI, a text-conditioned diffusion model for MRI that integrates granular metadata into the reconstruction process. We show that increasing the fidelity of metadata, ranging from slice location and contrast to patient age, sex, and pathology, systematically boosts reconstruction performance.
arXiv Detail & Related papers (2025-01-08T05:15:43Z)
MRI Reconstruction with Regularized 3D Diffusion Model (R3DM) [2.842800539489865]
We propose a 3D MRI reconstruction method that leverages a regularized 3D diffusion model combined with an optimization method. By incorporating diffusion-based priors, our method improves image quality, reduces noise, and enhances the overall fidelity of 3D MRI reconstructions.
arXiv Detail & Related papers (2024-12-25T00:55:05Z)
Local Lesion Generation is Effective for Capsule Endoscopy Image Data Augmentation in a Limited Data Setting [0.0]
We propose and evaluate two local lesion generation approaches to address the challenge of augmenting small medical image datasets. The first approach employs the Poisson Image Editing algorithm, a classical image processing technique, to create realistic image composites. The second approach introduces a novel generative method, leveraging a fine-tuned Image Inpainting GAN to synthesize realistic lesions.
arXiv Detail & Related papers (2024-11-05T13:44:25Z)
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training. LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions. Our results show LoGra-Med matches LLaVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
arXiv Detail & Related papers (2024-10-03T15:52:03Z)
Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas. This study aims to utilize a specialized MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z)
Cross-conditioned Diffusion Model for Medical Image to Image Translation [22.020931436223204]
We introduce a Cross-conditioned Diffusion Model (CDM) for medical image-to-image translation. First, we propose a Modality-specific Representation Model (MRM) to model the distribution of target modalities. Then, we design a Modality-decoupled Diffusion Network (MDN) to efficiently and effectively learn the distribution from MRM.
arXiv Detail & Related papers (2024-09-13T02:48:56Z)
ssVERDICT: Self-Supervised VERDICT-MRI for Enhanced Prostate Tumour Characterisation [2.755232740505053]
A self-supervised neural network for fitting VERDICT estimates parameter maps without training data. We compare the performance of ssVERDICT to two established baseline methods for fitting diffusion MRI models.
arXiv Detail & Related papers (2023-09-12T14:31:33Z)
ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment. The scarcity of annotated data limits the effectiveness and generalization of existing methods. We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images. Our approach fuses image and textual data to enhance the generation process. We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
Conditional Diffusion Models for Semantic 3D Brain MRI Synthesis [0.0]
We introduce Med-DDPM, a diffusion model designed for 3D semantic brain MRI synthesis. It effectively tackles data scarcity and privacy issues by integrating semantic conditioning. It generates diverse, coherent images with high visual fidelity.
arXiv Detail & Related papers (2023-05-29T04:14:38Z)
Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space. We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains. Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
Harmonizing Pathological and Normal Pixels for Pseudo-healthy Synthesis [68.5287824124996]
We present a new type of discriminator, the segmentor, to accurately locate the lesions and improve the visual quality of pseudo-healthy images. We apply the generated images to medical image enhancement and utilize the enhanced results to cope with the low contrast problem. Comprehensive experiments on the T2 modality of BraTS demonstrate that the proposed method substantially outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T08:41:17Z)
Deep AUC Maximization for Medical Image Classification: Challenges and Opportunities [60.079782224958414]
We will present and discuss opportunities and challenges brought by a new deep learning method by AUC (aka Deep AUC classification).
arXiv Detail & Related papers (2021-11-01T15:31:32Z)
About Explicit Variance Minimization: Training Neural Networks for Medical Imaging With Limited Data Annotations [2.3204178451683264]
Variance Aware Training (VAT) method exploits this property by introducing the variance error into the model loss function. We validate VAT on three medical imaging datasets from diverse domains and various learning objectives.
arXiv Detail & Related papers (2021-05-28T21:34:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.