VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation
- URL: http://arxiv.org/abs/2506.23641v1
- Date: Mon, 30 Jun 2025 09:11:19 GMT
- Title: VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation
- Authors: Peng Huang, Junhu Fu, Bowen Guo, Zeju Li, Yuanyuan Wang, Yi Guo
- Abstract summary: Generative models require rich attribute information beyond labels to produce realistic and diverse images. We propose Visual Attribute Prompts (VAP)-Diffusion to leverage external knowledge from pre-trained Multi-modal Large Language Models (MLLMs) to improve the quality and diversity of medical image generation. Experiments on three common types of medical imaging across four datasets verify the effectiveness of VAP-Diffusion.
- Score: 7.000306937114167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the appearance of medical images is influenced by multiple underlying factors, generative models require rich attribute information beyond labels to produce realistic and diverse images. For instance, generating an image of a skin lesion with specific patterns demands descriptions that go beyond diagnosis, such as shape, size, texture, and color. However, such detailed descriptions are not always accessible. To address this, we explore a framework, termed Visual Attribute Prompts (VAP)-Diffusion, to leverage external knowledge from pre-trained Multi-modal Large Language Models (MLLMs) to improve the quality and diversity of medical image generation. First, to derive descriptions from MLLMs without hallucination, we design a series of Chain-of-Thought prompts for common medical imaging tasks, including dermatologic, colorectal, and chest X-ray images. Generated descriptions are utilized during training and stored across different categories. During testing, descriptions are randomly retrieved from the corresponding category for inference. Moreover, to make the generator robust to unseen combinations of descriptions at test time, we propose a Prototype Condition Mechanism that restricts test embeddings to be similar to those from training. Experiments on three common types of medical imaging across four datasets verify the effectiveness of VAP-Diffusion.
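The abstract describes two mechanisms: a per-category bank of MLLM-generated descriptions sampled at random during inference, and a Prototype Condition Mechanism that keeps test-time embeddings close to training embeddings. A minimal sketch of both ideas, assuming a simple dictionary-based bank and a nearest-prototype convex combination (the class name, function names, and the `alpha` blending parameter are illustrative assumptions, not the paper's actual implementation):

```python
import random
import numpy as np

class DescriptionBank:
    """Stores MLLM-generated descriptions per category during training
    (illustrative sketch; not the paper's actual API)."""

    def __init__(self):
        self.bank = {}  # category -> list of description strings

    def add(self, category, description):
        self.bank.setdefault(category, []).append(description)

    def sample(self, category, rng=random):
        # At test time, a description is drawn at random from the
        # requested category, as the abstract describes.
        return rng.choice(self.bank[category])


def prototype_condition(test_emb, train_embs, alpha=0.5):
    """Pull a test-time embedding toward its nearest training embedding
    (a stand-in 'prototype'), so unseen description combinations stay
    close to the training distribution. `alpha` is a hypothetical
    blending weight."""
    dists = np.linalg.norm(train_embs - test_emb, axis=1)
    proto = train_embs[np.argmin(dists)]
    # Convex combination keeps the conditioned embedding near the prototype.
    return alpha * proto + (1 - alpha) * test_emb
```

In this sketch the restriction to training-like embeddings is realized as a simple convex combination with the nearest stored embedding; the paper's mechanism may differ in how prototypes are computed and applied.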
Related papers
- Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation. MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
- RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining [64.66825253356869]
We propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities. We construct two comprehensive medical imaging retrieval datasets: MIMIC-IR for Chest X-rays and CTRATE-IR for CT scans. We develop two retrieval systems, RadIR-CXR and model-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks.
arXiv Detail & Related papers (2025-03-06T17:43:03Z)
- MedFILIP: Medical Fine-grained Language-Image Pre-training [11.894318326422054]
Existing methods struggle to accurately characterize associations between images and diseases. MedFILIP introduces medical image-specific knowledge through contrastive learning. For single-label, multi-label, and fine-grained classification, our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-01-18T14:08:33Z)
- MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics [0.0]
Visual attribution in medical imaging seeks to make evident the diagnostically-relevant components of a medical image.
We here present a novel generative visual attribution technique, one that leverages latent diffusion models in combination with domain-specific large language models.
The resulting system also exhibits a range of latent capabilities including zero-shot localized disease induction.
arXiv Detail & Related papers (2024-01-02T19:51:49Z)
- EAFP-Med: An Efficient Adaptive Feature Processing Module Based on Prompts for Medical Image Detection [27.783012550610387]
Cross-domain adaptive medical image detection is challenging due to the differences in lesion representations across various medical imaging technologies.
We propose EAFP-Med, an efficient adaptive feature processing module based on prompts for medical image detection.
EAFP-Med can efficiently extract lesion features from various medical images based on prompts, enhancing the model's performance.
arXiv Detail & Related papers (2023-11-27T05:10:15Z)
- Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning [12.10183458424711]
We present a novel medical image captioning method guided by the segment anything model (SAM).
Our approach employs a distinctive pre-training strategy with mixed semantic learning to simultaneously capture both the overall information and finer details within medical images.
arXiv Detail & Related papers (2023-11-02T05:44:13Z)
- BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys [99.7082441544384]
We present BiomedJourney, a novel method for counterfactual biomedical image generation by instruction-learning.
We use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression.
The resulting triples are then used to train a latent diffusion model for counterfactual biomedical image generation.
arXiv Detail & Related papers (2023-10-16T18:59:31Z)
- Few-shot medical image classification with simple shape and texture text descriptors using vision-language models [1.1172382217477128]
We investigate the usefulness of vision-language models (VLMs) and large language models for binary few-shot classification of medical images.
We utilize the GPT-4 model to generate text descriptors that encapsulate the shape and texture characteristics of objects in medical images.
arXiv Detail & Related papers (2023-08-08T02:48:46Z)
- Variational Topic Inference for Chest X-Ray Report Generation [102.04931207504173]
Report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice.
Recent work has shown that deep learning models can successfully caption natural images.
We propose variational topic inference for automatic report generation.
arXiv Detail & Related papers (2021-07-15T13:34:38Z)
- Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.