MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided
Diffusion with Visual Invariant
- URL: http://arxiv.org/abs/2403.04290v1
- Date: Thu, 7 Mar 2024 07:39:00 GMT
- Title: MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided
Diffusion with Visual Invariant
- Authors: Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu
- Abstract summary: MedM2G is a medical generative model that unifies medical generation tasks of text-to-image, image-to-text, and unified generation of medical modalities.
It performs 5 medical generation tasks across 10 datasets, consistently outperforming various state-of-the-art works.
- Score: 15.30998544228763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical generative models, acknowledged for their high-quality sample
generation ability, have accelerated the rapid growth of medical applications.
However, recent works concentrate on separate generative models for distinct
medical tasks and are restricted by inadequate medical multi-modal knowledge,
which constrains comprehensive medical diagnosis. In this paper, we propose
MedM2G, a Medical Multi-Modal Generative framework, whose key innovation is to
align, extract, and generate medical multi-modal data within a unified model.
Extending beyond single or dual medical modalities, we efficiently align
medical modalities through a central alignment approach in a unified space.
Significantly, our framework extracts valuable clinical knowledge by preserving
the medical visual invariant of each imaging modality, thereby enhancing the
modality-specific medical information used for multi-modal generation. By
conditioning adaptive cross-guided parameters into a multi-flow diffusion
framework, our model enables flexible interactions among medical modalities
during generation. MedM2G is the first medical generative model to unify the
medical generation tasks of text-to-image, image-to-text, and unified
generation of medical modalities (CT, MRI, X-ray). It performs five medical
generation tasks across 10 datasets, consistently outperforming various
state-of-the-art works.
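To make the cross-guided multi-flow idea above more concrete, below is a minimal sketch of a two-flow denoising step in which each modality's diffusion flow conditions on the other's latent via cross-attention. This is an illustrative assumption, not the authors' implementation: the module and parameter names (CrossGuidedBlock, TwoFlowDenoiser, guide) are hypothetical, and timestep/noise-schedule handling is omitted for brevity.

```python
# Minimal sketch (not the authors' code): two parallel diffusion flows that
# guide each other through cross-attention, loosely mirroring the "adaptive
# cross-guided" conditioning described in the abstract.
import torch
import torch.nn as nn


class CrossGuidedBlock(nn.Module):
    """One denoising block that attends to another modality's latent."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        # Self-attention over the flow's own latent tokens.
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        # Cross-guidance: attend to the other modality's latent tokens.
        x = x + self.cross_attn(self.norm2(x), guide, guide)[0]
        return x + self.mlp(self.norm3(x))


class TwoFlowDenoiser(nn.Module):
    """Two diffusion flows (e.g. image and text latents) conditioned on each other."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.image_block = CrossGuidedBlock(dim)
        self.text_block = CrossGuidedBlock(dim)

    def forward(self, z_img: torch.Tensor, z_txt: torch.Tensor):
        # Each flow predicts noise for its own modality, guided by the other.
        eps_img = self.image_block(z_img, guide=z_txt)
        eps_txt = self.text_block(z_txt, guide=z_img)
        return eps_img, eps_txt


if __name__ == "__main__":
    denoiser = TwoFlowDenoiser(dim=256)
    z_img = torch.randn(2, 64, 256)  # noisy image latent tokens (batch, tokens, dim)
    z_txt = torch.randn(2, 16, 256)  # noisy text latent tokens
    eps_img, eps_txt = denoiser(z_img, z_txt)
    print(eps_img.shape, eps_txt.shape)
```

In an actual multi-flow setup, each block would additionally receive the diffusion timestep embedding and the aligned unified-space condition; the sketch only shows how the two flows can exchange information during a single denoising step.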
Related papers
- MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation [40.9095393430871]
We introduce MedViLaM, a unified vision-language model intended as a step towards a generalist model for medical data.
MedViLaM can flexibly encode and interpret various forms of medical data, including clinical language and imaging.
We present instances of zero-shot generalization to new medical concepts and tasks, effective transfer learning across different tasks, and the emergence of zero-shot medical reasoning.
arXiv Detail & Related papers (2024-09-29T12:23:10Z)
- MultiMed: Massively Multimodal and Multitask Medical Understanding [41.160488390597905]
MultiMed is a benchmark designed to evaluate and enable large-scale learning across a wide spectrum of medical modalities and tasks.
It consists of 2.56 million samples across ten medical modalities such as medical reports, pathology, genomics, and protein data.
Using MultiMed, we conduct comprehensive experiments benchmarking state-of-the-art unimodal, multimodal, and multitask models.
arXiv Detail & Related papers (2024-08-22T18:41:36Z)
- Automated Ensemble Multimodal Machine Learning for Healthcare [52.500923923797835]
We introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning.
AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies.
arXiv Detail & Related papers (2024-07-25T17:46:38Z)
- Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images [1.4680035572775536]
Vision-language models have emerged as a powerful tool for challenging multi-modal classification problems in the medical domain.
Existing research has focused on clinical descriptions for specific modalities or body regions, leaving a gap for a model providing entire-body multi-modal descriptions.
In this paper, we address this gap by automating the generation of standardized body station(s) and list of organ(s) across the whole body in multi-modal MR and CT radiological images.
arXiv Detail & Related papers (2024-05-31T09:59:11Z)
- Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models [17.643421997037514]
We propose a novel framework that tackles both discriminative and generative multimodal medical tasks.
The learning of Med-MoE consists of three steps: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning.
Our model can achieve performance superior to or on par with state-of-the-art baselines.
arXiv Detail & Related papers (2024-04-16T02:35:17Z)
- Med-Flamingo: a Multimodal Medical Few-shot Learner [58.85676013818811]
We propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain.
Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks.
We conduct the first human evaluation for generative medical VQA where physicians review the problems and blinded generations in an interactive app.
arXiv Detail & Related papers (2023-07-27T20:36:02Z)
- Towards Generalist Biomedical AI [28.68106423175678]
We introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system.
Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data.
We conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales.
arXiv Detail & Related papers (2023-07-26T17:52:22Z)
- Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z)
- Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knowledge-enhanced mulTimOdal pretRaining (MOTOR).
arXiv Detail & Related papers (2023-04-26T01:26:19Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed in specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use, at train-time, multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of the available data by learning representations of a multi-modal input that are resilient to modality dropping at test-time (a minimal sketch of this idea follows the list).
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
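As a rough illustration of the modality-dropping resilience mentioned in the CMIM entry above, the sketch below randomly removes one modality's features during training so that the fused representation still works when a view is missing at test-time. It is a minimal, assumption-laden example rather than CMIM's actual objective; ModalityDropFusion, train_step, and p_drop are hypothetical names.

```python
# Minimal sketch (not CMIM's method): train a fused classifier to tolerate a
# missing modality by randomly dropping image or text features during training.
import random
import torch
import torch.nn as nn


class ModalityDropFusion(nn.Module):
    """Fuses image and text features; either input may be absent (but not both)."""

    def __init__(self, dim: int = 128, num_classes: int = 2):
        super().__init__()
        self.img_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, num_classes)  # e.g. a diagnostic label

    def forward(self, img=None, txt=None):
        parts = []
        if img is not None:
            parts.append(self.img_proj(img))
        if txt is not None:
            parts.append(self.txt_proj(txt))
        fused = torch.stack(parts).mean(dim=0)  # average the available modalities
        return self.head(fused)


def train_step(model, img, txt, labels, optimizer, p_drop=0.3):
    # With probability p_drop, hide one modality so the model learns to
    # predict from either view alone.
    if random.random() < p_drop:
        img, txt = (None, txt) if random.random() < 0.5 else (img, None)
    loss = nn.functional.cross_entropy(model(img, txt), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```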