Missing-modality Enabled Multi-modal Fusion Architecture for Medical
Data
- URL: http://arxiv.org/abs/2309.15529v1
- Date: Wed, 27 Sep 2023 09:46:07 GMT
- Title: Missing-modality Enabled Multi-modal Fusion Architecture for Medical
Data
- Authors: Muyu Wang, Shiyu Fan, Yichen Li, Hui Chen
- Abstract summary: Fusing multi-modal data can improve the performance of deep learning models.
Missing modalities are common for medical data due to patients' specificity.
This study developed an efficient multi-modal fusion architecture for medical data that was robust to missing modalities.
- Score: 8.472576865966744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fusing multi-modal data can improve the performance of deep learning models.
However, missing modalities are common for medical data due to patients'
specificity, which is detrimental to the performance of multi-modal models in
applications. Therefore, it is critical to adapt the models to missing
modalities. This study aimed to develop an efficient multi-modal fusion
architecture for medical data that was robust to missing modalities and further
improved the performance on disease diagnosis. X-ray chest radiographs for the
image modality, radiology reports for the text modality, and structured value
data for the tabular data modality were fused in this study. Each modality pair
was fused with a Transformer-based bi-modal fusion module, and the three
bi-modal fusion modules were then combined into a tri-modal fusion framework.
Additionally, multivariate loss functions were introduced into the training
process to improve the model's robustness to missing modalities in the inference
process. Finally, we designed comparison and ablation experiments to validate
the effectiveness of the fusion, the robustness to missing modalities, and the
enhancements from each key component. Experiments were conducted on MIMIC-IV
and MIMIC-CXR with the 14-label disease diagnosis task. The area under the
receiver operating characteristic curve (AUROC) and the area under the
precision-recall curve (AUPRC) were used to evaluate the models' performance.
The experimental results demonstrated that our proposed multi-modal fusion
architecture effectively fused three modalities and showed strong robustness to
missing modalities. This method has the potential to be scaled to more
modalities, enhancing the clinical practicality of the model.
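The fusion design described in the abstract (pairwise Transformer-based bi-modal fusion modules combined into a tri-modal framework) can be illustrated with a minimal PyTorch sketch. The module and parameter names below (BiModalFusion, TriModalFusion, d_model) are illustrative assumptions, not the authors' implementation, and each modality is assumed to be already encoded into token sequences of a shared width.

```python
# Minimal sketch of a Transformer-based tri-modal fusion model
# (assumed structure; not the authors' released code).
import torch
import torch.nn as nn

class BiModalFusion(nn.Module):
    """Fuses two token sequences with a small Transformer encoder (assumption)."""
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, a, b):
        # a: (B, La, d_model), b: (B, Lb, d_model)
        fused = self.encoder(torch.cat([a, b], dim=1))
        return fused.mean(dim=1)  # (B, d_model) pooled pair embedding

class TriModalFusion(nn.Module):
    """Combines three bi-modal fusion modules and a shared classifier head."""
    def __init__(self, d_model=256, n_labels=14):
        super().__init__()
        self.img_txt = BiModalFusion(d_model)
        self.img_tab = BiModalFusion(d_model)
        self.txt_tab = BiModalFusion(d_model)
        self.head = nn.Linear(3 * d_model, n_labels)

    def forward(self, image_tokens, text_tokens, tabular_tokens):
        z1 = self.img_txt(image_tokens, text_tokens)
        z2 = self.img_tab(image_tokens, tabular_tokens)
        z3 = self.txt_tab(text_tokens, tabular_tokens)
        return self.head(torch.cat([z1, z2, z3], dim=-1))  # (B, n_labels) logits
```

One reading of the "multivariate loss functions" mentioned in the abstract is a composite training objective, for example a BCE term on the fused logits plus auxiliary BCE terms on predictions from each bi-modal branch, so that every pair remains predictive on its own when the third modality is missing at inference; the paper's exact formulation is given in the full text.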
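The evaluation metrics named in the abstract (AUROC and AUPRC on a 14-label diagnosis task) can be computed per label and macro-averaged with scikit-learn; the snippet below is a generic sketch of that computation, not the authors' evaluation script.

```python
# Macro-averaged AUROC / AUPRC for a multi-label task (generic sketch).
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def multilabel_auroc_auprc(y_true, y_prob):
    """y_true: (N, 14) binary labels, y_prob: (N, 14) predicted probabilities."""
    aurocs, auprcs = [], []
    for k in range(y_true.shape[1]):
        # Skip labels that are all-positive or all-negative in the evaluation
        # set, since AUROC is undefined in that case.
        if len(np.unique(y_true[:, k])) < 2:
            continue
        aurocs.append(roc_auc_score(y_true[:, k], y_prob[:, k]))
        auprcs.append(average_precision_score(y_true[:, k], y_prob[:, k]))
    return float(np.mean(aurocs)), float(np.mean(auprcs))
```

Here average_precision_score is the usual scikit-learn estimate of the area under the precision-recall curve.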
Related papers
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
- Cross-conditioned Diffusion Model for Medical Image to Image Translation [22.020931436223204]
We introduce a Cross-conditioned Diffusion Model (CDM) for medical image-to-image translation.
First, we propose a Modality-specific Representation Model (MRM) to model the distribution of target modalities.
Then, we design a Modality-decoupled Diffusion Network (MDN) to efficiently and effectively learn the distribution from MRM.
arXiv Detail & Related papers (2024-09-13T02:48:56Z)
- MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality [29.088860237497165]
We introduce Modality-aware Low-Rank Adaptation (MoRA) for multi-modal pre-trained models.
MoRA integrates into the first block of the model, significantly improving performance when a modality is missing.
It requires less than 1.6% of the trainable parameters needed to train the entire model (a generic low-rank adaptation sketch is given after this list).
arXiv Detail & Related papers (2024-08-17T01:40:00Z)
- Towards Precision Healthcare: Robust Fusion of Time Series and Image Data [8.579651833717763]
We introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information.
We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results.
Our experiments show that our method is effective in improving multimodal deep learning for clinical applications.
arXiv Detail & Related papers (2024-05-24T11:18:13Z)
- DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency [18.291267748113142]
We propose DrFuse to achieve effective clinical multi-modal fusion.
We address the missing modality issue by disentangling the features shared across modalities and those unique within each modality.
We validate the proposed method using real-world large-scale datasets, MIMIC-IV and MIMIC-CXR.
arXiv Detail & Related papers (2024-03-10T12:41:34Z)
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- Feature robustness and sex differences in medical imaging: a case study in MRI-based Alzheimer's disease detection [1.7616042687330637]
We compare two classification schemes on the ADNI MRI dataset.
We do not find a strong dependence of model performance for male and female test subjects on the sex composition of the training dataset.
arXiv Detail & Related papers (2022-04-04T17:37:54Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
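Two of the related papers above (MoRA and MMLoRA) build on low-rank adaptation of pre-trained models. As a point of reference only, a vanilla LoRA layer wraps a frozen linear weight with a trainable low-rank update W + (alpha/r) * B A; the sketch below shows that generic mechanism, not either paper's modality-aware variant.

```python
# Generic LoRA linear layer (reference sketch; not MoRA's or MMLoRA's code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weight
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen projection plus trainable low-rank update.
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```

Only the two small matrices are trained, which is why such adapters can account for a small fraction of a model's trainable parameters.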