Related papers: Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data

Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data

URL: http://arxiv.org/abs/2309.15529v1
Date: Wed, 27 Sep 2023 09:46:07 GMT
Title: Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data
Authors: Muyu Wang, Shiyu Fan, Yichen Li, Hui Chen
Abstract summary: Fusing multi-modal data can improve the performance of deep learning models. Missing modalities are common for medical data due to patients' specificity. This study developed an efficient multi-modal fusion architecture for medical data that was robust to missing modalities.
Score: 8.472576865966744
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fusing multi-modal data can improve the performance of deep learning models. However, missing modalities are common for medical data due to patients' specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities. This study aimed to develop an efficient multi-modal fusion architecture for medical data that was robust to missing modalities and further improved the performance on disease diagnosis.X-ray chest radiographs for the image modality, radiology reports for the text modality, and structured value data for the tabular data modality were fused in this study. Each modality pair was fused with a Transformer-based bi-modal fusion module, and the three bi-modal fusion modules were then combined into a tri-modal fusion framework. Additionally, multivariate loss functions were introduced into the training process to improve model's robustness to missing modalities in the inference process. Finally, we designed comparison and ablation experiments for validating the effectiveness of the fusion, the robustness to missing modalities and the enhancements from each key component. Experiments were conducted on MIMIC-IV, MIMIC-CXR with the 14-label disease diagnosis task. Areas under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC) were used to evaluate models' performance. The experimental results demonstrated that our proposed multi-modal fusion architecture effectively fused three modalities and showed strong robustness to missing modalities. This method is hopeful to be scaled to more modalities to enhance the clinical practicality of the model.

Related papers

MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach. MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student. We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z)
TMI-CLNet: Triple-Modal Interaction Network for Chronic Liver Disease Prognosis From Imaging, Clinical, and Radiomic Data Fusion [3.2805467531625556]
Chronic liver disease represents a significant health challenge worldwide. Recent evidence suggests that integrating multimodal data can provide more comprehensive prognostic information. We present the Triple-Modal Interaction Chronic Liver Network (TMI-CLNet)
arXiv Detail & Related papers (2025-02-02T07:05:28Z)
ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction [12.893857146169045]
Alzheimer's disease (AD) is a common neurodegenerative disease among the elderly. Early prediction and timely intervention of its prodromal stage, mild cognitive impairment (MCI), can decrease the risk of advancing to AD.
arXiv Detail & Related papers (2025-01-20T05:12:31Z)
RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-imcomplete fusing and Detecting frAmewoRk, abbreviated as RADAR. Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer. Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
Cross-conditioned Diffusion Model for Medical Image to Image Translation [22.020931436223204]
We introduce a Cross-conditioned Diffusion Model (CDM) for medical image-to-image translation. First, we propose a Modality-specific Representation Model (MRM) to model the distribution of target modalities. Then, we design a Modality-decoupled Diffusion Network (MDN) to efficiently and effectively learn the distribution from MRM.
arXiv Detail & Related papers (2024-09-13T02:48:56Z)
MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality [29.088860237497165]
We introduce Modality-aware Low-Rank Adaptation (MoRA) for multi-modal pre-trained models. MoRA integrates into the first block of the model, significantly improving performance when a modality is missing. It requires less than 1.6% of the trainable parameters needed compared to training the entire model.
arXiv Detail & Related papers (2024-08-17T01:40:00Z)
Towards Precision Healthcare: Robust Fusion of Time Series and Image Data [8.579651833717763]
We introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information. We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results. Our experiments show that our method is effective in improving multimodal deep learning for clinical applications.
arXiv Detail & Related papers (2024-05-24T11:18:13Z)
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency [18.291267748113142]
We propose DrFuse to achieve effective clinical multi-modal fusion. We address the missing modality issue by disentangling the features shared across modalities and those unique within each modality. We validate the proposed method using real-world large-scale datasets, MIMIC-IV and MIMIC-CXR.
arXiv Detail & Related papers (2024-03-10T12:41:34Z)
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning. We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA)
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation. We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare. Within this framework, we train predictive 15 models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition. We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model. HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
Feature robustness and sex differences in medical imaging: a case study in MRI-based Alzheimer's disease detection [1.7616042687330637]
We compare two classification schemes on the ADNI MRI dataset. We do not find a strong dependence of model performance for male and female test subjects on the sex composition of the training dataset.
arXiv Detail & Related papers (2022-04-04T17:37:54Z)
On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification. We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned. Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations. Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.