Explainable Deep Neural Network for Multimodal ECG Signals: Intermediate vs Late Fusion
- URL: http://arxiv.org/abs/2508.11666v2
- Date: Sun, 12 Oct 2025 12:15:07 GMT
- Title: Explainable Deep Neural Network for Multimodal ECG Signals: Intermediate vs Late Fusion
- Authors: Timothy Oladunni, Ehimen Aneni
- Abstract summary: Multimodal deep neural networks (MDNN) have the capability of integrating diverse data domains and offer a promising solution for robust and accurate predictions. This study investigates the comparative effectiveness of intermediate and late fusion strategies using ECG signals.
- Score: 1.1344265020822928
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The limitations of unimodal deep learning models, particularly their tendency to overfit and limited generalizability, have renewed interest in multimodal fusion strategies. Multimodal deep neural networks (MDNN) have the capability of integrating diverse data domains and offer a promising solution for robust and accurate predictions. However, the optimal fusion strategy, intermediate (feature-level) versus late (decision-level) fusion, remains insufficiently examined, especially in high-stakes clinical contexts such as ECG-based cardiovascular disease (CVD) classification. This study investigates the comparative effectiveness of intermediate and late fusion strategies using ECG signals across three domains: time, frequency, and time-frequency. A series of experiments were conducted to identify the highest-performing fusion architecture. Results demonstrate that intermediate fusion consistently outperformed late fusion, achieving a peak accuracy of 97 percent, with Cohen's d > 0.8 relative to standalone models and d = 0.40 compared to late fusion. Interpretability analyses using saliency maps reveal that both models align with the discretized ECG signals. Statistical dependency between the discretized ECG signals and the corresponding saliency maps for each class was confirmed using Mutual Information (MI). The proposed ECG domain-based multimodal model offers superior predictive capability and enhanced explainability, crucial attributes in medical AI applications, surpassing state-of-the-art models.
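The two fusion strategies compared in the abstract, and the MI check between signals and saliency maps, can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the feature dimensions, linear classifier heads, and noise levels are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vectors for one ECG beat from the three domains
# (time, frequency, time-frequency); dimensions are illustrative.
feat_time = rng.normal(size=8)
feat_freq = rng.normal(size=8)
feat_tf = rng.normal(size=8)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_classes = 3

# Intermediate (feature-level) fusion: concatenate domain features,
# then apply one shared classifier head to the joint representation.
W_joint = rng.normal(size=(n_classes, 24))
fused = np.concatenate([feat_time, feat_freq, feat_tf])
p_intermediate = softmax(W_joint @ fused)

# Late (decision-level) fusion: each domain gets its own classifier;
# per-domain class probabilities are averaged at decision time.
heads = [rng.normal(size=(n_classes, 8)) for _ in range(3)]
p_late = np.mean(
    [softmax(W @ f) for W, f in zip(heads, (feat_time, feat_freq, feat_tf))],
    axis=0,
)

# Histogram-based mutual information, standing in for the paper's MI
# analysis between discretized ECG signals and saliency maps.
def mutual_information(x, y, bins=8):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

signal = rng.normal(size=500)
saliency = signal + 0.1 * rng.normal(size=500)  # a map that tracks the signal
```

A saliency map that tracks the signal yields a much higher MI than an unrelated one, which is the dependency the paper confirms per class; the key architectural difference is that intermediate fusion lets a single head learn cross-domain interactions, while late fusion only combines already-formed decisions.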
Related papers
- Deep Neural Network Architectures for Electrocardiogram Classification: A Comprehensive Evaluation [7.708113178862228]
This study presents a comprehensive evaluation of deep neural network architectures for automated arrhythmia classification. To address data scarcity in minority classes, the MIT-BIH Arrhythmia dataset was augmented using a Generative Adversarial Network (GAN). We developed and compared four distinct architectures: Convolutional Neural Networks (CNN), CNN combined with Long Short-Term Memory (CNN-LSTM), CNN-LSTM with Attention, and 1D Residual Networks (ResNet-1D).
arXiv Detail & Related papers (2026-02-07T06:56:50Z) - MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization [46.65200216642429]
We introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%.
arXiv Detail & Related papers (2026-02-01T07:56:10Z) - Uncertainty-aware Cross-training for Semi-supervised Medical Image Segmentation [45.96892342675963]
We propose an Uncertainty-aware Cross-training framework for semi-supervised medical image segmentation (UC-Seg). Our method achieves superior segmentation accuracy and generalization performance compared to other state-of-the-art semi-supervised methods.
arXiv Detail & Related papers (2025-08-12T15:28:10Z) - impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy. It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches. Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using the TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z) - Rethinking Multimodality: Optimizing Multimodal Deep Learning for Biomedical Signal Classification [5.811275732167591]
This study proposes a novel perspective on multimodal deep learning for biomedical signal classification. We systematically analyze how complementary feature domains impact model performance. We demonstrate that optimal domain fusion isn't about the number of modalities, but the quality of their inherent complementarity.
arXiv Detail & Related papers (2025-08-01T14:12:10Z) - Latent Space Data Fusion Outperforms Early Fusion in Multimodal Mental Health Digital Phenotyping Data [0.0]
Mental illnesses such as depression and anxiety require improved methods for early detection and personalized intervention. Traditional predictive models often rely on unimodal data or early fusion strategies that fail to capture the complex, multimodal nature of psychiatric data. We evaluated intermediate (latent space) fusion for predicting daily depressive symptoms.
arXiv Detail & Related papers (2025-07-10T18:10:46Z) - Multimodal Outer Arithmetic Block Dual Fusion of Whole Slide Images and Omics Data for Precision Oncology [6.418265127069878]
We propose the use of omic embeddings during early and late fusion to capture complementary information from local (patch-level) to global (slide-level) interactions. This dual fusion strategy enhances interpretability and classification performance, highlighting its potential for clinical diagnostics.
arXiv Detail & Related papers (2024-11-26T13:25:53Z) - A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection [41.454028276986946]
We propose a novel framework Two-Stage Generative Model (TSGM) to improve brain tumor detection and segmentation.
CycleGAN is trained on unpaired data to generate abnormal images from healthy images as data prior.
VE-JP is implemented to reconstruct healthy images using synthetic paired abnormal images as a guide.
arXiv Detail & Related papers (2023-11-06T12:58:26Z) - Brain Imaging-to-Graph Generation using Adversarial Hierarchical Diffusion Models for MCI Causality Analysis [44.45598796591008]
Brain imaging-to-graph generation (BIGG) framework is proposed to map functional magnetic resonance imaging (fMRI) into effective connectivity for mild cognitive impairment analysis.
The hierarchical transformers in the generator are designed to estimate the noise at multiple scales.
Evaluations of the ADNI dataset demonstrate the feasibility and efficacy of the proposed model.
arXiv Detail & Related papers (2023-05-18T06:54:56Z) - Multiple Time Series Fusion Based on LSTM: An Application to CAP A Phase Classification Using EEG [56.155331323304]
Deep learning based electroencephalogram channels' feature level fusion is carried out in this work.
Channel selection, fusion, and classification procedures were optimized by two optimization algorithms.
arXiv Detail & Related papers (2021-12-18T14:17:49Z) - ECG Heartbeat Classification Using Multimodal Fusion [13.524306011331303]
We propose two computationally efficient multimodal fusion frameworks for ECG heartbeat classification.
In MFF, we extracted features from the penultimate layer of the CNNs and fused them to obtain unique and interdependent information.
We achieved classification accuracy of 99.7% and 99.2% on arrhythmia and MI classification, respectively.
arXiv Detail & Related papers (2021-07-21T03:48:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.