Leveraging Foundational Models and Simple Fusion for Multi-modal Physiological Signal Analysis
- URL: http://arxiv.org/abs/2512.15250v1
- Date: Wed, 17 Dec 2025 09:49:06 GMT
- Title: Leveraging Foundational Models and Simple Fusion for Multi-modal Physiological Signal Analysis
- Authors: Youssef Ghallab, Omar Iraqy, Mohamed Kandil, Mohamed Ashraf, Saadeldine Eletter, Morougue Ghazal, Ayman Khalafallah, Nagwa El-Makky
- Abstract summary: We adapt the CBraMod encoder for large-scale self-supervised ECG pretraining. We utilize a pre-trained CBraMod encoder for EEG and pre-train a symmetric ECG encoder. Our approach achieves near state-of-the-art performance, demonstrating that carefully designed physiological encoders, even with straightforward fusion, substantially improve downstream performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Physiological signals such as electrocardiograms (ECG) and electroencephalograms (EEG) provide complementary insights into human health and cognition, yet multi-modal integration is challenging due to limited multi-modal labeled data and modality-specific differences. In this work, we adapt the CBraMod encoder for large-scale self-supervised ECG pretraining, introducing a dual-masking strategy to capture intra- and inter-lead dependencies. To overcome the above challenges, we utilize a pre-trained CBraMod encoder for EEG and pre-train a symmetric ECG encoder, equipping each modality with a rich foundational representation. These representations are then fused via simple embedding concatenation, allowing the classification head to learn cross-modal interactions and enabling effective downstream learning despite limited multi-modal supervision. Evaluated on emotion recognition, our approach achieves near state-of-the-art performance, demonstrating that carefully designed physiological encoders, even with straightforward fusion, substantially improve downstream performance. These results highlight the potential of foundation-model approaches to harness the holistic nature of physiological signals, enabling scalable, label-efficient, and generalizable solutions for healthcare and affective computing.
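The fusion step described above is deliberately simple: each pretrained encoder produces a fixed-size embedding, the two embeddings are concatenated, and a classification head learns cross-modal interactions on top. A minimal sketch of that idea, with hypothetical embedding sizes and function names not taken from the paper:

```python
# Sketch of late fusion by embedding concatenation (illustrative only;
# dimensions, names, and the linear head are assumptions, not the paper's code).

def concat_fusion(eeg_emb, ecg_emb):
    """Fuse the two modality embeddings by simple concatenation."""
    return eeg_emb + ecg_emb  # [d_eeg] + [d_ecg] -> [d_eeg + d_ecg]

def linear_head(fused, weights, bias):
    """A linear classification head over the fused embedding.
    weights: num_classes rows of length len(fused); bias: num_classes."""
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]

# Toy example: 3-dim EEG embedding, 2-dim ECG embedding, 2 classes.
eeg = [0.1, 0.2, 0.3]
ecg = [0.5, -0.5]
fused = concat_fusion(eeg, ecg)  # 5-dim fused vector
logits = linear_head(fused, [[1.0] * 5, [0.0] * 5], [0.0, 1.0])
```

In practice the head would be trained jointly on the downstream task (e.g. emotion recognition) while the pretrained encoders supply the representations, which is what lets the simple concatenation suffice.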
Related papers
- Cross-Modal Computational Model of Brain-Heart Interactions via HRV and EEG Feature [0.1631115063641726]
ECG signals can be acquired from wearable devices such as headbands. This study investigates whether ECG-derived features can serve as surrogate indicators of cognitive load.
arXiv Detail & Related papers (2026-01-11T07:20:30Z) - Transferring Clinical Knowledge into ECGs Representation [0.19498378931702776]
We propose a novel three-stage training paradigm that transfers knowledge from multimodal clinical data into a powerful, yet unimodal, ECG encoder. We employ a self-supervised, joint-embedding pre-training stage to create an ECG representation that is enriched with contextual clinical information. As an indirect way to explain the model's output, we train it to also predict associated laboratory abnormalities directly from the ECG embedding.
arXiv Detail & Related papers (2025-12-07T22:19:24Z) - Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation [52.19347532840774]
We propose SE-Diff, a novel physiological simulator and experience enhanced diffusion model for ECG generation. SE-Diff integrates a lightweight ordinary differential equation (ODE)-based ECG simulator into the diffusion process via a beat decoder. Extensive experiments on real-world ECG datasets demonstrate that SE-Diff improves both signal fidelity and text-ECG semantic alignment.
arXiv Detail & Related papers (2025-11-13T02:57:10Z) - WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities [55.00677513249723]
EEG signals simultaneously encode both cognitive processes and intrinsic neural states. We map EEG signals and their corresponding modalities into a unified semantic space to achieve generalized interpretation. The resulting model demonstrates robust classification accuracy while supporting flexible, open-ended conversations.
arXiv Detail & Related papers (2025-09-26T06:21:51Z) - Sensing Cardiac Health Across Scenarios and Devices: A Multi-Modal Foundation Model Pretrained on Heterogeneous Data from 1.7 Million Individuals [36.08910150609342]
We present a cardiac sensing foundation model (CSFM) that learns unified representations from vast, heterogeneous health records. Our model is pretrained on an innovative multi-modal integration of data from multiple large-scale datasets. CSFM consistently outperforms traditional one-modal-one-task approaches.
arXiv Detail & Related papers (2025-06-23T20:58:12Z) - CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model [52.466542039411515]
EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models. We present CodeBrain, a two-stage EFM designed to fill this gap. In the first stage, we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens. In the second stage, we propose the multi-scale EEGSSM architecture, which combines structured global convolution with sliding window attention.
arXiv Detail & Related papers (2025-06-10T17:20:39Z) - Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities [9.785262633953794]
PhysioOmni is a foundation model for multimodal physiological signal analysis. It trains a decoupled multimodal tokenizer, enabling masked signal pre-training. It achieves state-of-the-art performance while maintaining strong robustness to missing modalities.
arXiv Detail & Related papers (2025-04-28T09:00:04Z) - CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality. The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z) - Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners [10.088785685439134]
We propose D-BETA, a framework that pre-trains ECG and text data using a contrastive masked auto-encoder architecture. D-BETA uniquely combines the strengths of generative capabilities with boosted discriminative capabilities to achieve robust cross-modal representations.
arXiv Detail & Related papers (2024-10-03T01:24:09Z) - fMRI from EEG is only Deep Learning away: the use of interpretable DL to
unravel EEG-fMRI relationships [68.8204255655161]
We present an interpretable domain grounded solution to recover the activity of several subcortical regions from multichannel EEG data.
We recover individual spatial and time-frequency patterns of scalp EEG predictive of the hemodynamic signal in the subcortical nuclei.
arXiv Detail & Related papers (2022-10-23T15:11:37Z) - ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed
Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining traction in academic and industrial settings.
We demonstrate DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation onto a segmentation framework.
The model was trained using PhysioNet's QT database, comprised of 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.