Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces
- URL: http://arxiv.org/abs/2507.13092v1
- Date: Thu, 17 Jul 2025 13:03:20 GMT
- Title: Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces
- Authors: Hyo-Jeong Jang, Hye-Bin Shin, Seong-Whan Lee
- Abstract summary: Multimodal knowledge distillation (KD) has been explored to transfer knowledge from visual models to EEG-based models. This paper addresses semantic uncertainty caused by ambiguous features and weakly defined labels. We propose a novel cross-modal knowledge distillation framework that mitigates both modality and label inconsistencies.
- Score: 26.585985828583304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Electroencephalography (EEG) is a fundamental modality for cognitive state monitoring in brain-computer interfaces (BCIs). However, it is highly susceptible to intrinsic signal errors and human-induced labeling errors, which lead to label noise and ultimately degrade model performance. To enhance EEG learning, multimodal knowledge distillation (KD) has been explored to transfer knowledge from visual models with rich representations to EEG-based models. Nevertheless, KD faces two key challenges: modality gap and soft label misalignment. The former arises from the heterogeneous nature of EEG and visual feature spaces, while the latter stems from label inconsistencies that create discrepancies between ground truth labels and distillation targets. This paper addresses semantic uncertainty caused by ambiguous features and weakly defined labels. We propose a novel cross-modal knowledge distillation framework that mitigates both modality and label inconsistencies. It aligns feature semantics through a prototype-based similarity module and introduces a task-specific distillation head to resolve label-induced inconsistency in supervision. Experimental results demonstrate that our approach improves EEG-based emotion regression and classification performance, outperforming both unimodal and multimodal baselines on a public multimodal dataset. These findings highlight the potential of our framework for BCI applications.
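The abstract names two concrete components: a prototype-based similarity module that aligns EEG and visual feature semantics, and a task-specific distillation head that decouples the teacher's soft labels from the ground-truth supervision. The paper's code is not reproduced here; the following is a minimal PyTorch sketch of how those two pieces could fit together, assuming the visual features are already projected into the student's embedding space. All class names, dimensions, and loss weights are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeSimilarity(nn.Module):
    """Maps features to a similarity distribution over learnable class prototypes."""
    def __init__(self, feat_dim: int, num_classes: int, temperature: float = 0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.temperature = temperature

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between each feature vector and each prototype.
        sims = F.normalize(feats, dim=-1) @ F.normalize(self.prototypes, dim=-1).t()
        return F.log_softmax(sims / self.temperature, dim=-1)

class EEGStudent(nn.Module):
    """Toy student: one encoder, two heads. The task head is supervised by the
    ground-truth labels; the distillation head matches the teacher's soft
    labels, so the two (possibly inconsistent) targets never share a head."""
    def __init__(self, in_dim: int, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.task_head = nn.Linear(feat_dim, num_classes)
        self.distill_head = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)
        return z, self.task_head(z), self.distill_head(z)

def training_loss(student, proto_sim, eeg, visual_feats, teacher_logits, labels,
                  alpha: float = 1.0, beta: float = 1.0, tau: float = 4.0):
    z, task_logits, distill_logits = student(eeg)
    # (1) Hard-label task loss on the task head only.
    loss_task = F.cross_entropy(task_logits, labels)
    # (2) Prototype alignment: pull the EEG similarity distribution toward the
    # visual one, bridging the modality gap in a shared prototype space.
    p_eeg = proto_sim(z)                            # log-probabilities
    p_vis = proto_sim(visual_feats).exp().detach()  # probabilities, frozen teacher side
    loss_proto = F.kl_div(p_eeg, p_vis, reduction="batchmean")
    # (3) Temperature-scaled soft-label distillation, routed through the
    # dedicated head so noisy teacher targets do not fight the hard labels.
    loss_kd = F.kl_div(F.log_softmax(distill_logits / tau, dim=-1),
                       F.softmax(teacher_logits / tau, dim=-1),
                       reduction="batchmean") * tau ** 2
    return loss_task + alpha * loss_proto + beta * loss_kd
```

Routing the teacher's targets through a separate head is one plausible way to keep noisy soft labels from directly conflicting with the hard-label objective, which is the soft label misalignment the abstract describes.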
Related papers
- Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification [59.59359638389348]
We propose a Dual-level Modality Debiasing Learning (DMDL) framework that implements debiasing at both the model and optimization levels. Experiments on benchmark datasets demonstrate that DMDL enables modality-invariant feature learning and a more generalized model.
arXiv Detail & Related papers (2025-12-03T12:43:16Z) - Cross-Modal Consistency-Guided Active Learning for Affective BCI Systems [1.9556470931534158]
We propose an uncertainty-aware active learning framework that enhances robustness to label noise. Instead of relying solely on EEG-based uncertainty estimates, the method evaluates cross-modal alignment. This feedback-driven process guides the network toward reliable, informative samples and reduces the impact of noisy labels.
arXiv Detail & Related papers (2025-11-19T05:33:48Z) - Unsupervised Pairwise Learning Optimization Framework for Cross-Corpus EEG-Based Emotion Recognition Based on Prototype Representation [1.3354439722832292]
Cross-corpus emotion recognition faces serious challenges, especially for samples near the decision boundary. We propose an optimization method based on domain adversarial transfer learning for fine-grained alignment of affective features. Our work provides a promising solution for cross-corpus emotion recognition.
arXiv Detail & Related papers (2025-08-06T06:29:19Z) - CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality. The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z) - PL-DCP: A Pairwise Learning framework with Domain and Class Prototypes for EEG emotion recognition under unseen target conditions [27.769583873372518]
We propose a Pairwise Learning framework with Domain and Category Prototypes (PL-DCP) for EEG emotion recognition under unseen target conditions. The PL-DCP model achieves slightly better performance than deep transfer learning methods that require both source and target data. This work provides an effective and robust solution for emotion recognition.
arXiv Detail & Related papers (2024-11-27T00:56:43Z) - A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition [14.199298112101802]
This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG-based Emotion Recognition (SICLEER).
We propose a joint learning model combining a self-supervised contrastive learning loss with a supervised classification loss (a generic sketch of such a joint objective appears after this list).
arXiv Detail & Related papers (2024-05-12T11:51:00Z) - Graph Convolutional Network with Connectivity Uncertainty for EEG-based Emotion Recognition [20.655367200006076]
This study introduces a distribution-based uncertainty method to represent spatial dependencies and temporal-spectral relationships in EEG signals.
The graph mixup technique is employed to enhance latent connected edges and mitigate noisy label issues.
We evaluate our approach on two widely used datasets, namely SEED and SEEDIV, for emotion recognition tasks.
arXiv Detail & Related papers (2023-10-22T03:47:11Z) - Semi-Supervised Dual-Stream Self-Attentive Adversarial Graph Contrastive Learning for Cross-Subject EEG-based Emotion Recognition [19.578050094283313]
The DS-AGC framework is proposed to tackle the challenge of limited labeled data in cross-subject EEG-based emotion recognition.
The proposed model outperforms existing methods under different incomplete label conditions.
arXiv Detail & Related papers (2023-08-13T23:54:40Z) - Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z) - Uncertain Facial Expression Recognition via Multi-task Assisted Correction [43.02119884581332]
We propose MTAC, a novel multi-task assisted correction method for uncertain facial expression recognition.
Specifically, a confidence estimation block and a weighted regularization module are applied to highlight solid samples and suppress uncertain samples in every batch.
Experiments on RAF-DB, AffectNet, and AffWild2 datasets demonstrate that the MTAC obtains substantial improvements over baselines when facing synthetic and real uncertainties.
arXiv Detail & Related papers (2022-12-14T10:28:08Z) - Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection [149.23913018423022]
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels.
We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
arXiv Detail & Related papers (2022-12-08T05:53:53Z) - The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z) - MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z) - Uncovering the structure of clinical EEG signals with self-supervised learning [64.4754948595556]
Supervised learning paradigms are often limited by the amount of labeled data that is available.
This phenomenon is particularly problematic in clinically relevant data, such as electroencephalography (EEG).
By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv Detail & Related papers (2020-07-31T14:34:47Z)
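The SICLEER entry above combines a self-supervised contrastive loss with a supervised classification loss. As a point of reference, here is a minimal PyTorch sketch of such a joint objective using a SimCLR-style NT-Xent term on two augmented views; the function names, temperature, and weighting factor are illustrative assumptions, not that paper's implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """SimCLR-style NT-Xent loss: for each sample, its augmented view is the
    positive and every other sample in the batch is a negative."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)  # (2n, d) unit vectors
    sim = z @ z.t() / temperature                        # pairwise similarities
    eye = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, -1e9)                     # exclude self-pairs
    # Row i's positive is row i + n (and vice versa).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def joint_loss(logits, z1, z2, labels, lam: float = 0.5) -> torch.Tensor:
    # Supervised classification term plus the self-supervised contrastive term.
    return F.cross_entropy(logits, labels) + lam * nt_xent_loss(z1, z2)
```

The weighting factor `lam` trades off representation shaping by the contrastive term against direct label fitting.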
This list is automatically generated from the titles and abstracts of the papers on this site.