Multimodal Functional Maximum Correlation for Emotion Recognition
- URL: http://arxiv.org/abs/2512.23076v1
- Date: Sun, 28 Dec 2025 20:48:02 GMT
- Title: Multimodal Functional Maximum Correlation for Emotion Recognition
- Authors: Deyang Zheng, Tianyi Zhang, Wenming Zheng, Shujian Yu
- Abstract summary: Emotional states manifest as coordinated yet heterogeneous physiological responses across central and autonomic systems. We propose Multimodal Functional Maximum Correlation (MFMC) to maximize higher-order multimodal dependence. MFMC consistently achieves state-of-the-art or competitive performance under both subject-dependent and subject-independent evaluation protocols.
- Score: 41.64451298000105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotional states manifest as coordinated yet heterogeneous physiological responses across central and autonomic systems, posing a fundamental challenge for multimodal representation learning in affective computing. Learning such joint dynamics is further complicated by the scarcity and subjectivity of affective annotations, which motivates the use of self-supervised learning (SSL). However, most existing SSL approaches rely on pairwise alignment objectives, which are insufficient to characterize dependencies among more than two modalities and fail to capture higher-order interactions arising from coordinated brain and autonomic responses. To address this limitation, we propose Multimodal Functional Maximum Correlation (MFMC), a principled SSL framework that maximizes higher-order multimodal dependence through a Dual Total Correlation (DTC) objective. By deriving a tight sandwich bound and optimizing it using a functional maximum correlation analysis (FMCA) based trace surrogate, MFMC captures joint multimodal interactions directly, without relying on pairwise contrastive losses. Experiments on three public affective computing benchmarks demonstrate that MFMC consistently achieves state-of-the-art or competitive performance under both subject-dependent and subject-independent evaluation protocols, highlighting its robustness to inter-subject variability. In particular, MFMC improves subject-dependent accuracy on CEAP-360VR from 78.9% to 86.8%, and subject-independent accuracy from 27.5% to 33.1% using the EDA signal alone. Moreover, MFMC remains within 0.8 percentage points of the best-performing method on the most challenging EEG subject-independent split of MAHNOB-HCI. Our code is available at https://github.com/DY9910/MFMC.
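For readers who want a concrete feel for the idea, the sketch below shows one way a trace-based surrogate for multimodal dependence could be optimized with per-modality neural encoders, in the spirit of functional maximal correlation. It is a minimal illustration only, not the authors' MFMC/DTC objective or released code (their implementation is at the GitHub link above); the encoder architectures, signal dimensions, whitening scheme, and loss form are assumptions made for this example. As background, the dual total correlation of modalities X_1, ..., X_M is commonly defined as the joint entropy minus the sum of conditional entropies, DTC = H(X_1, ..., X_M) - sum_i H(X_i | X_{-i}), which grows as the modalities become more mutually dependent.

```python
# Hypothetical sketch of a trace-based multimodal dependence surrogate.
# NOT the authors' MFMC/DTC objective; encoders, dimensions, and the loss
# form are illustrative assumptions only.
import torch
import torch.nn as nn


def whiten(z, eps=1e-4):
    """Center features and map them to (approximately) identity covariance."""
    z = z - z.mean(dim=0, keepdim=True)
    cov = (z.T @ z) / (z.shape[0] - 1) + eps * torch.eye(z.shape[1], device=z.device)
    chol = torch.linalg.cholesky(cov)          # cov = L @ L.T
    return z @ torch.linalg.inv(chol).T        # whitened: cov(z L^{-T}) ≈ I


def trace_dependence_loss(features):
    """Negative trace-based dependence among whitened per-modality embeddings.

    With whitened blocks, the joint covariance C has (approximately) identity
    diagonal blocks, so trace((C - I)(C - I)^T) accumulates the squared
    cross-modal correlations shared across all modalities.
    """
    zs = [whiten(z) for z in features]
    joint = torch.cat(zs, dim=1)                       # (batch, sum of embed dims)
    c = (joint.T @ joint) / (joint.shape[0] - 1)       # joint covariance
    eye = torch.eye(c.shape[0], device=c.device)
    return -torch.trace((c - eye) @ (c - eye).T)       # maximize cross-modal correlation


# Toy usage with three hypothetical modality encoders (e.g., EEG, EDA, PPG windows).
encoders = nn.ModuleList([
    nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 16)) for d in (32, 8, 8)
])
opt = torch.optim.Adam(encoders.parameters(), lr=1e-3)
x_eeg, x_eda, x_ppg = torch.randn(256, 32), torch.randn(256, 8), torch.randn(256, 8)

opt.zero_grad()
loss = trace_dependence_loss([enc(x) for enc, x in zip(encoders, (x_eeg, x_eda, x_ppg))])
loss.backward()
opt.step()
```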
Related papers
- Effective and Robust Multimodal Medical Image Analysis [2.0518682437126095]
Multimodal Fusion Learning (MFL) has shown great potential for addressing medical problems such as skin cancer and brain tumor prediction. Existing MFL methods face three key limitations. We propose a novel Multi-Attention Integration Learning (MAIL) network, incorporating two key components.
arXiv Detail & Related papers (2026-02-17T04:23:46Z) - Multimodal Classification via Total Correlation Maximization [11.720319082362629]
Multimodal learning integrates data from diverse sensors to harness information from different modalities. Recent studies reveal that joint learning often overfits certain modalities while neglecting others, leading to performance inferior to that of unimodal learning. We propose a method for multimodal classification by maximizing the total correlation between multimodal features and labels.
arXiv Detail & Related papers (2026-02-13T15:21:45Z) - MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition [8.732416479560605]
This paper proposes the Multimodal Cross-Attention Network and Contrastive Learning (MCN-CL) for multimodal emotion recognition. It uses a triple query mechanism and a hard negative mining strategy to remove feature redundancy while preserving important emotional cues. Experimental results on the IEMOCAP and MELD datasets show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-11-14T02:13:31Z) - Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process [55.91649771370862]
The Dirichlet process (DP) mixture model is a powerful non-parametric method that can amplify the most prominent features. We propose a new DP-driven multimodal learning framework that automatically achieves an optimal balance between prominent intra-modal representation learning and cross-modal alignment.
arXiv Detail & Related papers (2025-10-23T16:53:24Z) - Coherent Multimodal Reasoning with Iterative Self-Evaluation for Vision-Language Models [4.064135211977999]
Large language models (LLMs) and large vision-language models (LVLMs) struggle with complex, multi-step, cross-modal common sense reasoning tasks. We propose the Coherent Multimodal Reasoning Framework (CMRF), a novel approach that enhances LVLMs' common sense reasoning capabilities. CMRF mimics human problem-solving by decomposing complex queries, generating step-by-step inferences, and self-correcting errors.
arXiv Detail & Related papers (2025-08-04T20:33:58Z) - Multimodal Fine-grained Reasoning for Post Quality Evaluation [1.806315356676339]
We propose the Multimodal Fine-grained Topic-post Reasoning (MFTRR) framework, which mimics human cognitive processes. MFTRR reframes post-quality assessment as a ranking task and incorporates multimodal data to better capture quality variations.
arXiv Detail & Related papers (2025-07-21T04:30:50Z) - Hyper-modal Imputation Diffusion Embedding with Dual-Distillation for Federated Multimodal Knowledge Graph Completion [59.54067771781552]
We propose a framework named MMFeD3-HidE to address the challenges of uncertain multimodal unavailability and multimodal client heterogeneity in FedMKGC. We also propose a FedMKGC benchmark for comprehensive evaluation, consisting of a general FedMKGC backbone named MMFedE, datasets with heterogeneous multimodal information, and three groups of constructed baselines.
arXiv Detail & Related papers (2025-06-27T09:32:58Z) - MIA-Mind: A Multidimensional Interactive Attention Mechanism Based on MindSpore [0.0]
We propose MIA-Mind, a lightweight and modular Multidimensional Interactive Attention Mechanism. MIA-Mind jointly models spatial and channel features through a unified cross-attentive fusion strategy. Experiments are conducted on three representative datasets.
arXiv Detail & Related papers (2025-04-27T02:27:50Z) - Qieemo: Speech Is All You Need in the Emotion Recognition in Conversations [1.0690007351232649]
Multimodal approaches benefit from the fusion of diverse modalities, thereby improving recognition accuracy. The proposed Qieemo framework effectively utilizes a pretrained automatic speech recognition (ASR) model, which contains naturally frame-aligned textual and emotional features. Experimental results on the IEMOCAP dataset demonstrate that Qieemo outperforms the benchmark unimodal, multimodal, and self-supervised models with absolute improvements of 3.0%, 1.2%, and 1.9%, respectively.
arXiv Detail & Related papers (2025-03-05T07:02:30Z) - Mixture Compressor for Mixture-of-Experts LLMs Gains More [71.0473038084673]
We propose a training-free Mixture-Compressor for Mixture-of-Experts large language models (MoE-LLMs). Our MC integrates static quantization and dynamic pruning to collaboratively achieve extreme compression for MoE-LLMs with minimal accuracy loss. For instance, at 2.54 bits, MC compresses 76.6% of the model, with only a 3.8% average accuracy loss.
arXiv Detail & Related papers (2024-10-08T18:09:38Z) - Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
The Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.