Related papers: PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis

PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis

URL: http://arxiv.org/abs/2512.01442v1
Date: Mon, 01 Dec 2025 09:24:59 GMT
Title: PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis
Authors: Heng Xie, Kang Zhu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Ruibo Fu, Changsheng Li,
Abstract summary: Main challenge lies in integrating sentiment-related information from different modalities.<n>Existing methods extract only shallow information from unimodal features during the extraction phase.<n>We propose a personality-sentiment aligned multi-level fusion framework.
Score: 41.114391204008875
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal sentiment analysis (MSA) is a research field that recognizes human sentiments by combining textual, visual, and audio modalities. The main challenge lies in integrating sentiment-related information from different modalities, which typically arises during the unimodal feature extraction phase and the multimodal feature fusion phase. Existing methods extract only shallow information from unimodal features during the extraction phase, neglecting sentimental differences across different personalities. During the fusion phase, they directly merge the feature information from each modality without considering differences at the feature level. This ultimately affects the model's recognition performance. To address this problem, we propose a personality-sentiment aligned multi-level fusion framework. We introduce personality traits during the feature extraction phase and propose a novel personality-sentiment alignment method to obtain personalized sentiment embeddings from the textual modality for the first time. In the fusion phase, we introduce a novel multi-level fusion method. This method gradually integrates sentimental information from textual, visual, and audio modalities through multimodal pre-fusion and a multi-level enhanced fusion strategy. Our method has been evaluated through multiple experiments on two commonly used datasets, achieving state-of-the-art results.

Related papers

MIAR: Modality Interaction and Alignment Representation Fuison for Multimodal Emotion [14.294515952573105]
Multimodal Emotion Recognition aims to perceive human emotions through three modes: language, vision, and audio.<n>Previous methods primarily focused on modal fusion without adequately addressing significant distributional differences among modalities.<n>We propose a novel approach called Modality Interaction and Alignment Representation (MIAR)
arXiv Detail & Related papers (2026-01-03T06:26:13Z)
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis [62.31018417955254]
DeepMLF is a novel multimodal language model with learnable tokens tailored toward deep fusion.<n>Our results confirm that deeper fusion leads to better performance, with optimal fusion depths (5-7) exceeding those of existing approaches.
arXiv Detail & Related papers (2025-04-15T11:28:02Z)
Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations [19.731611716111566]
We propose a Multimodal fusion approach for learning modality-Exclusive and modality-Agnostic representations. We introduce a predictive self-attention module to capture reliable context dynamics within modalities. A hierarchical cross-modal attention module is designed to explore valuable element correlations among modalities. A double-discriminator strategy is presented to ensure the production of distinct representations in an adversarial manner.
arXiv Detail & Related papers (2024-07-06T04:36:48Z)
From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images. Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently. We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process. Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
Multistage linguistic conditioning of convolutional layers for speech emotion recognition [7.482371204083917]
We investigate the effectiveness of deep fusion of text and audio features for categorical and dimensional speech emotion recognition (SER) We propose a novel, multistage fusion method where the two information streams are integrated in several layers of a deep neural network (DNN) Experiments on the widely used IEMOCAP and MSP-Podcast databases demonstrate that the two fusion methods clearly outperform a shallow (late) fusion baseline.
arXiv Detail & Related papers (2021-10-13T11:28:04Z)
Video Sentiment Analysis with Bimodal Information-augmented Multi-Head Attention [7.997124140597719]
This study focuses on the sentiment analysis of videos containing time series data of multiple modalities. The key problem is how to fuse these heterogeneous data. Based on bimodal interaction, more important bimodal features are assigned larger weights.
arXiv Detail & Related papers (2021-03-03T12:30:11Z)
TransModality: An End2End Fusion Method with Transformer for Multimodal Sentiment Analysis [42.6733747726081]
We propose a new fusion method, TransModality, to address the task of multimodal sentiment analysis. We validate our model on multiple multimodal datasets: CMU-MOSI, MELD, IEMOCAP.
arXiv Detail & Related papers (2020-09-07T06:11:56Z)
MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces. The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap. Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.