SentiMM: A Multimodal Multi-Agent Framework for Sentiment Analysis in Social Media
- URL: http://arxiv.org/abs/2508.18108v1
- Date: Mon, 25 Aug 2025 15:17:53 GMT
- Title: SentiMM: A Multimodal Multi-Agent Framework for Sentiment Analysis in Social Media
- Authors: Xilai Xu, Zilin Zhao, Chengye Song, Zining Wang, Jinhe Qiang, Jiongrui Yan, Yuhuai Lin,
- Abstract summary: We propose SentiMM, a novel multi-agent framework for sentiment analysis. SentiMM processes text and visual inputs through specialized agents, fuses multimodal features, enriches context via knowledge retrieval, and aggregates results for final sentiment classification. We also introduce SentiMMD, a large-scale multimodal dataset with seven fine-grained sentiment categories.
- Score: 6.2300278659598485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing prevalence of multimodal content on social media, sentiment analysis faces significant challenges in effectively processing heterogeneous data and recognizing multi-label emotions. Existing methods often lack effective cross-modal fusion and external knowledge integration. We propose SentiMM, a novel multi-agent framework designed to systematically address these challenges. SentiMM processes text and visual inputs through specialized agents, fuses multimodal features, enriches context via knowledge retrieval, and aggregates results for final sentiment classification. We also introduce SentiMMD, a large-scale multimodal dataset with seven fine-grained sentiment categories. Extensive experiments demonstrate that SentiMM achieves superior performance compared to state-of-the-art baselines, validating the effectiveness of our structured approach.
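The abstract's pipeline (specialized text and vision agents, multimodal fusion, knowledge-based enrichment, aggregation into a final label) can be sketched as a toy program. All class and function names below are hypothetical illustrations, not the paper's actual components, and the keyword-matching "agents" stand in for the LLM-based agents SentiMM actually uses.

```python
# Toy sketch of the agent pipeline described in the abstract:
# text/vision agents -> feature fusion -> knowledge enrichment ->
# aggregation into one of seven sentiment categories.
# All names are hypothetical; the paper's real components differ.
from dataclasses import dataclass
from typing import List

SENTIMENTS = ["joy", "anger", "sadness", "fear", "surprise", "disgust", "neutral"]

@dataclass
class Post:
    text: str
    image_caption: str  # stand-in for the visual modality

def text_agent(post: Post) -> List[float]:
    # Placeholder lexicon scoring in place of an LLM text agent.
    lexicon = {"joy": ["great", "love"], "anger": ["hate", "awful"]}
    t = post.text.lower()
    return [float(sum(w in t for w in lexicon.get(s, []))) for s in SENTIMENTS]

def vision_agent(post: Post) -> List[float]:
    # Placeholder scoring over an image caption in place of a vision agent.
    lexicon = {"joy": ["smile"], "sadness": ["rain"]}
    c = post.image_caption.lower()
    return [float(sum(w in c for w in lexicon.get(s, []))) for s in SENTIMENTS]

def fuse(a: List[float], b: List[float]) -> List[float]:
    # Simple additive fusion of per-modality scores.
    return [x + y for x, y in zip(a, b)]

def knowledge_agent(scores: List[float]) -> List[float]:
    # Placeholder for retrieval-based context enrichment: back off to
    # "neutral" when neither modality produced any evidence.
    if not any(scores):
        scores = scores[:]
        scores[SENTIMENTS.index("neutral")] = 1.0
    return scores

def classify(post: Post) -> str:
    # Aggregate: pick the highest-scoring sentiment category.
    fused = knowledge_agent(fuse(text_agent(post), vision_agent(post)))
    return SENTIMENTS[max(range(len(fused)), key=fused.__getitem__)]
```

A post whose text and caption both carry positive cues (e.g. `Post("I love this!", "a big smile")`) resolves to `"joy"`, while a post with no lexical evidence falls back to `"neutral"`.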
Related papers
- MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction [52.89860691282002]
Implicit Attribute Value Extraction (AVE) is essential for accurately representing products in e-commerce. Despite advances in multimodal large language models (MLLMs), implicit AVE remains challenging due to the complexity of multidimensional data. We introduce MADIAVE, a multi-agent debate framework that employs multiple MLLM agents to iteratively refine inferences.
arXiv Detail & Related papers (2025-10-07T06:27:42Z) - Bridging Cognition and Emotion: Empathy-Driven Multimodal Misinformation Detection [56.644686934050576]
Social media has become a major conduit for information dissemination, yet it also facilitates the rapid spread of misinformation. Traditional misinformation detection methods primarily focus on surface-level features, overlooking the crucial roles of human empathy in the propagation process. We propose the Dual-Aspect Empathy Framework (DAE), which integrates cognitive and emotional empathy to analyze misinformation from both the creator and reader perspectives.
arXiv Detail & Related papers (2025-04-24T07:48:26Z) - A Novel Approach to for Multimodal Emotion Recognition : Multimodal semantic information fusion [3.1409950035735914]
This paper proposes a novel multimodal emotion recognition approach, DeepMSI-MER, based on the integration of contrastive learning and visual sequence compression. Experimental results on two public datasets, IEMOCAP and MELD, demonstrate that DeepMSI-MER significantly improves the accuracy and robustness of emotion recognition.
arXiv Detail & Related papers (2025-02-12T17:07:43Z) - Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment [10.278127492434297]
This paper introduces a Multi-Granularity Cross-Modal Alignment (MGCMA) framework, distinguished by its comprehensive approach encompassing distribution-based, instance-based, and token-based alignment modules. Our experiments on IEMOCAP demonstrate that our proposed method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2024-12-30T09:30:41Z) - GCM-Net: Graph-enhanced Cross-Modal Infusion with a Metaheuristic-Driven Network for Video Sentiment and Emotion Analysis [2.012311338995539]
This paper presents a novel framework that leverages multi-modal contextual information from utterances and applies metaheuristic algorithms to learn features for utterance-level sentiment and emotion prediction.
To show the effectiveness of our approach, we have conducted extensive evaluations on three prominent multimodal benchmark datasets.
arXiv Detail & Related papers (2024-10-02T10:07:48Z) - Leveraging Retrieval Augment Approach for Multimodal Emotion Recognition Under Missing Modalities [16.77191718894291]
We propose a novel framework of Retrieval Augment for Missing Modality Multimodal Emotion Recognition (RAMER).
Our framework is superior to existing state-of-the-art approaches in missing modality MER tasks.
arXiv Detail & Related papers (2024-09-19T02:31:12Z) - MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.39859547619156]
We propose MMEvol, a novel multimodal instruction data evolution framework. MMEvol iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution. Our approach reaches state-of-the-art (SOTA) performance in nine tasks using significantly less data compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-09T17:44:00Z) - Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS).
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z) - Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.