Reinforcing Trustworthiness in Multimodal Emotional Support Systems
- URL: http://arxiv.org/abs/2511.10011v2
- Date: Mon, 17 Nov 2025 11:26:54 GMT
- Title: Reinforcing Trustworthiness in Multimodal Emotional Support Systems
- Authors: Huy M. Le, Dat Tien Nguyen, Ngan T. T. Vo, Tuan D. Q. Nguyen, Nguyen Binh Le, Duy Minh Ho Nguyen, Daniel Sonntag, Lizi Liao, Binh T. Nguyen
- Abstract summary: Multimodal approaches to emotional support show great promise by integrating diverse data sources to provide empathetic, contextually relevant responses. We introduce MultiMood, a new framework that leverages multimodal embeddings from video, audio, and text to predict emotional components and to produce responses aligned with professional therapeutic standards.
- Score: 19.59836948857841
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In today's world, emotional support is increasingly essential, yet it remains challenging for both those seeking help and those offering it. Multimodal approaches to emotional support show great promise by integrating diverse data sources to provide empathetic, contextually relevant responses, fostering more effective interactions. However, current methods have notable limitations: they often rely solely on text or convert other data types into text, or they provide emotion recognition only, thus overlooking the full potential of multimodal inputs. Moreover, many studies prioritize response generation without accurately identifying critical emotional support elements or ensuring the reliability of outputs. To overcome these issues, we introduce MultiMood, a new framework that (i) leverages multimodal embeddings from video, audio, and text to predict emotional components and to produce responses aligned with professional therapeutic standards. To improve trustworthiness, we (ii) incorporate novel psychological criteria and apply Reinforcement Learning (RL) to optimize large language models (LLMs) for consistent adherence to these standards. We also (iii) analyze several advanced LLMs to assess their multimodal emotional support capabilities. Experimental results show that MultiMood achieves state-of-the-art results on the MESC and DFEW datasets, while the RL-driven trustworthiness improvements are validated through human and LLM evaluations, demonstrating its superior capability in applying a multimodal framework in this domain.
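The abstract gives no implementation details, so the following minimal PyTorch sketch only illustrates the two ideas it names: fusing per-modality embeddings to predict emotional components, and scoring a response against psychological criteria to form a scalar RL reward. All module names, dimensions, criteria, and weights below are illustrative assumptions, not MultiMood's actual design.

```python
# Sketch only: names, dimensions, and criterion weights are assumptions,
# not the architecture described in the paper.
import torch
import torch.nn as nn

class MultimodalEmotionHead(nn.Module):
    """Fuse precomputed video/audio/text embeddings and predict
    emotional components (here: an emotion class and a support strategy)."""
    def __init__(self, d_video=512, d_audio=256, d_text=768,
                 d_fused=512, n_emotions=7, n_strategies=8):
        super().__init__()
        # Project each modality into a shared space before fusing.
        self.proj_v = nn.Linear(d_video, d_fused)
        self.proj_a = nn.Linear(d_audio, d_fused)
        self.proj_t = nn.Linear(d_text, d_fused)
        self.fuse = nn.Sequential(nn.Linear(3 * d_fused, d_fused), nn.ReLU())
        self.emotion_head = nn.Linear(d_fused, n_emotions)
        self.strategy_head = nn.Linear(d_fused, n_strategies)

    def forward(self, v, a, t):
        # Concatenate projected modalities, fuse, and classify.
        z = torch.cat([self.proj_v(v), self.proj_a(a), self.proj_t(t)], dim=-1)
        h = self.fuse(z)
        return self.emotion_head(h), self.strategy_head(h)

def trustworthiness_reward(criterion_scores, weights=None):
    """Collapse per-criterion scores in [0, 1] (e.g., empathy, safety,
    non-judgment; placeholder criteria) into one scalar RL reward
    via a weighted mean."""
    scores = torch.as_tensor(criterion_scores, dtype=torch.float32)
    w = torch.ones_like(scores) if weights is None else torch.as_tensor(
        weights, dtype=torch.float32)
    return float((w * scores).sum() / w.sum())

# Example: a fused prediction from dummy embeddings, then a reward.
model = MultimodalEmotionHead()
emo_logits, strat_logits = model(
    torch.randn(1, 512), torch.randn(1, 256), torch.randn(1, 768))
print(emo_logits.shape, strat_logits.shape)     # torch.Size([1, 7]) torch.Size([1, 8])
print(trustworthiness_reward([0.9, 0.7, 0.8]))  # weighted mean, ~0.8
```

A scalar reward of this form could, in principle, drive any standard RL fine-tuning loop for an LLM (e.g., a PPO-style objective); whether the paper uses such a setup is not stated in the abstract.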
Related papers
- MERRY: Semantically Decoupled Evaluation of Multimodal Emotional and Role Consistencies of Role-Playing Agents [41.829135334587626]
MERRY is a semantically decoupled evaluation framework for assessing Multimodal Emotional and Role consistencies of Role-playing agents. We transform the traditional subjective scoring approach into a novel bidirectional evidence-finding task and conduct extensive evaluations based on MERRY.
arXiv Detail & Related papers (2026-02-24T02:53:58Z) - Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations [94.62792643569567]
This work systematically investigates the role of speaker emotion. We construct a dataset of malicious speech instructions expressed across multiple emotions and intensities, and evaluate several state-of-the-art LALMs. Our results reveal substantial safety inconsistencies: different emotions elicit varying levels of unsafe responses, and the effect of intensity is non-monotonic, with medium expressions often posing the greatest risk.
arXiv Detail & Related papers (2025-10-19T15:41:25Z) - SentiMM: A Multimodal Multi-Agent Framework for Sentiment Analysis in Social Media [6.2300278659598485]
We propose SentiMM, a novel multi-agent framework for sentiment analysis. SentiMM processes text and visual inputs through specialized agents, fuses multimodal features, enriches context via knowledge retrieval, and aggregates results for final sentiment classification. We also introduce SentiMMD, a large-scale multimodal dataset with seven fine-grained sentiment categories.
arXiv Detail & Related papers (2025-08-25T15:17:53Z) - Beyond Emotion Recognition: A Multi-Turn Multimodal Emotion Understanding and Reasoning Benchmark [15.900703216919169]
We introduce a multi-turn multimodal emotion understanding and reasoning benchmark, which encompasses 1,451 videos from real-life scenarios along with 5,101 progressive questions. We propose a multi-agent framework in which each agent specializes in a specific aspect, such as background context, character dynamics, and event details, to improve the system's reasoning capabilities.
arXiv Detail & Related papers (2025-08-23T01:10:29Z) - MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models [108.61337743051483]
We present MME-Emotion, a systematic benchmark that assesses both the emotional understanding and reasoning capabilities of MLLMs. MME-Emotion contains over 6,000 curated video clips with task-specific question-answering (QA) pairs, spanning broad scenarios to formulate eight emotional tasks. It incorporates a holistic evaluation suite with hybrid metrics for emotion recognition and reasoning, analyzed through a multi-agent system framework.
arXiv Detail & Related papers (2025-08-11T03:14:55Z) - Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark [73.27104042215207]
We introduce EMMA, a benchmark targeting organic multimodal reasoning across mathematics, physics, chemistry, and coding. EMMA tasks demand advanced cross-modal reasoning that cannot be addressed by reasoning independently in each modality. Our evaluation of state-of-the-art MLLMs on EMMA reveals significant limitations in handling complex multimodal and multi-step reasoning tasks.
arXiv Detail & Related papers (2025-01-09T18:55:52Z) - SweetieChat: A Strategy-Enhanced Role-playing Framework for Diverse Scenarios Handling Emotional Support Agent [27.301608019492043]
Large Language Models (LLMs) have demonstrated promising potential in providing empathetic support during interactions. We propose an innovative strategy-enhanced role-playing framework designed to simulate authentic emotional support conversations. Within this framework, we develop the ServeForEmo dataset, comprising an extensive collection of 3.7K+ multi-turn dialogues and 62.8K+ utterances.
arXiv Detail & Related papers (2024-12-11T13:56:04Z) - EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z) - Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey [66.166184609616]
ChatGPT has opened up immense potential for applying large language models (LLMs) to text-centric multimodal tasks.
It is still unclear how existing LLMs can adapt better to text-centric multimodal sentiment analysis tasks.
arXiv Detail & Related papers (2024-06-12T10:36:27Z) - MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models [51.19622266249408]
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs. Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts. Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
arXiv Detail & Related papers (2024-06-11T08:38:13Z) - Empathy Through Multimodality in Conversational Interfaces [1.360649555639909]
Conversational Health Agents (CHAs) are redefining healthcare by offering nuanced support that transcends textual analysis to incorporate emotional intelligence.
This paper introduces an LLM-based CHA engineered for rich, multimodal dialogue, especially in the realm of mental health support.
It adeptly interprets and responds to users' emotional states by analyzing multimodal cues, thus delivering contextually aware and empathetically resonant verbal responses.
arXiv Detail & Related papers (2024-05-08T02:48:29Z) - TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis [26.867610944625337]
Multimodal Sentiment Analysis (MSA) endeavors to understand human sentiment by leveraging language, visual, and acoustic modalities. Past research predominantly focused on improving representation learning techniques and feature fusion strategies. We introduce a Text-oriented Cross-Attention Network (TCAN) emphasizing the predominant role of the text modality in MSA.
arXiv Detail & Related papers (2024-04-06T07:56:09Z) - Building Emotional Support Chatbots in the Era of LLMs [64.06811786616471]
We introduce an innovative methodology that synthesizes human insights with the computational prowess of Large Language Models (LLMs).
By utilizing the in-context learning potential of ChatGPT, we generate an ExTensible Emotional Support dialogue dataset, named ExTES.
Following this, we deploy advanced tuning techniques on the LLaMA model, examining the impact of diverse training strategies, ultimately yielding an LLM meticulously optimized for emotional support interactions.
arXiv Detail & Related papers (2023-08-17T10:49:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.