EmoLLM: Multimodal Emotional Understanding Meets Large Language Models
- URL: http://arxiv.org/abs/2406.16442v2
- Date: Sat, 29 Jun 2024 14:32:46 GMT
- Title: EmoLLM: Multimodal Emotional Understanding Meets Large Language Models
- Authors: Qu Yang, Mang Ye, Bo Du
- Abstract summary: Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding, incorporating two core techniques.
- Score: 61.179731667080326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks, but their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored. This gap impedes their ability to effectively understand and react to the intricate emotions expressed by humans through multimodal media. To bridge this gap, we introduce EmoBench, the first comprehensive benchmark designed specifically to evaluate the emotional capabilities of MLLMs across five popular emotional tasks, using a diverse dataset of 287k images and videos paired with corresponding textual instructions. Meanwhile, we propose EmoLLM, a novel model for multimodal emotional understanding, incorporating two core techniques: 1) Multi-perspective Visual Projection, which captures diverse emotional cues from visual data across multiple perspectives; and 2) EmoPrompt, which guides MLLMs to reason about emotions in the correct direction. Experimental results demonstrate that EmoLLM significantly elevates multimodal emotional understanding performance, with an average improvement of 12.1% across multiple foundation models on EmoBench. Our work contributes to the advancement of MLLMs by facilitating a deeper and more nuanced comprehension of intricate human emotions, paving the way for the development of artificial emotional intelligence capabilities with wide-ranging applications in areas such as human-computer interaction, mental health support, and empathetic AI systems. Code, data, and models will be released.
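The abstract does not describe how Multi-perspective Visual Projection or EmoPrompt are implemented, so the sketch below is only a minimal, hypothetical illustration of the general idea: several independent projection heads map frozen vision-encoder features into the LLM embedding space and are combined with a learned gate to form visual prefix tokens, while an EmoPrompt-style instruction steers the model toward emotional reasoning. All dimensions, the gating scheme, and the prompt wording are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class MultiPerspectiveVisualProjection(nn.Module):
    """Hypothetical sketch of a multi-perspective visual projection:
    several independent heads project frozen vision-encoder features into
    the LLM embedding space, so each head can specialize in different
    emotional cues (e.g. facial expression, scene context, color/tone).
    Dimensions and the gating scheme are assumptions."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096, num_perspectives: int = 4):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim))
            for _ in range(num_perspectives)
        ])
        # Learned soft weights over perspectives (assumed aggregation strategy).
        self.gate = nn.Linear(vision_dim, num_perspectives)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vision_dim) from a frozen image/video encoder.
        projections = torch.stack([head(vision_feats) for head in self.heads], dim=-2)
        weights = torch.softmax(self.gate(vision_feats), dim=-1).unsqueeze(-1)
        # Weighted sum over perspectives -> (batch, num_patches, llm_dim) prefix tokens for the LLM.
        return (weights * projections).sum(dim=-2)


# Illustrative EmoPrompt-style instruction (not the paper's exact template):
EMO_PROMPT = (
    "You are an expert in emotional understanding. First describe the visual cues "
    "(facial expression, body language, scene), then infer the most likely emotion "
    "and explain your reasoning before giving the final label."
)

if __name__ == "__main__":
    proj = MultiPerspectiveVisualProjection()
    dummy_feats = torch.randn(2, 256, 1024)   # batch of 2 images, 256 patch features each
    prefix_tokens = proj(dummy_feats)
    print(prefix_tokens.shape)                # torch.Size([2, 256, 4096])
```

In a full pipeline, the resulting prefix tokens would be prepended to the token embeddings of the EmoPrompt instruction before being fed to the language model; how EmoLLM actually combines them is not specified in the abstract.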
Related papers
- EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models [27.195518991292488]
EmoBench-M is a novel benchmark designed to evaluate the emotional intelligence (EI) capability of multimodal large language models (MLLMs).
Evaluations of both open-source and closed-source MLLMs on EmoBench-M reveal a significant performance gap between them and humans.
arXiv Detail & Related papers (2025-02-06T18:13:35Z) - Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis [6.387263468033964]
We introduce a self-reviewed dataset and a human-reviewed dataset, comprising 24,137 coarse-grained samples and 3,500 manually annotated samples with detailed emotion annotations.
In addition to the audio modeling, we propose to explicitly integrate facial encoding models into the existing advanced Video MLLM.
Our Omni-Emotion achieves state-of-the-art performance in both emotion recognition and reasoning tasks.
arXiv Detail & Related papers (2025-01-16T12:27:05Z) - M2SE: A Multistage Multitask Instruction Tuning Strategy for Unified Sentiment and Emotion Analysis [5.3848462080869215]
We propose M2SE, a Multistage Multitask Sentiment and Emotion Instruction Tuning Strategy for general-purpose MLLMs.
It employs a combined approach to train models on tasks such as multimodal sentiment analysis, emotion recognition, facial expression recognition, emotion reason inference, and emotion cause-pair extraction.
Our model, Emotion Universe (EmoVerse), is built on a basic MLLM framework without modifications, yet it achieves substantial improvements across these tasks when trained with the M2SE strategy.
arXiv Detail & Related papers (2024-12-11T02:55:00Z) - MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z) - Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model [5.301672905886949]
This report introduces a solution that uses MLLM technology to generate open-vocabulary emotion labels from video.
In the MER-OV (Open-Vocabulary Emotion Recognition) track of the MER2024 challenge, our method achieved significant advantages, demonstrating its superior capability in complex emotion computation.
arXiv Detail & Related papers (2024-08-21T02:17:18Z) - Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning [55.127202990679976]
We introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories.
This dataset enables models to learn from varied scenarios and generalize to real-world applications.
We propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders.
arXiv Detail & Related papers (2024-06-17T03:01:22Z) - Enhancing Emotional Generation Capability of Large Language Models via Emotional Chain-of-Thought [50.13429055093534]
Large Language Models (LLMs) have shown remarkable performance in various emotion recognition tasks.
We propose the Emotional Chain-of-Thought (ECoT) to enhance the performance of LLMs on various emotional generation tasks.
arXiv Detail & Related papers (2024-01-12T16:42:10Z) - Large Language Models Understand and Can be Enhanced by Emotional Stimuli [53.53886609012119]
We take the first step towards exploring the ability of Large Language Models to understand emotional stimuli.
Our experiments show that LLMs have a grasp of emotional intelligence, and their performance can be improved with emotional prompts.
Our human study results demonstrate that EmotionPrompt significantly boosts performance on generative tasks.
arXiv Detail & Related papers (2023-07-14T00:57:12Z) - Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)