FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning
- URL: http://arxiv.org/abs/2505.13419v1
- Date: Mon, 19 May 2025 17:52:15 GMT
- Title: FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning
- Authors: Zhuozhao Hu, Kaishen Yuan, Xin Liu, Zitong Yu, Yuan Zong, Jingang Shi, Huanjing Yue, Jingyu Yang
- Abstract summary: Facial Emotion Analysis (FEA) plays a crucial role in visual affective computing. FEALLM is a novel MLLM architecture designed to capture more detailed facial information. Our model demonstrates strong performance on FEABench and impressive generalization capability.
- Score: 36.056594433947566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial Emotion Analysis (FEA) plays a crucial role in visual affective computing, aiming to infer a person's emotional state based on facial data. Scientifically, facial expressions (FEs) result from the coordinated movement of facial muscles, which can be decomposed into specific action units (AUs) that provide detailed emotional insights. However, traditional methods often struggle with limited interpretability, constrained generalization and reasoning abilities. Recently, Multimodal Large Language Models (MLLMs) have shown exceptional performance in various visual tasks, while they still face significant challenges in FEA due to the lack of specialized datasets and their inability to capture the intricate relationships between FEs and AUs. To address these issues, we introduce a novel FEA Instruction Dataset that provides accurate and aligned FE and AU descriptions and establishes causal reasoning relationships between them, followed by constructing a new benchmark, FEABench. Moreover, we propose FEALLM, a novel MLLM architecture designed to capture more detailed facial information, enhancing its capability in FEA tasks. Our model demonstrates strong performance on FEABench and impressive generalization capability through zero-shot evaluation on various datasets, including RAF-DB, AffectNet, BP4D, and DISFA, showcasing its robustness and effectiveness in FEA tasks. The dataset and code will be available at https://github.com/953206211/FEALLM.
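As a rough, self-contained illustration of the abstract's core premise, that AU-level structure carries emotional information, the sketch below maps detected action units to basic emotions using a few well-known FACS prototype combinations. The AU codes and the overlap heuristic are general FACS knowledge, not the paper's dataset construction or model.

```python
# Illustrative sketch only: map detected facial action units (AUs) to basic
# emotions via simplified FACS prototype combinations. This is NOT the paper's
# method; it merely shows why AU combinations are informative about emotion.

EMOTION_PROTOTYPES = {
    "happiness": {6, 12},             # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},          # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},       # brow raisers + upper lid raiser + jaw drop
    "anger":     {4, 5, 7, 23},       # brow lowerer + lid/lip tighteners
    "disgust":   {9, 15, 16},         # nose wrinkler + lip corner/lower lip depressors
    "fear":      {1, 2, 4, 5, 20, 26},
}

def infer_emotion(active_aus: set) -> str:
    """Return the emotion whose AU prototype best overlaps the detected AUs."""
    if not active_aus:
        return "neutral"

    def overlap(prototype: set) -> float:
        # Jaccard-style overlap between detected AUs and the prototype.
        return len(active_aus & prototype) / len(active_aus | prototype)

    best = max(EMOTION_PROTOTYPES, key=lambda emo: overlap(EMOTION_PROTOTYPES[emo]))
    return best if overlap(EMOTION_PROTOTYPES[best]) > 0 else "neutral"

if __name__ == "__main__":
    print(infer_emotion({6, 12}))  # classic Duchenne-smile pattern -> "happiness"
```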
Related papers
- FaceLLM: A Multimodal Large Language Model for Face Understanding [22.8742248559748]
We introduce FaceLLM, a multimodal large language model trained specifically for facial image understanding. To construct the training data, we propose a novel weakly supervised pipeline that uses ChatGPT with attribute-aware prompts to generate high-quality question-answer pairs. Our experiments demonstrate that FaceLLM improves the performance of MLLMs on various face-centric tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-07-14T14:04:14Z)
- Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis [5.795431510723275]
We present a comprehensive pipeline for multimodal facial state analysis. We introduce a novel Multilevel Multimodal Face Foundation model (MF2) tailored for Action Unit (AU) and emotion recognition. Experiments show superior performance on AU and emotion detection tasks.
arXiv Detail & Related papers (2025-04-14T16:00:57Z)
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors [63.194053817609024]
We introduce eye behaviors as an important emotional cue in the creation of a new Eye-behavior-aided Multimodal Emotion Recognition (EMER) dataset.
For the first time, we provide annotations for both Emotion Recognition (ER) and Facial Expression Recognition (FER) in the EMER dataset.
We specifically design a new EMERT architecture to concurrently enhance performance in both ER and FER.
arXiv Detail & Related papers (2024-11-08T04:53:55Z)
- Face-MLLM: A Large Face Perception Model [53.9441375205716]
Multimodal large language models (MLLMs) have achieved promising results on a wide range of vision-language tasks, but their ability to perceive and understand human faces is rarely explored.
In this work, we comprehensively evaluate existing MLLMs on face perception tasks.
Our model surpasses previous MLLMs on five well-known face perception tasks.
arXiv Detail & Related papers (2024-10-28T04:19:32Z)
- EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning [27.790079451103065]
We propose a novel MLLM, named EMO-LLaMA, which incorporates facial priors from a pretrained facial analysis network to enhance human facial information.
EMO-LLaMA achieves SOTA-comparable or competitive results across both static and dynamic FER datasets.
arXiv Detail & Related papers (2024-08-21T08:28:40Z)
- Facial Affective Behavior Analysis with Instruction Tuning [58.332959295770614]
Facial affective behavior analysis (FABA) is crucial for understanding human mental states from images.
Traditional approaches primarily deploy models to discriminate among discrete emotion categories, and lack the fine granularity and reasoning capability for complex facial behaviors.
We introduce an instruction-following dataset for two FABA tasks, emotion and action unit recognition, and a benchmark FABA-Bench with a new metric considering both recognition and generation ability.
We also introduce a facial prior expert module with face structure knowledge and a low-rank adaptation module into a pre-trained MLLM.
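For context on the low-rank adaptation module mentioned above: the standard LoRA formulation adds a trainable low-rank update to a frozen pretrained weight. A minimal sketch, assuming PyTorch; the class name, rank, and scaling values are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal low-rank adaptation of a frozen linear layer: y = Wx + (alpha/r) * B(Ax)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # freeze the pretrained projection
            p.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)          # adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Usage: wrap a projection inside a pretrained model and train only the adapter weights.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
```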
arXiv Detail & Related papers (2024-04-07T19:23:28Z)
- A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition on the observation that face images are often acquired at different resolutions.
To this aim, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
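For context, a Squeeze-and-Excitation block reweights feature channels through a global-pooling bottleneck. A minimal sketch of the standard formulation, assuming PyTorch; the parameter choices are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: pool each channel globally, then rescale channels by a learned gate."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                       # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gate = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # excitation: per-channel weights in (0, 1)
        return x * gate                                           # channel-wise rescaling of feature maps

# Usage inside a ResNet-like backbone: se = SEBlock(256); features = se(features)
```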
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
- Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition [98.83578105374535]
We present a novel Fine-grained Facial Expression Database - F2ED.
It includes more than 200k images with 54 facial expressions from 119 persons.
Considering that uneven data distribution and a lack of samples are common in real-world scenarios, we evaluate several few-shot expression learning tasks.
We propose a unified task-driven framework, the Compositional Generative Adversarial Network (Comp-GAN), that learns to synthesize facial images.
arXiv Detail & Related papers (2020-01-17T03:26:32Z)