Facial Affective Behavior Analysis with Instruction Tuning
- URL: http://arxiv.org/abs/2404.05052v2
- Date: Fri, 12 Jul 2024 17:52:03 GMT
- Title: Facial Affective Behavior Analysis with Instruction Tuning
- Authors: Yifan Li, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong
- Abstract summary: Facial affective behavior analysis (FABA) is crucial for understanding human mental states from images.
Traditional approaches primarily deploy models to discriminate among discrete emotion categories and lack the fine granularity and reasoning capability needed for complex facial behaviors.
We introduce an instruction-following dataset for two FABA tasks, emotion and action unit recognition, and a benchmark, FABA-Bench, with a new metric that considers both recognition and generation ability.
We also introduce a facial prior expert module with face structure knowledge and a low-rank adaptation module into a pre-trained MLLM.
- Score: 58.332959295770614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial affective behavior analysis (FABA) is crucial for understanding human mental states from images. However, traditional approaches primarily deploy models to discriminate among discrete emotion categories and lack the fine granularity and reasoning capability needed for complex facial behaviors. Multi-modal Large Language Models (MLLMs) have proven successful in general visual understanding tasks. However, directly harnessing MLLMs for FABA is challenging due to the scarcity of datasets and benchmarks, the neglect of facial prior knowledge, and low training efficiency. To address these challenges, we introduce (i) an instruction-following dataset for two FABA tasks, i.e., emotion and action unit recognition, (ii) a benchmark, FABA-Bench, with a new metric considering both recognition and generation ability, and (iii) a new MLLM, "EmoLA", as a strong baseline for the community. Our initiative on the dataset and benchmarks reveals the nature and rationale of facial affective behaviors, i.e., fine-grained facial movement, interpretability, and reasoning. Moreover, to build an effective and efficient FABA MLLM, we introduce a facial prior expert module with face structure knowledge and a low-rank adaptation module into a pre-trained MLLM. We conduct extensive experiments on FABA-Bench and four commonly used FABA datasets. The results demonstrate that the proposed facial prior expert boosts performance and that EmoLA achieves the best results on our FABA-Bench. On commonly used FABA datasets, EmoLA is competitive with task-specific state-of-the-art models.
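
For readers unfamiliar with one of the building blocks the abstract names, the sketch below shows the generic low-rank adaptation (LoRA) pattern: a pre-trained linear projection is frozen and only a small low-rank residual is trained. This is a minimal sketch of the general technique, assuming standard rank and scaling hyperparameters; it is not EmoLA's actual module, and the facial prior expert is omitted.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank residual.

    Illustrative sketch of the generic LoRA idea; EmoLA's actual adaptation
    module (and its facial prior expert) may differ in detail.
    """
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weights frozen
        # Low-rank factors: A projects down to `rank`, B projects back up.
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # residual starts at zero
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Usage: wrap one projection of a pre-trained model; only the factors train.
layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096))
```

Because the low-rank factors hold far fewer parameters than the frozen base, this is the usual route to the kind of training efficiency the abstract mentions.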
Related papers
- The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs).
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z) - Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition [1.4374467687356276]
This paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and a multi-view Fusion Attention mechanism for expression classification.
We suggest easy-to-implement, training-free frameworks aimed at highlighting key facial features, to determine whether such features can serve as guides for the model.
The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset.
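
The summary does not specify the Fusion Attention mechanism, but attention-based fusion of multiple feature views commonly follows the pattern sketched below: score each view, normalize the scores with a softmax, and take the weighted sum. The module and dimensions are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse per-view feature vectors into one via learned attention weights.

    A generic sketch, not the paper's exact Fusion Attention mechanism.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar relevance score per view

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views, dim)
        weights = torch.softmax(self.score(views), dim=1)  # (batch, V, 1)
        return (weights * views).sum(dim=1)                # (batch, dim)

# Three feature views of dimension 512 fused into a single vector.
fused = AttentionFusion(dim=512)(torch.randn(4, 3, 512))
```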
arXiv Detail & Related papers (2024-03-19T16:21:47Z) - EmoLLMs: A Series of Emotional Large Language Models and Annotation Tools for Comprehensive Affective Analysis [33.0280076936761]
We propose EmoLLMs, the first series of open-sourced instruction-following LLMs for affective analysis based on fine-tuning various LLMs with instruction data.
EmoLLMs achieve ChatGPT-level and GPT-4-level generalization capabilities on affective analysis tasks, demonstrating that our models can be used as affective annotation tools.
arXiv Detail & Related papers (2024-01-16T17:11:11Z) - PALM: Predicting Actions through Language Models [74.10147822693791]
We introduce PALM, an approach that tackles the task of long-term action anticipation.
Our method incorporates an action recognition model to track previous action sequences and a vision-language model to articulate relevant environmental details.
Our experimental results demonstrate that PALM surpasses the state-of-the-art methods in the task of long-term action anticipation.
arXiv Detail & Related papers (2023-11-29T02:17:27Z) - Asking Before Acting: Gather Information in Embodied Decision Making with Language Models [20.282749796376063]
We show that Large Language Models (LLMs) encounter challenges in efficiently gathering essential information in unfamiliar environments.
We propose Asking Before Acting (ABA), a method that empowers the agent to proactively inquire with external sources for pertinent information using natural language.
We conduct extensive experiments involving a spectrum of environments, including text-based household everyday tasks, robot arm manipulation tasks, and real-world open-domain image-based embodied tasks.
arXiv Detail & Related papers (2023-05-25T04:05:08Z) - Continual Facial Expression Recognition: A Benchmark [3.181579197770883]
This work presents the Continual Facial Expression Recognition (ConFER) benchmark that evaluates popular continual learning (CL) techniques on facial expression recognition (FER) tasks.
It presents a comparative analysis of several CL-based approaches on popular FER datasets such as CK+, RAF-DB, and AffectNet.
CL techniques, under different learning settings, are shown to achieve state-of-the-art (SOTA) performance across several datasets.
arXiv Detail & Related papers (2023-05-10T20:35:38Z) - MAML and ANIL Provably Learn Representations [60.17417686153103]
We prove that two well-known meta-learning methods, MAML and ANIL, are capable of learning a common representation among a set of given tasks.
Specifically, in the well-known multi-task linear representation learning setting, they are able to recover the ground-truth representation at an exponentially fast rate.
Our analysis illuminates that the driving force causing MAML and ANIL to recover the underlying representation is that they adapt the final layer of their model.
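
To make the setting concrete, results of this kind are usually stated in the multi-task linear representation model sketched below; the notation paraphrases the common formulation and is not copied from the paper.

```latex
% Multi-task linear representation learning (common formulation, paraphrased).
% Task i generates labels through a shared low-rank representation B* and a
% task-specific head w_i*:
\[
  y \;=\; \langle w_i^{\ast},\, (B^{\ast})^{\top} x \rangle \;+\; \varepsilon,
  \qquad B^{\ast} \in \mathbb{R}^{d \times k},\quad k \ll d .
\]
% ANIL adapts only the head w_i in its inner loop and updates B in the outer
% loop; the claim is that the subspace spanned by the learned B converges to
% that of B* at an exponentially fast rate.
```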
arXiv Detail & Related papers (2022-02-07T19:43:02Z) - MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction task.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
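
The reformulation described above can be illustrated with a generic masked language model: the classification decision becomes a comparison of the model's [MASK] logits over a set of emotion label words. The prompt template and label set below are assumptions for illustration, and plain bert-base-uncased stands in for MEmoBERT, which additionally conditions on visual and acoustic inputs.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# A generic masked LM stands in for MEmoBERT, which is multimodal.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Illustrative prompt template: classification becomes filling the [MASK].
utterance = "I can't believe we actually won the finals!"
prompt = f"{utterance} I am feeling {tok.mask_token}."
label_words = ["happy", "sad", "angry", "neutral"]  # assumed label set

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = mlm(**inputs).logits  # (1, seq_len, vocab_size)

# Score only the candidate label words at the masked position.
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
label_ids = tok.convert_tokens_to_ids(label_words)
scores = logits[0, mask_pos, label_ids]
print(label_words[scores.argmax().item()])
```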
arXiv Detail & Related papers (2021-10-27T09:57:00Z) - Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study [75.42182503265056]
Multi-Task Learning (MTL) has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm.
We deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems.
We build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks.
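
A heterogeneous objective of the kind described, jointly covering detection, classification, and regression, is commonly a weighted sum of per-task losses; the sketch below shows that generic pattern. The task weights and toy shapes are assumptions, and FaceBehaviorNet's distribution-matching term is not reproduced here.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # action-unit detection (multi-label)
ce = nn.CrossEntropyLoss()    # basic-expression classification
mse = nn.MSELoss()            # valence/arousal regression

def multitask_loss(au_logits, au_targets,
                   expr_logits, expr_targets,
                   va_pred, va_targets,
                   w=(1.0, 1.0, 1.0)):
    """Weighted sum of heterogeneous task losses; weights are assumed."""
    return (w[0] * bce(au_logits, au_targets)
            + w[1] * ce(expr_logits, expr_targets)
            + w[2] * mse(va_pred, va_targets))

# Toy shapes: 8 samples, 12 AUs, 7 expressions, 2-D valence/arousal.
loss = multitask_loss(torch.randn(8, 12), torch.rand(8, 12).round(),
                      torch.randn(8, 7), torch.randint(0, 7, (8,)),
                      torch.randn(8, 2), torch.rand(8, 2) * 2 - 1)
```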
arXiv Detail & Related papers (2021-05-08T22:26:52Z) - A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition in the observation that face images are often acquired at different resolutions.
To this end, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
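
Squeeze-and-Excitation blocks, named above, recalibrate channels using globally pooled statistics; a minimal sketch of the standard block follows, with the reduction ratio as an assumed hyperparameter.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: gate channels by globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # excite
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # global average pool: (b, c)
        g = self.fc(s).view(b, c, 1, 1)  # per-channel gates in (0, 1)
        return x * g                     # recalibrate feature maps

out = SEBlock(64)(torch.randn(2, 64, 56, 56))
```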
arXiv Detail & Related papers (2021-03-09T21:21:02Z)