An Ensemble Approach for Multiple Emotion Descriptors Estimation Using
Multi-task Learning
- URL: http://arxiv.org/abs/2207.10878v1
- Date: Fri, 22 Jul 2022 04:57:56 GMT
- Title: An Ensemble Approach for Multiple Emotion Descriptors Estimation Using
Multi-task Learning
- Authors: Irfan Haider, Minh-Trieu Tran, Soo-Hyung Kim, Hyung-Jeong Yang,
Guee-Sang Lee
- Abstract summary: This paper illustrates our submission method to the fourth Affective Behavior Analysis in-the-Wild (ABAW) Competition.
Instead of using only face information, we employ the full information from the provided dataset, which contains both the face and its surrounding context.
The proposed system achieves a performance of 0.917 on the MTL Challenge validation dataset.
- Score: 12.589338141771385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper illustrates our submission method to the fourth Affective Behavior Analysis in-the-Wild (ABAW) Competition. The method is used for the Multi-Task Learning Challenge. Instead of using only face information, we employ the full information from the provided dataset, which contains both the face and its surrounding context. We utilized the InceptionNet V3 model to extract deep features, then applied an attention mechanism to refine them. After that, we fed those features into a transformer block and multi-layer perceptron networks to obtain the final multiple emotion descriptors. Our model simultaneously predicts arousal and valence, classifies the emotional expression, and estimates the action units. The proposed system achieves a performance of 0.917 on the MTL Challenge validation dataset.
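The abstract describes the pipeline only at a high level. A minimal PyTorch sketch of that flow (backbone features, attention refinement, a transformer block, and per-task MLP heads) might look like the following; every dimension, head layout, and output size here is an illustrative assumption rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiEmotionModel(nn.Module):
    """Sketch: Inception V3 features -> attention refinement -> transformer -> MLP heads."""
    def __init__(self, d_model=256, num_expr=8, num_aus=12):
        super().__init__()
        backbone = models.inception_v3(weights=None, aux_logits=False)
        backbone.fc = nn.Identity()              # keep the 2048-d pooled features
        self.backbone = backbone
        self.proj = nn.Linear(2048, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # one lightweight MLP head per task, trained jointly
        self.va_head = nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(), nn.Linear(128, 2))
        self.expr_head = nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(), nn.Linear(128, num_expr))
        self.au_head = nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(), nn.Linear(128, num_aus))

    def forward(self, x):                        # x: (B, 3, 299, 299) full-frame crops
        tokens = self.proj(self.backbone(x)).unsqueeze(1)   # (B, 1, d_model)
        # a real system would attend over spatial or temporal tokens, not a single one
        tokens, _ = self.attn(tokens, tokens, tokens)
        z = self.transformer(tokens).squeeze(1)
        va = torch.tanh(self.va_head(z))         # valence/arousal in [-1, 1]
        return va, self.expr_head(z), self.au_head(z)       # VA, expression logits, AU logits
```

A joint training loss would typically combine a regression loss (e.g., CCC or MSE) for valence/arousal, cross-entropy for the expression class, and binary cross-entropy for the action units.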
Related papers
- Face-MLLM: A Large Face Perception Model [53.9441375205716]
Multimodal large language models (MLLMs) have achieved promising results on a wide range of vision-language tasks, but their ability to perceive and understand human faces is rarely explored.
In this work, we comprehensively evaluate existing MLLMs on face perception tasks.
Our model surpasses previous MLLMs on five famous face perception tasks.
arXiv Detail & Related papers (2024-10-28T04:19:32Z)
- Multi-Task Multi-Modal Self-Supervised Learning for Facial Expression Recognition [6.995226697189459]
We employ a multi-modal self-supervised learning method for facial expression recognition from in-the-wild video data.
Our results generally show that multi-modal self-supervision tasks offer large performance gains for challenging tasks.
We release our pre-trained models as well as source code publicly.
arXiv Detail & Related papers (2024-04-16T20:51:36Z)
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
Transferring the pretrained models to downstream tasks may encounter task discrepancy, because pretraining is formulated as an image classification or object discrimination task.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
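The multi-task pretraining recipe described here (one shared backbone, several dense-prediction heads) can be illustrated with a generic sketch; the head names, loss criteria, and weights below are placeholder assumptions, not the MTP paper's configuration.

```python
import torch.nn as nn

class MultiTaskPretrainer(nn.Module):
    """Shared backbone with one head per pretraining task."""
    def __init__(self, backbone, heads):
        super().__init__()
        self.backbone = backbone
        # e.g. heads = {"semseg": seg_head, "instseg": inst_head, "rotdet": det_head}
        self.heads = nn.ModuleDict(heads)

    def forward(self, x):
        feats = self.backbone(x)
        return {name: head(feats) for name, head in self.heads.items()}

def multitask_loss(outputs, targets, criteria, weights):
    # Weighted sum of per-task losses; equal weights are a common default.
    return sum(weights[t] * criteria[t](outputs[t], targets[t]) for t in outputs)
```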
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
- Affective Behaviour Analysis via Integrating Multi-Modal Knowledge [24.74463315135503]
The 6th competition on Affective Behavior Analysis in-the-wild (ABAW) utilizes the Aff-Wild2, Hume-Vidmimic2, and C-EXPR-DB datasets.
We present our method designs for the five competitive tracks, i.e., Valence-Arousal (VA) Estimation, Expression (EXPR) Recognition, Action Unit (AU) Detection, Compound Expression (CE) Recognition, and Emotional Mimicry Intensity (EMI) Estimation.
arXiv Detail & Related papers (2024-03-16T06:26:43Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on CityScapes.
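The "meta prompts" are described as learnable embeddings added to a frozen pre-trained diffusion model. A generic sketch of that idea, with the injection point and sizes assumed rather than taken from the paper, is to prepend learnable tokens to the feature sequence:

```python
import torch
import torch.nn as nn

class MetaPromptAdapter(nn.Module):
    """Prepend N learnable prompt tokens to a token sequence (illustrative)."""
    def __init__(self, num_prompts=16, d_model=768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, d_model) * 0.02)

    def forward(self, tokens):                 # tokens: (B, L, d_model) from the frozen model
        b = tokens.size(0)
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)
        return torch.cat([p, tokens], dim=1)   # (B, N + L, d_model)
```

Only the prompts (plus any task head) would be optimized; the pre-trained backbone stays frozen.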
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
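The implicit manipulation query is described as adaptively aggregating global contextual cues within each modality, which suggests learnable queries cross-attending over per-modality tokens. A hedged sketch, with all dimensions assumed:

```python
import torch
import torch.nn as nn

class ImplicitQueryPooler(nn.Module):
    """Learnable queries aggregate global cues from one modality via cross-attention."""
    def __init__(self, num_queries=4, d_model=256, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, tokens):                  # tokens: (B, L, d_model), image or text features
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        pooled, _ = self.cross_attn(q, tokens, tokens)  # queries attend over the modality
        return pooled                           # (B, num_queries, d_model)
```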
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Two-Aspect Information Fusion Model For ABAW4 Multi-task Challenge [41.32053075381269]
The task of ABAW is to predict frame-level emotion descriptors from videos.
We propose a novel end-to-end architecture to achieve full integration of different types of information.
arXiv Detail & Related papers (2022-07-23T01:48:51Z)
- Emotion Recognition based on Multi-Task Learning Framework in the ABAW4 Challenge [12.662242704351563]
This paper presents our submission to the Multi-Task Learning (MTL) Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition.
Based on visual feature representations, we utilize three types of temporal encoder to capture the temporal context information in the video.
Our system achieves a performance of 1.742 on the MTL Challenge validation dataset.
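The summary does not name the three temporal encoders used. As one plausible example of capturing temporal context over per-frame visual features, a bidirectional GRU encoder could look like this (feature sizes are assumptions):

```python
import torch
import torch.nn as nn

class TemporalEncoder(nn.Module):
    """Bidirectional GRU over a sequence of per-frame feature vectors (illustrative)."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, feat_dim)

    def forward(self, x):                       # x: (B, T, feat_dim) frame features
        h, _ = self.gru(x)                      # (B, T, 2 * hidden)
        return self.out(h)                      # temporally contextualized features per frame
```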
arXiv Detail & Related papers (2022-07-19T16:18:53Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
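The prompt-based reformulation, treating emotion classification as masked text prediction, can be sketched with any off-the-shelf masked language model. The template, label words, and use of bert-base-uncased below are illustrative assumptions, not MEmoBERT itself:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def predict_emotion(utterance, label_words=("happy", "sad", "angry", "neutral")):
    # Cast classification as filling a masked slot in a prompt template.
    prompt = f"{utterance} I feel {tok.mask_token}."
    inputs = tok(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero()[0]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, mask_pos].squeeze(0)
    # Score each label word by the MLM logit of its (single-token) id.
    ids = [tok.convert_tokens_to_ids(w) for w in label_words]
    return label_words[int(logits[ids].argmax())]

print(predict_emotion("What a wonderful surprise!"))  # e.g. "happy"
```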
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- A Multi-resolution Approach to Expression Recognition in the Wild [9.118706387430883]
We propose a multi-resolution approach to solve the Facial Expression Recognition task.
We ground our intuition on the observation that face images are often acquired at different resolutions.
To this end, we use a ResNet-like architecture, equipped with Squeeze-and-Excitation blocks, trained on the Affect-in-the-Wild 2 dataset.
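The entry gives no further architectural detail, but Squeeze-and-Excitation blocks themselves are a standard, well-documented component, so a compact PyTorch version is unambiguous:

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: channel-wise reweighting of a conv feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # per-channel gates in (0, 1)
        )

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                     # excitation: rescale channels
```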
arXiv Detail & Related papers (2021-03-09T21:21:02Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation, and memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)