A Multi-modal and Multi-task Learning Method for Action Unit and
Expression Recognition
- URL: http://arxiv.org/abs/2107.04187v1
- Date: Fri, 9 Jul 2021 03:28:17 GMT
- Title: A Multi-modal and Multi-task Learning Method for Action Unit and
Expression Recognition
- Authors: Yue Jin, Tianqing Zheng, Chao Gao, Guoqiang Xu
- Abstract summary: We introduce a multi-modal and multi-task learning method that uses both visual and audio information.
We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set.
- Score: 18.478011167414223
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Analyzing human affect is vital for human-computer interaction systems. Most
methods are developed in restricted scenarios which are not practical for
in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021
Contest provides a benchmark for this in-the-wild problem. In this paper, we
introduce a multi-modal and multi-task learning method that uses both visual and
audio information. We use both AU and expression annotations to train the model
and apply a sequence model to further extract associations between video
frames. We achieve an AU score of 0.712 and an expression score of 0.477 on the
validation set. These results demonstrate the effectiveness of our approach in
improving model performance.
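To make the approach described in the abstract more concrete, the following is a minimal sketch of a model of this kind: per-frame visual and audio features are fused, a sequence model captures associations between video frames, and two heads serve the AU and expression tasks. The module names, feature dimensions, fusion by concatenation, the choice of a GRU, and the loss weighting are all illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class MultiModalMultiTaskModel(nn.Module):
    """Hypothetical sketch: visual + audio features -> sequence model -> AU and expression heads."""

    def __init__(self, visual_dim=512, audio_dim=128, hidden_dim=256,
                 num_aus=12, num_expressions=7):
        super().__init__()
        # Fuse per-frame visual and audio features by concatenation (assumed fusion strategy).
        self.fusion = nn.Linear(visual_dim + audio_dim, hidden_dim)
        # Sequence model over video frames; the paper only says "a sequence model",
        # so a bidirectional GRU is used here purely for illustration.
        self.temporal = nn.GRU(hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Multi-task heads: multi-label AU detection and single-label expression classification.
        self.au_head = nn.Linear(2 * hidden_dim, num_aus)
        self.expr_head = nn.Linear(2 * hidden_dim, num_expressions)

    def forward(self, visual_feats, audio_feats):
        # visual_feats: (batch, frames, visual_dim); audio_feats: (batch, frames, audio_dim)
        fused = torch.relu(self.fusion(torch.cat([visual_feats, audio_feats], dim=-1)))
        seq_out, _ = self.temporal(fused)      # (batch, frames, 2 * hidden_dim)
        au_logits = self.au_head(seq_out)      # per-frame AU logits (apply sigmoid for probabilities)
        expr_logits = self.expr_head(seq_out)  # per-frame expression logits (apply softmax)
        return au_logits, expr_logits


def multitask_loss(au_logits, expr_logits, au_labels, expr_labels, expr_weight=1.0):
    # Joint loss: BCE for multi-label AUs, cross-entropy for expressions (the weighting is an assumption).
    au_loss = nn.functional.binary_cross_entropy_with_logits(au_logits, au_labels.float())
    expr_loss = nn.functional.cross_entropy(expr_logits.flatten(0, 1), expr_labels.flatten())
    return au_loss + expr_weight * expr_loss
```

Training would then minimize this joint loss over annotated frames; how the original method actually balances or schedules the two tasks is not specified in the abstract.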
Related papers
- Pre-training Contextualized World Models with In-the-wild Videos for
Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z) - Multi-modal Facial Affective Analysis based on Masked Autoencoder [7.17338843593134]
We introduce our submission to the CVPR 2023 ABAW5 competition: Affective Behavior Analysis in-the-wild.
Our approach involves several key components. First, we utilize the visual information from a Masked Autoencoder (MAE) model that has been pre-trained on a large-scale face image dataset in a self-supervised manner.
Our approach achieves impressive results in the ABAW5 competition, with an average F1 score of 55.49% and 41.21% in the AU and EXPR tracks, respectively.
arXiv Detail & Related papers (2023-03-20T03:58:03Z) - Ensemble knowledge distillation of self-supervised speech models [84.69577440755457]
Distilled self-supervised models have shown competitive performance and efficiency in recent years.
We performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM.
Our method improves the performance of the distilled models on four downstream speech processing tasks.
arXiv Detail & Related papers (2023-02-24T17:15:39Z) - REST: REtrieve & Self-Train for generative action recognition [54.90704746573636]
We propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition.
We show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting.
We introduce REST, a training framework consisting of two key components.
arXiv Detail & Related papers (2022-09-29T17:57:01Z) - UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes [91.24112204588353]
We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks.
In contrast to previous models, UViM has the same functional form for all tasks.
We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks.
arXiv Detail & Related papers (2022-05-20T17:47:59Z) - Multi-model Ensemble Learning Method for Human Expression Recognition [31.76775306959038]
We propose a solution based on ensemble learning to leverage large amounts of real-life data.
We conduct extensive experiments on the Aff-Wild2 dataset of the ABAW2022 Challenge, and the results demonstrate the effectiveness of our solution.
arXiv Detail & Related papers (2022-03-28T03:15:06Z) - Multi-modal Multi-label Facial Action Unit Detection with Transformer [7.30287060715476]
This paper describes our submission to the third Affective Behavior Analysis (ABAW) 2022 competition.
We propose a transformer-based model to detect facial action units (FAUs) in video.
arXiv Detail & Related papers (2022-03-24T18:59:31Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - Multi-modal Affect Analysis using standardized data within subjects in
the Wild [8.05417723395965]
We introduce an affective recognition method focusing on facial expression (EXP) and valence-arousal calculation.
The proposed framework effectively improves estimation accuracy and robustness.
arXiv Detail & Related papers (2021-07-07T04:18:28Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - A Multi-term and Multi-task Analyzing Framework for Affective Analysis
in-the-wild [0.2216657815393579]
We introduce the affective recognition method that was submitted to the Affective Behavior Analysis in-the-wild (ABAW) 2020 Contest.
Since affective behaviors have many observable features, each with its own time frame, we introduce multiple optimized time windows.
We generate an affective recognition model for each time window and ensemble these models.
arXiv Detail & Related papers (2020-09-29T09:24:29Z)
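The last entry above describes building one recognition model per optimized time window and ensembling their outputs. The following is a minimal, hedged sketch of that windowed-ensemble idea; the window lengths, the use of the most recent frames as each model's input, and the unweighted averaging are assumptions for illustration only.

```python
import torch


def ensemble_over_time_windows(models, frame_features, window_lengths):
    """Hypothetical sketch: run one model per time-window length and average their predictions.

    models: list of callables, one per window length, each mapping
            (batch, window, feat_dim) -> (batch, num_classes) logits.
    frame_features: (batch, frames, feat_dim) sequence of per-frame features.
    window_lengths: list of ints, one per model (e.g. [8, 16, 32]).
    """
    predictions = []
    for model, win in zip(models, window_lengths):
        # Use the most recent `win` frames as that model's input window (assumed windowing rule).
        window = frame_features[:, -win:, :]
        logits = model(window)
        predictions.append(torch.softmax(logits, dim=-1))
    # Simple unweighted average of the per-window probabilities (assumed ensembling rule).
    return torch.stack(predictions, dim=0).mean(dim=0)
```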
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.