A Multi-term and Multi-task Analyzing Framework for Affective Analysis
in-the-wild
- URL: http://arxiv.org/abs/2009.13885v2
- Date: Fri, 2 Oct 2020 13:04:04 GMT
- Title: A Multi-term and Multi-task Analyzing Framework for Affective Analysis
in-the-wild
- Authors: Sachihiro Youoku, Yuushi Toyoda, Takahisa Yamamoto, Junya Saito,
Ryosuke Kawamura, Xiaoyu Mi and Kentaro Murase
- Abstract summary: We introduce the affective recognition method that was submitted to the Affective Behavior Analysis in-the-wild (ABAW) 2020 Contest.
Since affective behaviors have many observable features that have their own time frames, we introduced multiple optimized time windows.
We generated affective recognition models for each time window and ensembled these models together.
- Score: 0.2216657815393579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human affective recognition is an important factor in human-computer
interaction. However, the method development with in-the-wild data is not yet
accurate enough for practical usage. In this paper, we introduce the affective
recognition method focusing on valence-arousal (VA) and expression (EXP) that
was submitted to the Affective Behavior Analysis in-the-wild (ABAW) 2020
Contest. Since we considered that affective behaviors have many observable
features that have their own time frames, we introduced multiple optimized time
windows (short-term, middle-term, and long-term) into our analyzing framework
for extracting feature parameters from video data. Moreover, multiple modality
data are used, including action units, head poses, gaze, posture, and ResNet 50
or Efficient NET features, and are optimized during the extraction of these
features. Then, we generated affective recognition models for each time window
and ensembled these models together. Also, we fussed the valence, arousal, and
expression models together to enable the multi-task learning, considering the
fact that the basic psychological states behind facial expressions are closely
related to each another. In the validation set, our model achieved a
valence-arousal score of 0.498 and a facial expression score of 0.471. These
verification results reveal that our proposed framework can improve estimation
accuracy and robustness effectively.
Related papers
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - SwinFace: A Multi-task Transformer for Face Recognition, Expression
Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z) - Learning Diversified Feature Representations for Facial Expression
Recognition in the Wild [97.14064057840089]
We propose a mechanism to diversify the features extracted by CNN layers of state-of-the-art facial expression recognition architectures.
Experimental results on three well-known facial expression recognition in-the-wild datasets, AffectNet, FER+, and RAF-DB, show the effectiveness of our method.
arXiv Detail & Related papers (2022-10-17T19:25:28Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial
Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptati On (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO presents an improvement in facial expression recognition performance over six different datasets with very unique affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z) - Frame-level Prediction of Facial Expressions, Valence, Arousal and
Action Units for Mobile Devices [7.056222499095849]
We propose the novel frame-level emotion recognition algorithm by extracting facial features with the single EfficientNet model pre-trained on AffectNet.
Our approach may be implemented even for video analytics on mobile devices.
arXiv Detail & Related papers (2022-03-25T03:53:27Z) - A Multi-modal and Multi-task Learning Method for Action Unit and
Expression Recognition [18.478011167414223]
We introduce a multi-modal and multi-task learning method by using both visual and audio information.
We achieve an AU score of 0.712 and an expression score of 0.477 on the validation set.
arXiv Detail & Related papers (2021-07-09T03:28:17Z) - Multi-modal Affect Analysis using standardized data within subjects in
the Wild [8.05417723395965]
We introduce the affective recognition method focusing on facial expression (EXP) and valence-arousal calculation.
Our proposed framework can improve estimation accuracy and robustness effectively.
arXiv Detail & Related papers (2021-07-07T04:18:28Z) - An Enhanced Adversarial Network with Combined Latent Features for
Spatio-Temporal Facial Affect Estimation in the Wild [1.3007851628964147]
This paper proposes a novel model that efficiently extracts both spatial and temporal features of the data by means of its enhanced temporal modelling based on latent features.
Our proposed model consists of three major networks, coined Generator, Discriminator, and Combiner, which are trained in an adversarial setting combined with curriculum learning to enable our adaptive attention modules.
arXiv Detail & Related papers (2021-02-18T04:10:12Z) - Modeling Score Distributions and Continuous Covariates: A Bayesian
Approach [8.772459063453285]
We develop a generative model of the match and non-match score distributions over continuous covariates.
We use mixture models to capture arbitrary distributions and local basis functions.
Three experiments demonstrate the accuracy and effectiveness of our approach.
arXiv Detail & Related papers (2020-09-21T02:41:20Z) - A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.