SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video
- URL: http://arxiv.org/abs/2210.11341v1
- Date: Thu, 20 Oct 2022 15:21:51 GMT
- Title: SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video
- Authors: Marija Jegorova, Stavros Petridis, Maja Pantic
- Abstract summary: This work focuses on the apparent emotional reaction recognition from the video-only input, conducted in a self-supervised fashion.
The network is first pre-trained on different self-supervised pretext tasks and later fine-tuned on the downstream target task.
- Score: 61.21388780334379
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work focuses on the apparent emotional reaction recognition (AERR) from
the video-only input, conducted in a self-supervised fashion. The network is
first pre-trained on different self-supervised pretext tasks and later
fine-tuned on the downstream target task. Self-supervised learning facilitates
the use of pre-trained architectures and larger datasets that might be deemed
unfit for the target task and yet might be useful to learn informative
representations and hence provide useful initializations for further
fine-tuning on smaller, more suitable data. Our contribution is
two-fold: (1) an analysis of different state-of-the-art (SOTA) pretext tasks
for the video-only apparent emotional reaction recognition architecture, and
(2) an analysis of various combinations of the regression and classification
losses that are likely to improve the performance further. Together these two
contributions result in the current state-of-the-art performance for the
video-only spontaneous apparent emotional reaction recognition with continuous
annotations.
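The abstract's second contribution concerns combining regression and classification losses for continuous affect annotations. The paper does not spell out its exact formulation here, so the following is only an illustrative sketch under common assumptions in this area: a concordance-correlation (CCC) regression term, plus a cross-entropy term over annotations discretized into bins; the function names, bin count, and weighting `alpha` are hypothetical.

```python
import numpy as np

def ccc(pred, target):
    """Concordance correlation coefficient, a standard metric/loss
    basis for continuous affect annotations."""
    pm, tm = pred.mean(), target.mean()
    pv, tv = pred.var(), target.var()
    cov = ((pred - pm) * (target - tm)).mean()
    return 2 * cov / (pv + tv + (pm - tm) ** 2)

def discretize(values, n_bins=10, lo=-1.0, hi=1.0):
    """Map continuous annotations in [lo, hi] to class indices
    so a classification loss can be applied (illustrative choice)."""
    edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
    return np.digitize(values, edges)

def combined_loss(pred, target, logits, alpha=0.5, n_bins=10):
    """Weighted sum of a (1 - CCC) regression term and a softmax
    cross-entropy term over the discretized targets."""
    reg = 1.0 - ccc(pred, target)
    labels = discretize(target, n_bins)
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    return alpha * reg + (1 - alpha) * ce
```

In practice the relative weight `alpha` would be tuned on validation data; the paper's analysis concerns which such combinations help, not a single fixed recipe.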
Related papers
- SVFAP: Self-supervised Video Facial Affect Perceiver [42.16505961654868]
Motivated by the recent success of self-supervised learning in computer vision, this paper introduces a self-supervised approach, termed Self-supervised Video Facial Affect Perceiver (SVFAP).
To address the dilemma faced by supervised methods, SVFAP leverages masked video autoencoding to perform self-supervised pre-training on massive unlabeled facial videos.
To verify the effectiveness of our method, we conduct experiments on nine datasets spanning three downstream tasks, including dynamic facial expression recognition, dimensional emotion recognition, and personality recognition.
arXiv Detail & Related papers (2023-12-31T07:44:05Z)
- No More Shortcuts: Realizing the Potential of Temporal Self-Supervision [69.59938105887538]
We propose a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks.
We demonstrate experimentally that our more challenging frame-level task formulations and the removal of shortcuts drastically improve the quality of features learned through temporal self-supervision.
arXiv Detail & Related papers (2023-12-20T13:20:31Z)
- Self-supervised Spatiotemporal Representation Learning by Exploiting Video Continuity [15.429045937335236]
This work exploits an essential yet under-explored property of videos, video continuity, to obtain supervision signals for self-supervised representation learning.
We formulate three novel continuity-related pretext tasks, i.e. continuity justification, discontinuity localization, and missing section approximation.
This self-supervision approach, termed Continuity Perception Network (CPNet), solves the three tasks jointly and encourages the backbone network to learn local and long-range motion and context representations.
arXiv Detail & Related papers (2021-12-11T00:35:27Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms existing state-of-the-art methods in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition [98.70304981174748]
We focus on the general application of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)
- Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based InfoNCE (Info Noise Contrastive Estimation) training.
We propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss.
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
arXiv Detail & Related papers (2020-10-19T17:59:01Z)
- Memory-augmented Dense Predictive Coding for Video Representation Learning [103.69904379356413]
We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task.
We investigate visual-only self-supervised video representation learning from RGB frames, or from unsupervised optical flow, or both.
In all cases, we demonstrate state-of-the-art or comparable performance over other approaches with orders of magnitude less training data.
arXiv Detail & Related papers (2020-08-03T17:57:01Z)
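The instance-based InfoNCE objective and its semantic-class-positive extension, mentioned in the co-training entry above, can be sketched as follows. This is a minimal NumPy illustration, not the papers' implementation; the function names and the `temperature` default are assumptions for the example.

```python
import numpy as np

def info_nce(queries, keys, positive_idx, temperature=0.07):
    """Instance-based InfoNCE: each query has exactly one positive key
    (given by positive_idx); all other keys act as negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                    # (N, M) similarities
    z = logits - logits.max(axis=1, keepdims=True)    # stabilize softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(q)), positive_idx].mean()

def multi_positive_nce(queries, keys, labels_q, labels_k, temperature=0.07):
    """Co-training-style extension (illustrative): every key sharing the
    query's (pseudo-)class label counts as a positive, and the loss
    averages the log-probability over each query's positive set."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    pos_mask = labels_q[:, None] == labels_k[None, :]
    per_query = np.where(pos_mask, log_probs, 0.0).sum(1) / pos_mask.sum(1)
    return -per_query.mean()
```

When every class contains exactly one key, `multi_positive_nce` reduces to the instance-based objective, which is the sense in which semantic-class positives generalize InfoNCE.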
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.