Semi-supervised Facial Action Unit Intensity Estimation with Contrastive
Learning
- URL: http://arxiv.org/abs/2011.01864v2
- Date: Wed, 4 Nov 2020 09:40:36 GMT
- Title: Semi-supervised Facial Action Unit Intensity Estimation with Contrastive
Learning
- Authors: Enrique Sanchez, Adrian Bulat, Anestis Zaganidis, Georgios
Tzimiropoulos
- Abstract summary: Our method does not require manually selecting key frames, and produces state-of-the-art results with as little as $2\%$ of annotated frames.
We experimentally validate that our method outperforms existing methods when working with as little as $2\%$ of randomly chosen data.
- Score: 54.90704746573636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the challenging problem of estimating the intensity of
Facial Action Units with few labeled images. Contrary to previous works, our
method does not require manually selecting key frames, and produces
state-of-the-art results with as little as $2\%$ of annotated frames, which are
\textit{randomly chosen}. To this end, we propose a semi-supervised learning
approach where a spatio-temporal model combining a feature extractor and a
temporal module is learned in two stages. The first stage uses datasets of
unlabeled videos to learn a strong spatio-temporal representation of facial
behavior dynamics based on contrastive learning. To our knowledge, we are the
first to build upon this framework for modeling facial behavior in an
unsupervised manner. The second stage uses another dataset of randomly chosen
labeled frames to train a regressor on top of our spatio-temporal model for
estimating the AU intensity. We show that although backpropagation through time
is applied only with respect to the output of the network for extremely sparse
and randomly chosen labeled frames, our model can be effectively trained to
estimate AU intensity accurately, thanks to the unsupervised pre-training of
the first stage. We experimentally validate that our method outperforms
existing methods when working with as little as $2\%$ of randomly chosen data
on both the DISFA and BP4D datasets, without a careful choice of labeled frames, a
time-consuming task still required in previous approaches.
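To make the two-stage recipe concrete, here is a rough PyTorch-style sketch. The CNN backbone, GRU temporal module, 12-AU head, InfoNCE formulation, and all names are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stage 1: contrastive pre-training on unlabeled video clips.
# Two augmented views of the same clip are pulled together with an
# InfoNCE loss; the other clips in the batch act as negatives.
def info_nce(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

class SpatioTemporalModel(nn.Module):
    """Frame-level feature extractor followed by a temporal module."""
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in CNN encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim))
        self.temporal = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 12)         # e.g. 12 action units

    def encode(self, clip):                       # clip: (B, T, 3, H, W)
        B, T = clip.shape[:2]
        f = self.backbone(clip.flatten(0, 1)).view(B, T, -1)
        h, _ = self.temporal(f)                   # (B, T, hidden)
        return h

    def forward(self, clip):
        return self.head(self.encode(clip))       # per-frame AU intensities

# Stage 2: supervised regression with extremely sparse labels. The loss is
# evaluated only at labeled frame positions, but gradients still flow back
# through time into the GRU and the frame encoder.
def sparse_intensity_loss(pred, target, label_mask):
    # pred, target: (B, T, num_aus); label_mask: (B, T) float, ~2% ones
    mask = label_mask.unsqueeze(-1)
    return ((pred - target) ** 2 * mask).sum() / mask.sum().clamp(min=1)
```

In stage 1, `info_nce` would be applied to pooled clip embeddings of two augmented views of the same clip (e.g. `model.encode(view).mean(dim=1)` passed through a small projection head); in stage 2, `sparse_intensity_loss` trains the same model end-to-end even though only about $2\%$ of the entries in `label_mask` are non-zero.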
Related papers
- Hybrid diffusion models: combining supervised and generative pretraining for label-efficient fine-tuning of segmentation models [55.2480439325792]
We propose a new pretext task: simultaneously performing image denoising and mask prediction on the first domain.
We show that fine-tuning a model pretrained using this approach leads to better results than fine-tuning a similar model trained using either supervised or unsupervised pretraining.
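Taken at face value, this joint pretext objective might look like the following sketch; `model`, its two heads, and the weighting `alpha` are hypothetical stand-ins, not the paper's implementation.

```python
import torch.nn.functional as F

def hybrid_pretext_loss(model, image, noise, mask_target, alpha=1.0):
    """Joint pretext loss: denoise the input while predicting a segmentation
    mask. `model` is assumed to return (denoised_image, mask_logits), and
    `mask_target` is a float tensor in [0, 1]."""
    noisy = image + noise
    denoised, mask_logits = model(noisy)
    denoise_loss = F.mse_loss(denoised, image)        # generative term
    mask_loss = F.binary_cross_entropy_with_logits(mask_logits, mask_target)
    return denoise_loss + alpha * mask_loss           # supervised term
```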
arXiv Detail & Related papers (2024-08-06T20:19:06Z) - UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and are not limited to forgery-specific artifacts, and thus generalize better.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z) - FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification [10.911464455072391]
FACTUAL is a Contrastive Learning framework for Adversarial Training and robust SAR classification.
Our model achieves 99.7% accuracy on clean samples, and 89.6% on perturbed samples, both outperforming previous state-of-the-art methods.
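One plausible reading of adversarial contrastive training, sketched below with a single-step (FGSM-style) perturbation; the paper may use a different attack and loss, so treat every name here as an assumption.

```python
import torch
import torch.nn.functional as F

def fgsm_view(encoder, x, eps=8 / 255):
    """One-step perturbation that pushes the embedding away from the clean one."""
    delta = torch.zeros_like(x, requires_grad=True)
    z_clean = F.normalize(encoder(x), dim=1).detach()
    z_pert = F.normalize(encoder(x + delta), dim=1)
    sim = (z_pert * z_clean).sum()                 # cosine similarity to clean view
    grad = torch.autograd.grad(sim, delta)[0]      # grads w.r.t. delta only
    return (x - eps * grad.sign()).detach()        # step that reduces similarity

def adversarial_contrastive_loss(encoder, x, temperature=0.1):
    """Pull each clean sample toward its adversarial view with InfoNCE."""
    x_adv = fgsm_view(encoder, x)
    z = F.normalize(encoder(x), dim=1)
    z_adv = F.normalize(encoder(x_adv), dim=1)
    logits = z @ z_adv.t() / temperature
    targets = torch.arange(x.size(0), device=x.device)
    return F.cross_entropy(logits, targets)
```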
arXiv Detail & Related papers (2024-04-04T06:20:22Z) - Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z) - Self-Distilled Representation Learning for Time Series [45.51976109748732]
Self-supervised learning for time-series data holds potential similar to that recently unleashed in Natural Language Processing and Computer Vision.
We propose a conceptually simple yet powerful non-contrastive approach, based on the data2vec self-distillation framework.
We demonstrate the competitiveness of our approach for classification and forecasting as downstream tasks, comparing with state-of-the-art self-supervised learning methods on the UCR and UEA archives as well as the ETT and Electricity datasets.
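A minimal sketch of data2vec-style self-distillation applied to a time series: an EMA teacher encodes the full series and the student regresses its contextualized outputs at masked steps. The decay, masking scheme, and smooth-L1 loss are common choices in this framework, assumed here rather than taken from the paper.

```python
import copy
import torch
import torch.nn.functional as F

class SelfDistill:
    """data2vec-style self-distillation for time series (illustrative)."""
    def __init__(self, encoder, ema_decay=0.999):
        self.student = encoder
        self.teacher = copy.deepcopy(encoder)      # EMA copy, never backpropped
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.decay = ema_decay

    @torch.no_grad()
    def update_teacher(self):
        for ps, pt in zip(self.student.parameters(), self.teacher.parameters()):
            pt.mul_(self.decay).add_(ps, alpha=1 - self.decay)

    def loss(self, series, mask):
        # series: (B, T, C); mask: (B, T) boolean, True = masked step
        masked_in = series.masked_fill(mask.unsqueeze(-1), 0.0)
        pred = self.student(masked_in)             # (B, T, D)
        with torch.no_grad():
            target = self.teacher(series)          # contextualized targets
        m = mask.unsqueeze(-1).float()             # regress masked steps only
        return F.smooth_l1_loss(pred * m, target * m)
```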
arXiv Detail & Related papers (2023-11-19T14:34:01Z) - Semi-Supervised Learning for hyperspectral images by non parametrically
predicting view assignment [25.198550162904713]
Hyperspectral image (HSI) classification is gaining momentum because of the rich spectral information inherent in the images.
Recently, to effectively train deep learning models with minimal labeled samples, unlabeled samples have also been leveraged in self-supervised and semi-supervised settings.
In this work, we leverage the idea of semi-supervised learning to assist the discriminative self-supervised pretraining of the models.
arXiv Detail & Related papers (2023-06-19T14:13:56Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are available only for the source dataset, but unavailable for the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
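The temporal-permutation task could be sketched as follows (three segments and all names are illustrative; the body-part variant would permute joint groups analogously):

```python
import itertools
import random
import torch

# Pretext task: cut a skeleton sequence into 3 temporal segments, shuffle
# them, and ask a classifier to recognize which permutation was applied.
PERMS = list(itertools.permutations(range(3)))   # 3 segments -> 6 classes

def permute_segments(seq):
    """seq: (T, J, C) skeleton sequence; returns shuffled seq + class label."""
    T = seq.shape[0]
    bounds = [0, T // 3, 2 * T // 3, T]
    segments = [seq[bounds[i]:bounds[i + 1]] for i in range(3)]
    label = random.randrange(len(PERMS))          # target for the classifier
    shuffled = torch.cat([segments[i] for i in PERMS[label]], dim=0)
    return shuffled, label
```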
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance at less than 50% of the cost.
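A hedged sketch of selective masking: rather than masking positions uniformly at random, positions scored as task-relevant are masked preferentially. The importance scores, mask id, and ratio below are placeholders, not the paper's procedure.

```python
import torch

def selective_mask(token_ids, importance, ratio=0.15, mask_id=103):
    """Mask the `ratio` fraction of tokens with the highest task-importance
    scores (e.g. from a proxy classifier), instead of masking uniformly.
    `importance`: (B, L) scores; 103 is BERT's [MASK] id, as an example."""
    B, L = token_ids.shape
    k = max(1, int(L * ratio))
    topk = importance.topk(k, dim=1).indices             # task-salient positions
    masked = token_ids.clone()
    labels = torch.full_like(token_ids, -100)            # ignore index for CE loss
    labels.scatter_(1, topk, token_ids.gather(1, topk))  # supervise masked slots
    masked.scatter_(1, topk, mask_id)
    return masked, labels
```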
arXiv Detail & Related papers (2020-04-21T03:14:22Z)