Multi-task head pose estimation in-the-wild
- URL: http://arxiv.org/abs/2202.02299v1
- Date: Fri, 4 Feb 2022 18:35:52 GMT
- Title: Multi-task head pose estimation in-the-wild
- Authors: Roberto Valle, José Miguel Buenaposada and Luis Baumela
- Abstract summary: We present a deep learning-based multi-task approach for head pose estimation in images.
We harness the strong dependencies among face pose, alignment and visibility to produce a top performing model for all three tasks.
- Score: 7.476901945542385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a deep learning-based multi-task approach for head pose estimation
in images. We contribute with a network architecture and training strategy that
harness the strong dependencies among face pose, alignment and visibility, to
produce a top performing model for all three tasks. Our architecture is an
encoder-decoder CNN with residual blocks and lateral skip connections. We show
that the combination of head pose estimation and landmark-based face alignment
significantly improves the performance of the former task. Further, the location
of the pose task at the bottleneck layer, at the end of the encoder, and that
of tasks depending on spatial information, such as visibility and alignment, in
the final decoder layer, also contribute to increasing the final performance. In
the experiments conducted, the proposed model outperforms the state of the art
in the face pose and visibility tasks. By including a final landmark regression
step it also produces face alignment results on par with the state-of-the-art.
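The task placement described in the abstract can be illustrated with a minimal shape-tracing sketch. It is not the authors' implementation: the input size, channel widths, number of stages, the 68-landmark count, the three-angle pose output (yaw, pitch, roll), and the heatmap form of the alignment output are all assumptions chosen for illustration, and `trace_multitask_layout` is a hypothetical helper name. The sketch only tracks feature-map shapes to show that the pose head reads the encoder bottleneck while the visibility and alignment heads read the final decoder layer.

```python
from dataclasses import dataclass


@dataclass
class FeatureMap:
    name: str
    channels: int
    size: int  # spatial height == width


def trace_multitask_layout(input_size=256, base_channels=64,
                           num_stages=4, num_landmarks=68):
    """Trace feature-map shapes through a hypothetical encoder-decoder.

    All sizes are illustrative assumptions, not values from the paper.
    """
    # Encoder: each stage (assumed to hold residual blocks) halves the
    # resolution and doubles the channel count for the next stage.
    encoder = []
    size, channels = input_size, base_channels
    for i in range(num_stages):
        size //= 2
        encoder.append(FeatureMap(f"enc{i + 1}", channels, size))
        channels *= 2
    bottleneck = encoder[-1]

    # Decoder: each stage doubles the resolution and merges the lateral
    # skip connection from the encoder map at the same resolution.
    decoder = []
    current = bottleneck
    for skip in reversed(encoder[:-1]):
        up_size = current.size * 2
        assert up_size == skip.size, "lateral skip must match resolution"
        current = FeatureMap(f"dec{len(decoder) + 1}", skip.channels, up_size)
        decoder.append(current)
    final = decoder[-1]

    # Task heads: pose is a global quantity, so it reads the bottleneck;
    # visibility and alignment depend on spatial detail, so they read the
    # final decoder layer.
    heads = {
        "pose": {"input": bottleneck.name, "output_shape": (3,)},
        "visibility": {"input": final.name, "output_shape": (num_landmarks,)},
        "alignment": {
            "input": final.name,
            "output_shape": (num_landmarks, final.size, final.size),
        },
    }
    return encoder, decoder, heads


if __name__ == "__main__":
    _, _, heads = trace_multitask_layout()
    for task, spec in heads.items():
        print(task, spec["input"], spec["output_shape"])
```

With the default (assumed) settings, the pose head attaches to the lowest-resolution encoder map and the two spatial heads attach to the highest-resolution decoder map.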
Related papers
- FaceXFormer: A Unified Transformer for Facial Analysis [59.94066615853198]
FaceXFormer is an end-to-end unified transformer model for a range of facial analysis tasks.
Our model effectively handles images "in-the-wild," demonstrating its robustness and generalizability across eight different tasks.
arXiv Detail & Related papers (2024-03-19T17:58:04Z)
- Faceptor: A Generalist Model for Face Perception [52.8066001012464]
Faceptor is proposed to adopt a well-designed single-encoder dual-decoder architecture.
Incorporating Layer-Attention into Faceptor enables the model to adaptively select features from optimal layers to perform the desired tasks.
Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition.
arXiv Detail & Related papers (2024-03-14T15:42:31Z)
- Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs [57.492124844326206]
This work delves into the task of pose-free novel view synthesis from stereo pairs, a challenging and pioneering task in 3D vision.
Our innovative framework, unlike any before, seamlessly integrates 2D correspondence matching, camera pose estimation, and NeRF rendering, fostering a synergistic enhancement of these tasks.
arXiv Detail & Related papers (2023-12-12T13:22:44Z)
- Multi-task Learning with 3D-Aware Regularization [55.97507478913053]
We propose a structured 3D-aware regularizer which interfaces multiple tasks through the projection of features extracted from an image encoder to a shared 3D feature space.
We show that the proposed method is architecture agnostic and can be plugged into various prior multi-task backbones to improve their performance.
arXiv Detail & Related papers (2023-10-02T08:49:56Z)
- Weakly-supervised 3D Pose Transfer with Keypoints [57.66991032263699]
Main challenges of 3D pose transfer are: 1) Lack of paired training data with different characters performing the same pose; 2) Disentangling pose and shape information from the target mesh; 3) Difficulty in applying to meshes with different topologies.
We propose a novel weakly-supervised keypoint-based framework to overcome these difficulties.
arXiv Detail & Related papers (2023-07-25T12:40:24Z)
- A Deeper Look into DeepCap [96.67706102518238]
We propose a novel deep learning approach for monocular dense human performance capture.
Our method is trained in a weakly supervised manner based on multi-view supervision.
Our approach outperforms the state of the art in terms of quality and robustness.
arXiv Detail & Related papers (2021-11-20T11:34:33Z)
- Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation [1.1501261942096426]
We introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation.
Our model is able to capture the long-range dependencies between body joints.
Experiments and ablation studies conducted on two standard benchmarks demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2021-11-01T13:48:55Z)
- An Efficient Multitask Neural Network for Face Alignment, Head Pose Estimation and Face Tracking [9.39854778804018]
We propose an efficient multitask face alignment, face tracking and head pose estimation network (ATPN).
ATPN achieves improved performance compared to previous state-of-the-art methods while having fewer parameters and FLOPs.
arXiv Detail & Related papers (2021-03-13T04:41:15Z)
- Deep Entwined Learning Head Pose and Face Alignment Inside an Attentional Cascade with Doubly-Conditional fusion [42.50876580245864]
Head pose estimation and face alignment constitute a backbone preprocessing for many applications relying on face analysis.
We propose to entwine face alignment and head pose tasks inside an attentional cascade.
We empirically show the benefit of entwining head pose and landmark localization objectives inside our architecture.
arXiv Detail & Related papers (2020-04-14T14:42:35Z)
- DeepCap: Monocular Human Performance Capture Using Weak Supervision [106.50649929342576]
We propose a novel deep learning approach for monocular dense human performance capture.
Our method is trained in a weakly supervised manner based on multi-view supervision.
Our approach outperforms the state of the art in terms of quality and robustness.
arXiv Detail & Related papers (2020-03-18T16:39:56Z)