An Efficient Multitask Neural Network for Face Alignment, Head Pose
Estimation and Face Tracking
- URL: http://arxiv.org/abs/2103.07615v1
- Date: Sat, 13 Mar 2021 04:41:15 GMT
- Title: An Efficient Multitask Neural Network for Face Alignment, Head Pose
Estimation and Face Tracking
- Authors: Jiahao Xia, Haimin Zhang, Shiping Wen, Shuo Yang and Min Xu
- Abstract summary: We propose an efficient multitask face alignment, face tracking and head pose estimation network (ATPN).
ATPN achieves improved performance compared to previous state-of-the-art methods with fewer parameters and FLOPs.
- Score: 9.39854778804018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While convolutional neural networks (CNNs) have significantly boosted the
performance of face related algorithms, maintaining accuracy and efficiency
simultaneously in practical use remains challenging. A recent study shows that
using a cascade of hourglass modules, which consist of a number of bottom-up and
top-down convolutional layers, can extract facial structural information for
face alignment and improve accuracy. However, previous studies have shown that
features produced by shallow convolutional layers correspond closely to edges.
These features could be used directly to provide the structural information
without additional cost. Motivated by this intuition, we propose an
efficient multitask face alignment, face tracking and head pose estimation
network (ATPN). Specifically, we introduce a shortcut connection between
shallow-layer features and deep-layer features to provide the structural
information for face alignment and apply the CoordConv to the last few layers
to provide coordinate information. The predicted facial landmarks enable us to
generate a cheap heatmap which contains both geometric and appearance
information for head pose estimation, and it also provides attention clues for
face tracking. Moreover, the face tracking task saves us the face detection
procedure for each frame, which significantly boosts performance on
video-based tasks. The proposed framework is evaluated on four benchmark
datasets, WFLW, 300VW, WIDER Face and 300W-LP. The experimental results show
that ATPN achieves improved performance compared to previous state-of-the-art
methods while requiring fewer parameters and FLOPs.
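Two of the mechanisms described above, feeding coordinate information into the last few layers via CoordConv and rendering a cheap heatmap from the predicted landmarks, are simple enough to sketch in code. The PyTorch snippet below is a minimal illustration under stated assumptions: the class name, channel counts, heatmap size, and Gaussian width are hypothetical, the heatmap here captures only the geometric part (the paper's heatmap also carries appearance information), and none of this is the authors' implementation.

```python
# Minimal sketch of two ideas from the abstract: a CoordConv layer that appends
# normalized x/y coordinate channels before a convolution, and a cheap Gaussian
# heatmap rendered from predicted landmarks. All names and sizes are assumptions,
# not ATPN's actual code.
import torch
import torch.nn as nn


class CoordConv2d(nn.Module):
    """Convolution that concatenates normalized (x, y) coordinate channels to its input."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1.0, 1.0, h, device=x.device)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device)
        grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([grid_x, grid_y]).expand(b, -1, -1, -1)  # (B, 2, H, W)
        return self.conv(torch.cat([x, coords], dim=1))


def landmarks_to_heatmap(landmarks, size=64, sigma=2.0):
    """Render predicted landmarks (B, N, 2), in pixel coordinates, into one Gaussian heatmap per image."""
    b, n, _ = landmarks.shape
    ys = torch.arange(size, dtype=torch.float32, device=landmarks.device)
    xs = torch.arange(size, dtype=torch.float32, device=landmarks.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")       # (size, size)
    lx = landmarks[..., 0].view(b, n, 1, 1)
    ly = landmarks[..., 1].view(b, n, 1, 1)
    dist2 = (grid_x - lx) ** 2 + (grid_y - ly) ** 2              # (B, N, size, size)
    heatmap = torch.exp(-dist2 / (2 * sigma ** 2)).amax(dim=1)   # max over landmarks
    return heatmap.unsqueeze(1)                                  # (B, 1, size, size)


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    out = CoordConv2d(64, 64, kernel_size=3, padding=1)(feat)
    lms = torch.rand(2, 98, 2) * 64                              # e.g. 98 WFLW-style landmarks
    hm = landmarks_to_heatmap(lms)
    print(out.shape, hm.shape)                                   # (2, 64, 32, 32), (2, 1, 64, 64)
```

In this reading, the heatmap is "cheap" because it is rendered analytically from the already-predicted landmarks rather than regressed by extra layers, and it can then be multiplied with feature maps as an attention mask for the tracking branch.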
Related papers
- Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image [87.00660347447494]
Recent advancements in Neural Surface Reconstruction (NSR) have significantly improved multi-view reconstruction when coupled with volume rendering.
We propose an investigation into feature-level consistent loss, aiming to harness valuable feature priors from diverse pretext visual tasks.
Our results, analyzed on DTU and EPFL, reveal that feature priors from image matching and multi-view stereo datasets outperform other pretext tasks.
arXiv Detail & Related papers (2024-08-04T16:09:46Z)
- Faceptor: A Generalist Model for Face Perception [52.8066001012464]
Faceptor adopts a well-designed single-encoder dual-decoder architecture.
Introducing Layer-Attention into Faceptor enables the model to adaptively select features from optimal layers to perform the desired tasks.
Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition.
arXiv Detail & Related papers (2024-03-14T15:42:31Z)
- Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction [17.254539604491303]
In this paper, we address the problem of few-shot full 3D head reconstruction.
We accomplish this by incorporating a probabilistic shape and appearance prior into coordinate-based representations.
We extend the H3DS dataset, which now comprises 60 high-resolution 3D full head scans and their corresponding posed images and masks.
arXiv Detail & Related papers (2023-10-12T07:35:30Z) - 3D Face Alignment Through Fusion of Head Pose Information and Features [0.6526824510982799]
We propose a novel method that employs head pose information to improve face alignment performance.
The proposed network structure performs robust face alignment through a dual-dimensional network.
We experimentally assessed the correlation between the predicted facial landmarks and head pose information, as well as variations in the accuracy of facial landmarks.
arXiv Detail & Related papers (2023-08-25T12:01:24Z) - Neural Point-based Volumetric Avatar: Surface-guided Neural Points for
Efficient and Photorealistic Volumetric Head Avatar [62.87222308616711]
We propose Neural Point-based Volumetric Avatar, a method that adopts the neural point representation and the neural volume rendering process.
Specifically, the neural points are strategically constrained around the surface of the target expression via a high-resolution UV displacement map.
By design, our method is better equipped to handle topologically changing regions and thin structures while also ensuring accurate expression control when animating avatars.
arXiv Detail & Related papers (2023-07-11T03:40:10Z)
- EfficientFace: An Efficient Deep Network with Feature Enhancement for Accurate Face Detection [20.779512288834315]
Current lightweight CNN-based face detectors that trade accuracy for efficiency have limited capability to handle insufficient feature representations.
We design an efficient deep face detector termed EfficientFace in this study, which contains three modules for feature enhancement.
We have evaluated EfficientFace on four public benchmarks and experimental results demonstrate the appealing performance of our method.
arXiv Detail & Related papers (2023-02-23T06:59:45Z)
- EResFD: Rediscovery of the Effectiveness of Standard Convolution for Lightweight Face Detection [13.357235715178584]
We re-examine the effectiveness of the standard convolutional block as a lightweight backbone architecture for face detection.
We show that heavily channel-pruned standard convolution layers can achieve better accuracy and inference speed.
Our proposed detector EResFD obtains 80.4% mAP on the WIDER FACE Hard subset while taking only 37.7 ms for VGA image inference on CPU.
arXiv Detail & Related papers (2022-04-04T02:30:43Z)
- A Deeper Look into DeepCap [96.67706102518238]
We propose a novel deep learning approach for monocular dense human performance capture.
Our method is trained in a weakly supervised manner based on multi-view supervision.
Our approach outperforms the state of the art in terms of quality and robustness.
arXiv Detail & Related papers (2021-11-20T11:34:33Z)
- SE-PSNet: Silhouette-based Enhancement Feature for Panoptic Segmentation Network [5.353718408751182]
We propose a solution to tackle the panoptic segmentation task.
The structure combines the bottom-up method and the top-down method.
The network mainly pays attention to the quality of the mask.
arXiv Detail & Related papers (2021-07-11T17:20:32Z)
- The FaceChannel: A Fast & Furious Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a lightweight neural network with far fewer parameters than common deep neural networks.
We demonstrate how our model achieves a comparable, if not better, performance to the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z)
- Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing [61.82466976737915]
Depth supervised learning has been proven as one of the most effective methods for face anti-spoofing.
We propose a new approach to detect presentation attacks from multiple frames based on two insights.
The proposed approach achieves state-of-the-art results on five benchmark datasets.
arXiv Detail & Related papers (2020-03-18T06:11:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.