Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the
Wild
- URL: http://arxiv.org/abs/2003.03771v3
- Date: Sat, 11 Sep 2021 04:52:46 GMT
- Title: Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the
Wild
- Authors: Haibo Jin, Shengcai Liao, Ling Shao
- Abstract summary: We propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection.
The proposed model is equipped with a novel detection head based on heatmap regression.
To further improve the cross-domain generalization capability of PIPNet, we propose self-training with curriculum.
- Score: 104.61677518999976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, heatmap regression models have become popular due to their superior
performance in locating facial landmarks. However, three major problems still
exist among these models: (1) they are computationally expensive; (2) they
usually lack explicit constraints on global shapes; (3) domain gaps are
commonly present. To address these problems, we propose Pixel-in-Pixel Net
(PIPNet) for facial landmark detection. The proposed model is equipped with a
novel detection head based on heatmap regression, which conducts score and
offset predictions simultaneously on low-resolution feature maps. By doing so,
repeated upsampling layers are no longer necessary, enabling the inference time
to be largely reduced without sacrificing model accuracy. Besides, a simple but
effective neighbor regression module is proposed to enforce local constraints
by fusing predictions from neighboring landmarks, which enhances the robustness
of the new detection head. To further improve the cross-domain generalization
capability of PIPNet, we propose self-training with curriculum. This training
strategy is able to mine more reliable pseudo-labels from unlabeled data across
domains by starting with an easier task, then gradually increasing the
difficulty to provide more precise labels. Extensive experiments demonstrate
the superiority of PIPNet, which obtains state-of-the-art results on three out
of six popular benchmarks under the supervised setting. The results on two
cross-domain test sets are also consistently improved compared to the
baselines. Notably, our lightweight version of PIPNet runs at 35.7 FPS and 200
FPS on CPU and GPU, respectively, while still maintaining accuracy competitive
with state-of-the-art methods. The code of PIPNet is available at
https://github.com/jhb86253817/PIPNet.
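To make the proposed head concrete, the following is a minimal NumPy sketch of how score-and-offset predictions on a low-resolution feature map could be decoded into landmark coordinates and fused with neighbor predictions. The tensor shapes, the stride value, and the neighbors table are illustrative assumptions for exposition, not the released PIPNet implementation (see the repository above for the authors' code).

```python
# Minimal sketch of decoding a PIP-style head: per landmark, take the argmax
# cell of the low-resolution score map, add the predicted offsets to recover
# sub-cell coordinates, then fuse with predictions made by neighboring
# landmarks. Shapes, stride, and the `neighbors` table are assumptions.
import numpy as np

def decode_pip(scores, offset_x, offset_y, nb_x, nb_y, neighbors, stride=32):
    """scores, offset_x, offset_y: (L, H, W); nb_x, nb_y: (L, K, H, W).
    neighbors: (L, K) int array, neighbor indices predicted by each landmark.
    Returns (L, 2) landmark coordinates in input-image pixels."""
    L, H, W = scores.shape
    K = neighbors.shape[1]
    coords = np.zeros((L, 2), dtype=np.float32)
    nb_coords = np.zeros((L, K, 2), dtype=np.float32)
    for i in range(L):
        idx = scores[i].argmax()
        y, x = divmod(idx, W)            # best cell on the low-res grid
        coords[i] = ((x + offset_x[i, y, x]) * stride,
                     (y + offset_y[i, y, x]) * stride)
        for k in range(K):               # this landmark's guesses for its neighbors
            nb_coords[i, k] = ((x + nb_x[i, k, y, x]) * stride,
                               (y + nb_y[i, k, y, x]) * stride)
    # Neighbor regression: average each landmark's own prediction with the
    # predictions that other landmarks made for it.
    fused = np.zeros_like(coords)
    for j in range(L):
        votes = [coords[j]]
        for i in range(L):
            for k in range(K):
                if neighbors[i, k] == j:
                    votes.append(nb_coords[i, k])
        fused[j] = np.mean(votes, axis=0)
    return fused
```

Because everything is read directly from the low-resolution maps, no upsampling layers or heatmap post-processing are needed at inference time, which is where the reported speedup comes from.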
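The cross-domain training strategy can likewise be summarized as a short loop. Below is a hedged sketch of self-training with a curriculum, in which pseudo-labels are mined from unlabeled cross-domain images over several rounds of increasing difficulty; the confidence-threshold schedule and the train_fn/predict_fn helpers are hypothetical placeholders rather than the paper's exact recipe.

```python
# Hedged sketch of self-training with curriculum: start from an easier
# setting (a strict confidence threshold) and gradually relax it, so that
# later rounds mine more, and increasingly precise, pseudo-labels.
# train_fn, predict_fn, and the threshold schedule are placeholders.
def self_train_with_curriculum(model, labeled, unlabeled,
                               train_fn, predict_fn,
                               thresholds=(0.9, 0.7, 0.5)):
    """labeled: list of (image, landmarks); unlabeled: list of images.
    predict_fn(model, image) -> (landmarks, confidence)."""
    data = list(labeled)
    for tau in thresholds:                  # each round is a harder curriculum step
        train_fn(model, data)               # fit on labeled data + current pseudo-labels
        pseudo = []
        for image in unlabeled:
            landmarks, confidence = predict_fn(model, image)
            if confidence >= tau:           # keep only sufficiently reliable predictions
                pseudo.append((image, landmarks))
        data = list(labeled) + pseudo       # refresh pseudo-labels for the next round
    train_fn(model, data)                   # final pass on the full mixed set
    return model
```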
Related papers
- POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search [34.50794776762681]
This paper introduces the Parallel Optimal Position Search (POPoS), a high-precision encoding-decoding framework.
POPoS employs three key innovations: Pseudo-range multilateration is utilized to correct heatmap errors, enhancing the precision of landmark localization.
A single-step parallel algorithm is introduced, significantly enhancing computational efficiency and reducing processing time.
arXiv Detail & Related papers (2024-10-12T16:28:40Z) - Unsupervised Keypoints from Pretrained Diffusion Models [31.147785019795347]
We leverage the emergent knowledge within text-to-image diffusion models to obtain more robust unsupervised keypoints.
Our core idea is to find text embeddings that would cause the generative model to consistently attend to compact regions in images.
We validate our performance on multiple datasets: the CelebA, CUB-200-2011, Tai-Chi-HD, DeepFashion, and Human3.6m datasets.
arXiv Detail & Related papers (2023-11-29T19:43:38Z) - Deep Point Cloud Normal Estimation via Triplet Learning [12.271669779096076]
We propose a novel normal estimation method for point clouds.
It consists of two phases: (a) feature encoding which learns representations of local patches, and (b) normal estimation that takes the learned representation as input and regresses the normal vector.
Our method preserves sharp features and achieves better normal estimation results on CAD-like shapes.
arXiv Detail & Related papers (2021-10-20T11:16:00Z) - Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework [59.578339075658995]
We propose a purely point-based framework for joint crowd counting and individual localization.
We design an intuitive solution under this framework, which is called Point to Point Network (P2PNet).
arXiv Detail & Related papers (2021-07-27T11:41:50Z) - Sample and Computation Redistribution for Efficient Face Detection [137.19388513633484]
Training data sampling and computation distribution strategies are the keys to efficient and accurate face detection.
SCRFD-34GF outperforms the best competitor, TinaFace, by 3.86% (AP on the hard set) while being more than 3x faster on GPUs with VGA-resolution images.
arXiv Detail & Related papers (2021-05-10T23:51:14Z) - Gaussian Vector: An Efficient Solution for Facial Landmark Detection [3.058685580689605]
This paper proposes a new solution, Gaussian Vector, to preserve the spatial information as well as reduce the output size and simplify the post-processing.
We evaluate our method on 300W, COFW, WFLW and JD-landmark.
arXiv Detail & Related papers (2020-10-03T10:15:41Z) - SADet: Learning An Efficient and Accurate Pedestrian Detector [68.66857832440897]
This paper proposes a series of systematic optimization strategies for the detection pipeline of one-stage detector.
It forms a single shot anchor-based detector (SADet) for efficient and accurate pedestrian detection.
Though structurally simple, it achieves state-of-the-art results and a real-time speed of 20 FPS on VGA-resolution images.
arXiv Detail & Related papers (2020-07-26T12:32:38Z) - Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, a distinct visual feature is extracted for each guided point.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z) - Local Grid Rendering Networks for 3D Object Detection in Point Clouds [98.02655863113154]
CNNs are powerful, but directly applying convolutions after voxelizing an entire point cloud into a dense regular 3D grid would be computationally costly.
We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently.
We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets.
arXiv Detail & Related papers (2020-07-04T13:57:43Z)