Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection
- URL: http://arxiv.org/abs/2405.20117v1
- Date: Thu, 30 May 2024 14:54:26 GMT
- Title: Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection
- Authors: Prashanth Chandran, Gaspard Zoss, Paulo Gotardo, Derek Bradley
- Abstract summary: We show how a combination of specific architectural modifications can improve the accuracy and temporal stability of facial landmark detectors.
We analyze the use of a spatial transformer network that is trained alongside the landmark detector in an unsupervised manner.
We show that modifying the output head of the landmark predictor to infer landmarks in a canonical 3D space can further improve accuracy.
- Score: 9.633565294243173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we examine three important issues in the practical use of state-of-the-art facial landmark detectors and show how a combination of specific architectural modifications can directly improve their accuracy and temporal stability. First, many facial landmark detectors require face normalization as a preprocessing step, which is accomplished by a separately-trained neural network that crops and resizes the face in the input image. There is no guarantee that this pre-trained network performs the optimal face normalization for landmark detection. We instead analyze the use of a spatial transformer network that is trained alongside the landmark detector in an unsupervised manner, and jointly learn optimal face normalization and landmark detection. Second, we show that modifying the output head of the landmark predictor to infer landmarks in a canonical 3D space can further improve accuracy. To convert the predicted 3D landmarks into screen-space, we additionally predict the camera intrinsics and head pose from the input image. As a side benefit, this allows predicting the 3D face shape from a given image using only 2D landmarks as supervision, which is useful in determining landmark visibility among other things. Finally, when training a landmark detector on multiple datasets at the same time, annotation inconsistencies across datasets force the network to produce a suboptimal average. We propose to add a semantic correction network to address this issue. This additional lightweight neural network is trained alongside the landmark detector, without requiring any additional supervision. While the insights of this paper can be applied to most common landmark detectors, we specifically target a recently-proposed continuous 2D landmark detector to demonstrate how each of our additions leads to meaningful improvements over the state-of-the-art on standard benchmarks.
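The screen-space conversion described in the abstract can be illustrated with a minimal sketch. It assumes a simple pinhole camera with a single focal length and a principal point, and a head pose given as a rotation matrix and translation; the function name `project_landmarks` and all parameter names are hypothetical and are not taken from the paper, whose actual camera model may differ.

```python
import numpy as np

def project_landmarks(points_3d, rotation, translation, focal, principal_point):
    """Project (N, 3) canonical-space landmarks to (N, 2) pixel coordinates.

    Assumed (hypothetical) conventions: `rotation` is a 3x3 head-pose matrix,
    `translation` a 3-vector placing the head in front of the camera (z > 0),
    `focal` a scalar focal length in pixels, `principal_point` a 2-vector.
    """
    # Canonical space -> camera space using the head pose.
    cam = points_3d @ rotation.T + translation
    depth = cam[:, 2]
    # Pinhole perspective divide, then shift by the principal point.
    uv = focal * cam[:, :2] / depth[:, None] + principal_point
    # Depth is returned as well; it could feed a visibility test.
    return uv, depth

# Two toy landmarks, identity head rotation, head placed 2 units from the camera.
landmarks = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
uv, depth = project_landmarks(
    landmarks, np.eye(3), np.array([0.0, 0.0, 2.0]), 100.0, np.array([64.0, 64.0])
)
print(uv, depth)
```

Because the projection is differentiable, a loss on the projected 2D points alone can in principle supervise the 3D landmarks, pose, and intrinsics, which is consistent with the abstract's remark about using only 2D landmarks as supervision.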
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection [6.096961718434965]
We study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes.
We resort to the robust and principled framework of self-teaching, which has triggered notable progress for semi-supervised learning recently.
We propose the first semi-supervised 3D detection algorithm that works in the single-stage manner and allows spatially dense training signals.
arXiv Detail & Related papers (2023-04-25T17:59:54Z) - Unleash the Potential of Image Branch for Cross-modal 3D Object
Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z) - Towards Accurate Facial Landmark Detection via Cascaded Transformers [14.74021483826222]
We propose an accurate facial landmark detector based on cascaded transformers.
With self-attention in transformers, our model can inherently exploit the structured relationships between landmarks.
During cascaded refinement, our model is able to extract the most relevant image features around the target landmark for coordinate prediction.
arXiv Detail & Related papers (2022-08-23T08:42:13Z) - SNAKE: Shape-aware Neural 3D Keypoint Field [62.91169625183118]
Detecting 3D keypoints from point clouds is important for shape reconstruction.
This work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?
We propose a novel unsupervised paradigm named SNAKE, which is short for shape-aware neural 3D keypoint field.
arXiv Detail & Related papers (2022-06-03T17:58:43Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting [12.611269919468999]
We present a novel neighbor-voting method that incorporates neighbor predictions to ameliorate object detection from severely deformed pseudo-LiDAR point clouds.
Our results on the bird's eye view detection outperform the state-of-the-art performance by a large margin, especially for the "hard" level detection.
arXiv Detail & Related papers (2021-07-06T09:18:33Z) - Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z) - ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection [78.71826145162092]
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds.
Our ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2021-03-09T10:51:24Z) - BRULÈ: Barycenter-Regularized Unsupervised Landmark Extraction [2.2758845733923687]
Unsupervised retrieval of image features is vital for many computer vision tasks where the annotation is missing or scarce.
We propose a new unsupervised approach to detect the landmarks in images, validating it on the popular task of human face key-points extraction.
The method is based on the idea of auto-encoding the wanted landmarks in the latent space while discarding the non-essential information.
arXiv Detail & Related papers (2020-06-20T20:04:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.