Towards Accurate Facial Landmark Detection via Cascaded Transformers
- URL: http://arxiv.org/abs/2208.10808v1
- Date: Tue, 23 Aug 2022 08:42:13 GMT
- Title: Towards Accurate Facial Landmark Detection via Cascaded Transformers
- Authors: Hui Li, Zidong Guo, Seon-Min Rhee, Seungju Han, Jae-Joon Han
- Abstract summary: We propose an accurate facial landmark detector based on cascaded transformers.
With self-attention in transformers, our model can inherently exploit the structured relationships between landmarks.
During cascaded refinement, our model is able to extract the most relevant image features around the target landmark for coordinate prediction.
- Score: 14.74021483826222
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate facial landmarks are essential prerequisites for many tasks related
to human faces. In this paper, an accurate facial landmark detector is proposed
based on cascaded transformers. We formulate facial landmark detection as a
coordinate regression task such that the model can be trained end-to-end. With
self-attention in transformers, our model can inherently exploit the structured
relationships between landmarks, which would benefit landmark detection under
challenging conditions such as large pose and occlusion. During cascaded
refinement, our model is able to extract the most relevant image features
around the target landmark for coordinate prediction, based on a deformable
attention mechanism, thus yielding more accurate alignment. In addition, we
propose a novel decoder that refines image features and landmark positions
simultaneously. With only a small increase in parameters, the detection
performance improves further. Our model achieves new state-of-the-art performance on
several standard facial landmark detection benchmarks, and shows good
generalization ability in cross-dataset evaluation.
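The cascaded refinement described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`refine_stage`, `cascaded_refine`, `make_stage`), the random untrained weights, the offset scales (0.1 and 0.05), and the single feature map are all assumptions made for illustration. Each stage lets the landmark queries attend to one another (self-attention), samples image features at a few predicted points around the current landmark estimate (a simplified, single-scale stand-in for deformable attention), and then regresses a small coordinate update.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bilinear_sample(feat, pts):
    """Sample feat (H, W, C) at normalized (x, y) points in [0, 1]."""
    H, W, _ = feat.shape
    x = np.clip(pts[..., 0], 0.0, 1.0) * (W - 1)
    y = np.clip(pts[..., 1], 0.0, 1.0) * (H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = (x - x0)[..., None], (y - y0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def make_stage(rng, C, K):
    """Hypothetical, randomly initialized weights for one refinement stage."""
    w = lambda *shape: 0.1 * rng.standard_normal(shape)
    return {"Wq": w(C, C), "Wk": w(C, C), "Wv": w(C, C),
            "W_off": w(C, 2 * K), "W_att": w(C, K), "W_reg": w(C, 2)}

def refine_stage(feat, queries, coords, p):
    N, C = queries.shape
    K = p["W_att"].shape[1]
    # 1) Self-attention among landmark queries (models landmark structure).
    Q, Kmat, V = queries @ p["Wq"], queries @ p["Wk"], queries @ p["Wv"]
    queries = queries + softmax(Q @ Kmat.T / np.sqrt(C)) @ V
    # 2) Deformable-style sampling: K points around each current landmark.
    offsets = 0.1 * np.tanh(queries @ p["W_off"]).reshape(N, K, 2)
    weights = softmax(queries @ p["W_att"], axis=-1)            # (N, K)
    samples = bilinear_sample(feat, coords[:, None, :] + offsets)  # (N, K, C)
    queries = queries + (weights[..., None] * samples).sum(axis=1)
    # 3) Regress a bounded coordinate update and refine the landmarks.
    delta = 0.05 * np.tanh(queries @ p["W_reg"])                # (N, 2)
    return queries, np.clip(coords + delta, 0.0, 1.0)

def cascaded_refine(feat, queries, init_coords, stages):
    coords, history = init_coords, [init_coords]
    for p in stages:
        queries, coords = refine_stage(feat, queries, coords, p)
        history.append(coords)
    return history

rng = np.random.default_rng(0)
C, K, N = 16, 4, 68                      # channels, sample points, landmarks
feat = rng.standard_normal((32, 32, C))  # stand-in for a backbone feature map
queries = rng.standard_normal((N, C))    # stand-in for learned landmark queries
init = np.full((N, 2), 0.5)              # start every landmark at the center
stages = [make_stage(rng, C, K) for _ in range(3)]
history = cascaded_refine(feat, queries, init, stages)
print(len(history), history[-1].shape)
```

With trained weights, each entry in `history` would be expected to move closer to the true landmark positions; here the weights are random, so the sketch only demonstrates the data flow of a refinement cascade.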
Related papers
- Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection [9.633565294243173]
We show how a combination of specific architectural modifications can improve their accuracy and temporal stability.
We analyze the use of a spatial transformer network that is trained alongside the landmark detector in an unsupervised manner.
We show that modifying the output head of the landmark predictor to infer landmarks in a canonical 3D space can further improve accuracy.
arXiv Detail & Related papers (2024-05-30T14:54:26Z)
- Towards Multi-domain Face Landmark Detection with Synthetic Data from Diffusion model [27.307563102526192]
Deep learning-based facial landmark detection for in-the-wild faces has achieved significant improvement.
However, challenges remain in face landmark detection in other domains (e.g., cartoon, caricature).
We design a two-stage training approach that effectively leverages limited datasets and the pre-trained diffusion model.
Our results demonstrate that our method outperforms existing methods on multi-domain face landmark detection.
arXiv Detail & Related papers (2024-01-24T02:35:32Z)
- DeepFidelity: Perceptual Forgery Fidelity Assessment for Deepfake Detection [67.3143177137102]
Deepfake detection refers to detecting artificially generated or edited faces in images or videos.
We propose a novel Deepfake detection framework named DeepFidelity to adaptively distinguish real and fake faces.
arXiv Detail & Related papers (2023-12-07T07:19:45Z)
- Enhancing Landmark Detection in Cluttered Real-World Scenarios with Vision Transformers [2.900522306460408]
This research contributes to the advancement of landmark detection in visual place recognition.
It shows the potential of leveraging vision transformers to overcome challenges posed by cluttered real-world scenarios.
arXiv Detail & Related papers (2023-08-25T21:01:01Z)
- KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration [28.96448680048584]
KeyPoint Positioning System (KeyPosS) is the first framework to deduce exact landmark coordinates by triangulating distances between points of interest and anchor points predicted by a fully convolutional network.
Experiments on four datasets demonstrate state-of-the-art performance, with KeyPosS outperforming existing methods in low-resolution settings while adding minimal computational overhead.
arXiv Detail & Related papers (2023-05-25T19:30:21Z)
- Precise Facial Landmark Detection by Reference Heatmap Transformer [52.417964103227696]
We propose a novel Reference Heatmap Transformer (RHT) for more precise facial landmark detection.
The experimental results from challenging benchmark datasets demonstrate that our proposed method outperforms the state-of-the-art methods in the literature.
arXiv Detail & Related papers (2023-03-14T12:26:48Z)
- RePFormer: Refinement Pyramid Transformer for Robust Facial Landmark Detection [131.1478251760399]
We formulate the facial landmark detection task as refining landmark queries along pyramid memories.
Specifically, a pyramid transformer head (PTH) is introduced to build both relations among landmarks and heterologous relations between landmarks and cross-scale contexts.
A dynamic landmark refinement (DLR) module is designed to decompose the landmark regression into an end-to-end refinement procedure.
arXiv Detail & Related papers (2022-07-08T14:12:26Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervised manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We achieve state-of-the-art performance on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- Pretrained equivariant features improve unsupervised landmark discovery [69.02115180674885]
We formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features.
Our method produces state-of-the-art results in several challenging landmark detection datasets.
arXiv Detail & Related papers (2021-04-07T05:42:11Z)
- End-to-End Trainable Multi-Instance Pose Estimation with Transformers [68.93512627479197]
We propose a new end-to-end trainable approach for multi-instance pose estimation by combining a convolutional neural network with a transformer.
Inspired by recent work on end-to-end trainable object detection with transformers, we use a transformer encoder-decoder architecture together with a bipartite matching scheme to directly regress the pose of all individuals in a given image.
Our model, called POse Estimation Transformer (POET), is trained using a novel set-based global loss that consists of a keypoint loss, a keypoint visibility loss, a center loss and a class loss.
arXiv Detail & Related papers (2021-03-22T18:19:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.