RePFormer: Refinement Pyramid Transformer for Robust Facial Landmark
Detection
- URL: http://arxiv.org/abs/2207.03917v1
- Date: Fri, 8 Jul 2022 14:12:26 GMT
- Title: RePFormer: Refinement Pyramid Transformer for Robust Facial Landmark
Detection
- Authors: Jinpeng Li, Haibo Jin, Shengcai Liao, Ling Shao, Pheng-Ann Heng
- Abstract summary: We formulate the facial landmark detection task as refining landmark queries along pyramid memories.
Specifically, a pyramid transformer head (PTH) is introduced to build both homologous relations among landmarks and heterologous relations between landmarks and cross-scale contexts.
A dynamic landmark refinement (DLR) module is designed to decompose the landmark regression into an end-to-end refinement procedure.
- Score: 131.1478251760399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a Refinement Pyramid Transformer (RePFormer) for robust
facial landmark detection. Most facial landmark detectors focus on learning
representative image features. However, these CNN-based feature representations
are not robust enough to handle complex real-world scenarios because they ignore
the internal structure of landmarks as well as the relations between landmarks
and context. In this work, we formulate the facial landmark detection task as
refining landmark queries along pyramid memories. Specifically, a pyramid
transformer head (PTH) is introduced to build both homologous relations among
landmarks and heterologous relations between landmarks and cross-scale
contexts. Besides, a dynamic landmark refinement (DLR) module is designed to
decompose the landmark regression into an end-to-end refinement procedure,
where the dynamically aggregated queries are transformed into residual
coordinate predictions. Extensive experimental results on four facial landmark
detection benchmarks and their various subsets demonstrate the superior
performance and high robustness of our framework.
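To make the abstract's pipeline concrete, below is a minimal, hypothetical sketch (not the authors' released code) of the idea it describes: landmark queries are refined along pyramid feature memories, with self-attention among queries (homologous relations), cross-attention between queries and multi-scale context (heterologous relations), and a residual coordinate prediction at each stage. All class, module, and parameter names here are illustrative assumptions.

```python
# Illustrative sketch only: refining landmark queries along pyramid memories,
# with residual coordinate predictions at each stage. Names are hypothetical.
import torch
import torch.nn as nn


class QueryRefinementStage(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.offset_head = nn.Linear(dim, 2)  # residual (dx, dy) per landmark

    def forward(self, queries, memory, coords):
        # Homologous relations: landmark queries attend to each other.
        q, _ = self.self_attn(queries, queries, queries)
        queries = self.norm1(queries + q)
        # Heterologous relations: queries attend to one pyramid-level memory
        # (flattened spatial features of shape [B, H*W, dim]).
        q, _ = self.cross_attn(queries, memory, memory)
        queries = self.norm2(queries + q)
        # Residual coordinate prediction, added to the running estimate.
        coords = coords + self.offset_head(queries)
        return queries, coords


class PyramidRefinement(nn.Module):
    """Runs one refinement stage per pyramid level, coarse to fine."""

    def __init__(self, dim: int, num_landmarks: int, num_levels: int = 3):
        super().__init__()
        self.query_embed = nn.Embedding(num_landmarks, dim)
        self.init_coords = nn.Parameter(torch.zeros(num_landmarks, 2))
        self.stages = nn.ModuleList(
            QueryRefinementStage(dim) for _ in range(num_levels)
        )

    def forward(self, pyramid_memories):
        # pyramid_memories: list of [B, H_l*W_l, dim], ordered coarse -> fine.
        b = pyramid_memories[0].shape[0]
        queries = self.query_embed.weight.unsqueeze(0).expand(b, -1, -1)
        coords = self.init_coords.unsqueeze(0).expand(b, -1, -1)
        for stage, memory in zip(self.stages, pyramid_memories):
            queries, coords = stage(queries, memory, coords)
        return coords  # landmark coordinates after cascaded refinement


if __name__ == "__main__":
    model = PyramidRefinement(dim=256, num_landmarks=68)
    mems = [torch.randn(2, s * s, 256) for s in (8, 16, 32)]  # toy pyramid
    print(model(mems).shape)  # torch.Size([2, 68, 2])
```

The sketch keeps only the structure the abstract states (query self-attention, cross-scale cross-attention, residual coordinate regression); the paper's dynamic aggregation in the DLR module and the exact backbone are not reproduced here.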
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - Feature Shrinkage Pyramid for Camouflaged Object Detection with
Transformers [34.42710399235461]
Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection.
However, they suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders.
We propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode locality-enhanced neighboring transformer features.
arXiv Detail & Related papers (2023-03-26T20:50:58Z) - Precise Facial Landmark Detection by Reference Heatmap Transformer [52.417964103227696]
We propose a novel Reference Heatmap Transformer (RHT) for more precise facial landmark detection.
The experimental results from challenging benchmark datasets demonstrate that our proposed method outperforms the state-of-the-art methods in the literature.
arXiv Detail & Related papers (2023-03-14T12:26:48Z) - Towards Accurate Facial Landmark Detection via Cascaded Transformers [14.74021483826222]
We propose an accurate facial landmark detector based on cascaded transformers.
With self-attention in transformers, our model can inherently exploit the structured relationships between landmarks.
During cascaded refinement, our model is able to extract the most relevant image features around the target landmark for coordinate prediction.
arXiv Detail & Related papers (2022-08-23T08:42:13Z) - Sparse Local Patch Transformer for Robust Face Alignment and Landmarks
Inherent Relation Learning [11.150290581561725]
We propose a Sparse Local Patch Transformer (SLPT) for learning the inherent relation.
The proposed method works at the state-of-the-art level with much less computational complexity.
arXiv Detail & Related papers (2022-03-13T01:15:23Z) - Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model
Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective.
We reconstruct an interactive scene using RGB-D data stream.
The reconstruction replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv Detail & Related papers (2021-03-30T05:56:58Z) - Deep Structured Prediction for Facial Landmark Detection [59.60946775628646]
This paper proposes a method for deep structured facial landmark detection based on combining a deep Convolutional Network with a Conditional Random Field.
We demonstrate its superior performance to existing state-of-the-art techniques in facial landmark detection.
arXiv Detail & Related papers (2020-10-18T17:09:24Z) - Feature Pyramid Grids [140.11116687047058]
We present Feature Pyramid Grids (FPG), a deep multi-pathway feature pyramid.
FPG can improve single-pathway feature pyramid networks by significantly increasing their performance at similar computation cost.
arXiv Detail & Related papers (2020-04-07T17:59:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.