6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry
- URL: http://arxiv.org/abs/2407.14136v1
- Date: Fri, 19 Jul 2024 09:05:49 GMT
- Title: 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry
- Authors: Sungho Chun, Ju Yong Chang
- Abstract summary: This study addresses the nuanced challenge of estimating head translations within the context of 6DoF head pose estimation.
We propose a novel approach called the head Translation, Rotation, and face Geometry network (TRG), which stands out for its explicit bidirectional interaction structure.
Our contributions also include the development of a strategy for estimating bounding box correction parameters and a technique for aligning landmarks to the image.
- Score: 3.106167803320563
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This study addresses the nuanced challenge of estimating head translations within the context of six-degrees-of-freedom (6DoF) head pose estimation, placing emphasis on this aspect over the more commonly studied head rotations. Identifying a gap in existing methodologies, we recognized the underutilized potential synergy between facial geometry and head translation. To bridge this gap, we propose a novel approach called the head Translation, Rotation, and face Geometry network (TRG), which stands out for its explicit bidirectional interaction structure. This structure has been carefully designed to leverage the complementary relationship between face geometry and head translation, marking a significant advancement in the field of head pose estimation. Our contributions also include the development of a strategy for estimating bounding box correction parameters and a technique for aligning landmarks to the image. Both of these innovations demonstrate superior performance in 6DoF head pose estimation tasks. Extensive experiments conducted on the ARKitFace and BIWI datasets confirm that the proposed method outperforms current state-of-the-art techniques. Code is released at https://github.com/asw91666/TRG-Release.
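
The coupling between face geometry and head translation mentioned in the abstract can be illustrated with a standard pinhole camera model. The following is a minimal sketch, not the TRG architecture: the function names, the toy landmark coordinates, the focal length, and the principal point are all hypothetical values chosen for illustration. It applies a 6DoF pose (rotation `R`, translation `t`) to 3D landmarks and projects them to pixels, showing that doubling the head depth roughly halves the projected face extent.

```python
import numpy as np

def rotation_y(yaw):
    """Rotation matrix for a turn of `yaw` radians about the camera Y axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project_landmarks(points_head, R, t, focal, center):
    """Map 3D landmarks from head coordinates to pixel coordinates.

    Assumes an ideal pinhole camera with a single focal length (in pixels)
    and principal point `center`; lens distortion is ignored.
    """
    points_cam = points_head @ R.T + t                 # rigid 6DoF transform
    return focal * points_cam[:, :2] / points_cam[:, 2:3] + center

# Toy landmarks in metres (hypothetical): two eyes, nose tip, chin.
landmarks = np.array([
    [-0.03, -0.03, 0.00],
    [ 0.03, -0.03, 0.00],
    [ 0.00,  0.00, 0.02],
    [ 0.00,  0.06, 0.00],
])
R = rotation_y(np.deg2rad(20.0))
center = np.array([320.0, 240.0])

# Same rotation and geometry, two different depths along the optical axis.
near = project_landmarks(landmarks, R, np.array([0.0, 0.0, 0.5]), 600.0, center)
far = project_landmarks(landmarks, R, np.array([0.0, 0.0, 1.0]), 600.0, center)

# Pixel extent (width, height) of the projected landmarks at each depth.
print("near extent:", np.ptp(near, axis=0))
print("far extent: ", np.ptp(far, axis=0))
```

Because the projected size depends jointly on the metric face shape and on depth, an error in one can be compensated by an error in the other, which is one way to read the abstract's claim that the two estimates are complementary.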
Related papers
- Polar Coordinate-Based 2D Pose Prior with Neural Distance Field [0.34952465649465553]
We propose a 2D pose prior-guided refinement approach based on Neural Distance Fields (NDF).
We introduce a polar coordinate-based representation that explicitly incorporates joint connection lengths, enabling a more accurate correction of erroneous pose estimations.
Our method is evaluated on a long jump dataset, demonstrating its ability to improve 2D pose estimation across multiple pose representations.
arXiv Detail & Related papers (2025-05-06T11:31:14Z) - UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image [86.7128543480229]
Unseen object pose estimation methods often rely on CAD models or multiple reference views.
To simplify reference acquisition, we aim to estimate the unseen object's pose through a single unposed RGB-D reference image.
We present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation.
arXiv Detail & Related papers (2024-11-25T05:36:00Z) - Full-range Head Pose Geometric Data Augmentations [2.8358100463599722]
Many head pose estimation (HPE) methods promise the ability to create full-range datasets.
These methods are only accurate within a limited range of head angles; exceeding this range leads to significant inaccuracies.
Here, we present methods that accurately infer the correct coordinate system and Euler angles in the correct axis-sequence.
arXiv Detail & Related papers (2024-08-02T20:41:18Z) - Semi-Supervised Unconstrained Head Pose Estimation in the Wild [60.08319512840091]
We propose the first semi-supervised unconstrained head pose estimation method SemiUHPE.
Our method is based on the observation that the aspect-ratio invariant cropping of wild heads is superior to previous landmark-based affine alignment.
Our proposed method is also beneficial for solving other closely related problems, including generic object rotation regression and 3D head reconstruction.
arXiv Detail & Related papers (2024-04-03T08:01:00Z) - RGB-based Category-level Object Pose Estimation via Decoupled Metric
Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z) - VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning
Decoupled Rotations on the Spherical Representations [55.25238503204253]
We propose a novel rotation estimation network, termed VI-Net, to make the task easier.
To process the spherical signals, a Spherical Feature Pyramid Network is constructed based on a novel design of SPAtial Spherical Convolution.
Experiments on the benchmark datasets confirm the efficacy of our method, which outperforms existing methods by a large margin in the high-precision regime.
arXiv Detail & Related papers (2023-08-19T05:47:53Z) - Towards Deeply Unified Depth-aware Panoptic Segmentation with
Bi-directional Guidance Learning [63.63516124646916]
We propose a deeply unified framework for depth-aware panoptic segmentation.
We propose a bi-directional guidance learning approach to facilitate cross-task feature learning.
Our method sets the new state of the art for depth-aware panoptic segmentation on both Cityscapes-DVPS and SemKITTI-DVPS datasets.
arXiv Detail & Related papers (2023-07-27T11:28:33Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - HS-Diffusion: Semantic-Mixing Diffusion for Head Swapping [150.06405071177048]
We propose a semantic-mixing diffusion model for head swapping (HS-Diffusion).
We blend the semantic layouts of source head and source body, and then inpaint the transition region by the semantic layout generator.
We construct a new image-based head swapping benchmark and design two tailored metrics.
arXiv Detail & Related papers (2022-12-13T10:04:01Z) - Geo6D: Geometric Constraints Learning for 6D Pose Estimation [21.080439293774464]
We propose a novel geometric constraints learning approach called Geo6D for direct regression 6D pose estimation methods.
We show that when equipped with Geo6D, the direct 6D methods achieve state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2022-10-20T02:00:58Z) - BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation [32.49091033895255]
Bi-directional Correspondence Mapping Network (BiCo-Net) generates point clouds guided by a typical pose regression.
An ensemble of redundant pose predictions from local matching and direct pose regression further refines the final pose output against noisy observations.
arXiv Detail & Related papers (2022-05-07T03:37:33Z) - 6D Rotation Representation For Unconstrained Head Pose Estimation [2.1485350418225244]
We address the problem of ambiguous rotation labels by introducing the rotation matrix formalism for our ground truth data.
This way, our method can learn the full rotation appearance, in contrast to previous approaches that restrict the pose prediction to a narrow angle range.
Experiments on the public AFLW2000 and BIWI datasets demonstrate that our proposed method significantly outperforms other state-of-the-art methods by up to 20%.
arXiv Detail & Related papers (2022-02-25T08:41:13Z) - Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints.
Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z) - GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D
Object Pose Estimation [71.83992173720311]
6D pose estimation from a single RGB image is a fundamental task in computer vision.
We propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to learn the 6D pose in an end-to-end manner.
Our approach remarkably outperforms state-of-the-art methods on LM, LM-O and YCB-V datasets.
arXiv Detail & Related papers (2021-02-24T09:11:31Z) - SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from
Monocular images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z) - Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images.
Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object.
We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
arXiv Detail & Related papers (2021-01-05T17:18:52Z) - Deep Entwined Learning Head Pose and Face Alignment Inside an
Attentional Cascade with Doubly-Conditional fusion [42.50876580245864]
Head pose estimation and face alignment constitute a backbone preprocessing for many applications relying on face analysis.
We propose to entwine face alignment and head pose tasks inside an attentional cascade.
We empirically show the benefit of entwining head pose and landmark localization objectives inside our architecture.
arXiv Detail & Related papers (2020-04-14T14:42:35Z) - Deep Semantic Matching with Foreground Detection and Cycle-Consistency [103.22976097225457]
We address weakly supervised semantic matching based on a deep network.
We explicitly estimate the foreground regions to suppress the effect of background clutter.
We develop cycle-consistent losses to enforce the predicted transformations across multiple images to be geometrically plausible and consistent.
arXiv Detail & Related papers (2020-03-31T22:38:09Z) - HP2IFS: Head Pose estimation exploiting Partitioned Iterated Function
Systems [18.402636415604373]
Estimating the actual head orientation from 2D images is a well-known problem.
We use fractal coding theory and Partitioned Iterated Function Systems (PIFS) to extract the fractal code from the input head image.
The proposed PIFS based head pose estimation method provides accurate yaw/pitch/roll angular values.
arXiv Detail & Related papers (2020-03-25T17:56:45Z) - Robust 6D Object Pose Estimation by Learning RGB-D Features [59.580366107770764]
We propose a novel discrete-continuous formulation for rotation regression to resolve this local-optimum problem.
We uniformly sample rotation anchors in SO(3), and predict a constrained deviation from each anchor to the target, as well as uncertainty scores for selecting the best prediction.
Experiments on two benchmarks, LINEMOD and YCB-Video, show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2020-02-29T06:24:55Z) - Boosting Deep Face Recognition via Disentangling Appearance and Geometry [33.196270681809395]
We propose a framework for disentangling the appearance and geometry representations in the face recognition task.
We generate geometrically identical faces by incorporating spatial transformations.
We show that the proposed approach enhances the performance of deep face recognition models.
arXiv Detail & Related papers (2020-01-13T23:19:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.