An Intuitive and Unconstrained 2D Cube Representation for Simultaneous
Head Detection and Pose Estimation
- URL: http://arxiv.org/abs/2212.03623v1
- Date: Wed, 7 Dec 2022 13:28:50 GMT
- Title: An Intuitive and Unconstrained 2D Cube Representation for Simultaneous
Head Detection and Pose Estimation
- Authors: Huayi Zhou, Fei Jiang, Lili Xiong, Hongtao Lu
- Abstract summary: We present a novel single-stage keypoint-based method via an intuitive and unconstrained 2D cube representation for joint head detection and pose estimation.
Our method achieves comparable results with other representative methods on the AFLW2000 and BIWI datasets.
- Score: 24.04477340811483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most recent head pose estimation (HPE) methods are dominated by the Euler
angle representation. To avoid its inherent ambiguity problem of rotation
labels, alternative quaternion-based and vector-based representations are
introduced. However, neither is visually intuitive, and both are often derived
from equivocal Euler angle labels. In this paper, we present a novel
single-stage keypoint-based method via an intuitive and
unconstrained 2D cube representation for joint head detection and pose
estimation. The 2D cube is an orthogonal projection of the 3D regular
hexahedron label roughly surrounding one head, and itself contains the head
location. It can reflect the head orientation straightforwardly and
unambiguously in any rotation angle. Unlike the general 6-DoF object pose
estimation, our 2D cube ignores the 3-DoF of head size but retains the 3-DoF of
head pose. Based on the prior of equal side length, we can effortlessly obtain
the closed-form solution of Euler angles from predicted 2D head cube instead of
applying the error-prone PnP algorithm. In experiments, our proposed method
achieves comparable results with other representative methods on the public
AFLW2000 and BIWI datasets. Besides, a novel test on the CMU panoptic dataset
shows that our method can be seamlessly adapted to the unconstrained full-view
HPE task without modification.
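The closed-form idea in the abstract can be illustrated with a minimal sketch. It is not the authors' derivation; it only assumes what the abstract states: an orthographic projection and equal cube side lengths. Under those assumptions, the three 2D edge vectors that meet at one cube corner equal the side length times the first two rows of the rotation matrix, and the third row follows from a cross product. All function names, the Euler convention R = Rz(roll) Ry(yaw) Rx(pitch), and the choice of image axes are hypothetical choices made for this example.

```python
import numpy as np


def rotation_from_euler(pitch, yaw, roll):
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in radians.

    The axis order is an assumption made for this sketch only.
    """
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx


def project_cube_edges(rotation, side):
    """Orthographic projection of the three cube edges meeting at one corner.

    Returns a (2, 3) array whose columns are the 2D edge vectors.
    """
    return side * rotation[:2, :]


def rotation_from_cube_edges(edges_2d):
    """Recover (R, side) from the (2, 3) array of projected edge vectors."""
    edges_2d = np.asarray(edges_2d, dtype=float)
    # Equal side lengths: the two rows each have norm `side`.
    side = np.sqrt((edges_2d ** 2).sum() / 2.0)
    # Snap to the nearest matrix with orthonormal rows (absorbs noise).
    u, _, vt = np.linalg.svd(edges_2d, full_matrices=False)
    rows = u @ vt
    # The third row of a proper rotation is the cross product of the first two.
    r3 = np.cross(rows[0], rows[1])
    return np.vstack([rows, r3]), side


def euler_from_rotation(rotation):
    """Invert rotation_from_euler; valid away from yaw = +/-90 degrees."""
    yaw = np.arcsin(np.clip(-rotation[2, 0], -1.0, 1.0))
    pitch = np.arctan2(rotation[2, 1], rotation[2, 2])
    roll = np.arctan2(rotation[1, 0], rotation[0, 0])
    return pitch, yaw, roll


# Round trip: known pose -> projected 2D cube edges -> recovered pose.
true_rotation = rotation_from_euler(0.3, -0.5, 0.2)
edges = project_cube_edges(true_rotation, side=80.0)
estimated_rotation, estimated_side = rotation_from_cube_edges(edges)
print(np.degrees(euler_from_rotation(estimated_rotation)), estimated_side)
```

Because the rotation matrix is recovered first, the Euler angles are only a final readout. The sketch does not handle the gimbal-lock case at yaw = ±90°, and it requires knowing which predicted keypoints form the three edges at a shared corner.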
Related papers
- SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception [61.7243424157871]
We introduce a transformer-based architecture that, by incorporating a novel "Spherical Local Self-Attention" and other spherically-oriented modules, successfully operates in the spherical domain and outperforms the state-of-the-art in 360° perception benchmarks for depth estimation and semantic segmentation.
arXiv Detail & Related papers (2024-12-09T20:23:10Z) - MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors [24.753860375872215]
This paper presents a Transformer-based monocular 3D object detection method called MonoDGP.
It adopts perspective-invariant geometry errors to modify the projection formula.
Our method demonstrates state-of-the-art performance on the KITTI benchmark without extra data.
arXiv Detail & Related papers (2024-10-25T14:31:43Z) - Full-range Head Pose Geometric Data Augmentations [2.8358100463599722]
Many head pose estimation (HPE) methods promise the ability to create full-range datasets.
These methods are only accurate within a limited range of head angles; exceeding this range leads to significant inaccuracies.
Here, we present methods that accurately infer the correct coordinate system and Euler angles in the correct axis-sequence.
arXiv Detail & Related papers (2024-08-02T20:41:18Z) - Semi-Supervised Unconstrained Head Pose Estimation in the Wild [60.08319512840091]
We propose the first semi-supervised unconstrained head pose estimation method SemiUHPE.
Our method is based on the observation that the aspect-ratio invariant cropping of wild heads is superior to previous landmark-based affine alignment.
Our proposed method is also beneficial for solving other closely related problems, including generic object rotation regression and 3D head reconstruction.
arXiv Detail & Related papers (2024-04-03T08:01:00Z) - CheckerPose: Progressive Dense Keypoint Localization for Object Pose
Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z) - AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection [15.244852122106634]
We propose an approach for incorporating the shape-aware 2D/3D constraints into the 3D detection framework.
Specifically, we employ the deep neural network to learn distinguished 2D keypoints in the 2D image domain.
For generating the ground truth of 2D/3D keypoints, an automatic model-fitting approach has been proposed.
arXiv Detail & Related papers (2021-08-25T08:50:06Z) - FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z) - Neural Articulated Radiance Field [90.91714894044253]
We present Neural Articulated Radiance Field (NARF), a novel deformable 3D representation for articulated objects learned from images.
Experiments show that the proposed method is efficient and can generalize well to novel poses.
arXiv Detail & Related papers (2021-04-07T13:23:14Z) - A Vector-based Representation to Enhance Head Pose Estimation [4.329951775163721]
This paper proposes to use the three vectors in a rotation matrix as the representation in head pose estimation.
We develop a new neural network based on the characteristic of such representation.
arXiv Detail & Related papers (2020-10-14T15:57:29Z) - SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting
1D Occupancy Segments From 2D Coordinates [61.04823927283092]
We propose to represent 3D shapes using 2D functions, where the output of the function at each 2D location is a sequence of line segments inside the shape.
We implement this approach using a Seq2Seq model with attention, called SeqXY2SeqZ, which learns the mapping from a sequence of 2D coordinates along two arbitrary axes to a sequence of 1D locations along the third axis.
Our experiments show that SeqXY2SeqZ outperforms the state-of-the-art methods under widely used benchmarks.
arXiv Detail & Related papers (2020-03-12T00:24:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.