An Intuitive and Unconstrained 2D Cube Representation for Simultaneous
Head Detection and Pose Estimation
- URL: http://arxiv.org/abs/2212.03623v1
- Date: Wed, 7 Dec 2022 13:28:50 GMT
- Title: An Intuitive and Unconstrained 2D Cube Representation for Simultaneous
Head Detection and Pose Estimation
- Authors: Huayi Zhou, Fei Jiang, Lili Xiong, Hongtao Lu
- Abstract summary: We present a novel single-stage keypoint-based method via an intuitive and unconstrained 2D cube representation for joint head detection and pose estimation.
Our method achieves results comparable to other representative methods on the AFLW2000 and BIWI datasets.
- Score: 24.04477340811483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most recent head pose estimation (HPE) methods are dominated by the Euler
angle representation. To avoid its inherent ambiguity problem of rotation
labels, alternative quaternion-based and vector-based representations are
introduced. However, neither is visually intuitive, and both are often derived
from equivocal Euler angle labels. In this paper, we present a novel
single-stage keypoint-based method via an intuitive and unconstrained 2D cube
representation for joint head detection and pose
estimation. The 2D cube is an orthogonal projection of the 3D regular
hexahedron label roughly surrounding one head, and itself contains the head
location. It can reflect the head orientation straightforwardly and
unambiguously in any rotation angle. Unlike the general 6-DoF object pose
estimation, our 2D cube ignores the 3-DoF of head size but retains the 3-DoF of
head pose. Based on the prior of equal side length, we can effortlessly obtain
the closed-form solution of Euler angles from the predicted 2D head cube instead
of applying the error-prone PnP algorithm. In experiments, our proposed method
achieves results comparable to other representative methods on the public
AFLW2000 and BIWI datasets. Besides, a novel test on the CMU panoptic dataset
shows that our method can be seamlessly adapted to the unconstrained full-view
HPE task without modification.
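The abstract's key step is recovering Euler angles in closed form from the projected 2D cube using the equal-side-length prior. The sketch below illustrates that idea under assumptions of ours, not the paper's: an intrinsic Z-Y-X Euler convention, a centered orthographic camera, and illustrative function names.

```python
import numpy as np

def euler_to_rotation(yaw, pitch, roll):
    """Rotation matrix for intrinsic Z-Y-X Euler angles (radians)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return Rz @ Ry @ Rx

def project_cube_edges(R, side=1.0):
    """Orthographic projection of a cube's three edge directions.

    The rotated edge vectors are the columns of side * R; dropping the
    depth coordinate keeps only the top two rows.
    """
    return side * R[:2, :]  # shape (2, 3): one 2D vector per cube edge

def euler_from_projected_cube(edges2d):
    """Closed-form Euler angles from the 2D cube, via the equal-side prior.

    The rows of edges2d are s * (first two rows of R). Rows of a rotation
    matrix are unit vectors, so the side length s is just a row norm, and
    the third row follows from right-handed orthonormality.
    """
    s = np.linalg.norm(edges2d[0])
    r0 = edges2d[0] / s
    r1 = edges2d[1] / s
    r2 = np.cross(r0, r1)  # completes the orthonormal, det=+1 row set
    # Standard Z-Y-X extraction, valid while |pitch| < 90 degrees.
    yaw = np.arctan2(r1[0], r0[0])
    pitch = np.arcsin(-r2[0])
    roll = np.arctan2(r2[1], r2[2])
    return yaw, pitch, roll
```

Because the rows of a rotation matrix are unit vectors, the shared side length drops out as a row norm and the depth row is recovered from the cross product, so no iterative PnP solve is needed, mirroring the closed-form claim in the abstract.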
Related papers
- MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors [24.753860375872215]
This paper presents a Transformer-based monocular 3D object detection method called MonoDGP.
It adopts perspective-invariant geometry errors to modify the projection formula.
Our method demonstrates state-of-the-art performance on the KITTI benchmark without extra data.
arXiv Detail & Related papers (2024-10-25T14:31:43Z)
- Full-range Head Pose Geometric Data Augmentations [2.8358100463599722]
Many head pose estimation (HPE) methods promise the ability to create full-range datasets.
These methods are only accurate within a range of head angles; exceeding this range leads to significant inaccuracies.
Here, we present methods that accurately infer the correct coordinate system and Euler angles in the correct axis-sequence.
arXiv Detail & Related papers (2024-08-02T20:41:18Z)
- Semi-Supervised Unconstrained Head Pose Estimation in the Wild [60.08319512840091]
We propose SemiUHPE, the first semi-supervised unconstrained head pose estimation method.
Our method is based on the observation that the aspect-ratio invariant cropping of wild heads is superior to the previous landmark-based affine alignment.
Experiments and ablation studies show that SemiUHPE outperforms existing methods greatly on public benchmarks.
arXiv Detail & Related papers (2024-04-03T08:01:00Z)
- CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation [46.38290735670527]
Recovering multi-person 3D poses from a single RGB image is a severely ill-conditioned problem.
Recent works have shown promising results by simultaneously reasoning for different people but in all cases within a local neighborhood.
PI-Net introduces a self-attention block to reason for all people in the image at the same time and refine potentially noisy initial 3D poses.
In this paper, we model people interactions as a whole, independently of their number, and in a permutation-invariant manner, building upon the Set Transformer.
arXiv Detail & Related papers (2022-04-11T07:23:54Z)
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection [15.244852122106634]
We propose an approach for incorporating the shape-aware 2D/3D constraints into the 3D detection framework.
Specifically, we employ a deep neural network to learn distinctive 2D keypoints in the 2D image domain.
To generate the ground truth for the 2D/3D keypoints, we propose an automatic model-fitting approach.
arXiv Detail & Related papers (2021-08-25T08:50:06Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to adapt a general 2D detector to this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- Neural Articulated Radiance Field [90.91714894044253]
We present Neural Articulated Radiance Field (NARF), a novel deformable 3D representation for articulated objects learned from images.
Experiments show that the proposed method is efficient and can generalize well to novel poses.
arXiv Detail & Related papers (2021-04-07T13:23:14Z)
- A Vector-based Representation to Enhance Head Pose Estimation [4.329951775163721]
This paper proposes to use the three vectors in a rotation matrix as the representation in head pose estimation.
We develop a new neural network based on the characteristic of such representation.
arXiv Detail & Related papers (2020-10-14T15:57:29Z)
- SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates [61.04823927283092]
We propose to represent 3D shapes using 2D functions, where the output of the function at each 2D location is a sequence of line segments inside the shape.
We implement this approach using a Seq2Seq model with attention, called SeqXY2SeqZ, which learns the mapping from a sequence of 2D coordinates along two arbitrary axes to a sequence of 1D locations along the third axis.
Our experiments show that SeqXY2SeqZ outperforms the state-of-the-art methods under widely used benchmarks.
arXiv Detail & Related papers (2020-03-12T00:24:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.