A Simple Baseline for Direct 2D Multi-Person Head Pose Estimation with
Full-range Angles
- URL: http://arxiv.org/abs/2302.01110v1
- Date: Thu, 2 Feb 2023 14:08:49 GMT
- Title: A Simple Baseline for Direct 2D Multi-Person Head Pose Estimation with
Full-range Angles
- Authors: Huayi Zhou, Fei Jiang, and Hongtao Lu
- Abstract summary: Existing head pose estimation (HPE) mainly focuses on a single person with a pre-detected frontal head.
We argue that these single-person HPE methods are fragile and inefficient for Multi-Person Head Pose Estimation (MPHPE).
In this paper, we focus on the full-range MPHPE problem, and propose a direct end-to-end simple baseline named DirectMHP.
- Score: 24.04477340811483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing head pose estimation (HPE) mainly focuses on a single
person with a pre-detected frontal head, which limits its applications in real
complex scenarios with multiple persons. We argue that these single-person HPE
methods are fragile and inefficient for Multi-Person Head Pose Estimation
(MPHPE), since they rely on a separately trained face detector that cannot
generalize well to full viewpoints, especially for heads with invisible face
areas. In this paper, we focus on the full-range MPHPE problem and propose a
direct end-to-end simple baseline named DirectMHP. Due to the lack of datasets
applicable to full-range MPHPE, we first construct two benchmarks by extracting
ground-truth labels for head detection and head orientation from the public
datasets AGORA and CMU Panoptic. They are rather challenging, containing many
truncated, occluded, tiny and unevenly illuminated human heads. Then, we design
a novel end-to-end trainable one-stage network architecture that jointly
regresses the locations and orientations of multiple heads to address the
MPHPE problem. Specifically, we regard pose as an auxiliary attribute of the
head and append it after the traditional object prediction. Arbitrary pose
representations, such as Euler angles, are accommodated by this flexible
design. We then jointly optimize the two tasks by sharing features and
utilizing appropriate multiple losses. In this way, our method can implicitly
benefit from more surroundings to improve HPE accuracy while maintaining head
detection performance. We present comprehensive comparisons with
state-of-the-art single-person HPE methods on public benchmarks, as well as
superior baseline results on our constructed MPHPE datasets. Datasets and code
are released at https://github.com/hnuzhy/DirectMHP.
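The abstract describes head pose as extra attributes appended to a one-stage detector's per-location object prediction, trained jointly over shared features with multiple losses. Below is a minimal PyTorch-style sketch of that idea; the module name `PoseAwareHead`, the channel layout, and the loss weights are illustrative assumptions, not the released DirectMHP implementation (see the repository above for the actual code).

```python
# Minimal sketch of the idea described in the abstract: append head pose
# (Euler angles) as an auxiliary attribute after the usual one-stage detection
# outputs, and train both tasks jointly over shared features.
# Names (PoseAwareHead) and loss weights are illustrative, not the authors' code.
import torch
import torch.nn as nn

class PoseAwareHead(nn.Module):
    """Per-location prediction: box (4) + objectness (1) + Euler angles (3)."""
    def __init__(self, in_channels: int, num_angles: int = 3):
        super().__init__()
        self.num_angles = num_angles
        # 4 box offsets + 1 objectness score + 3 angles (yaw, pitch, roll)
        self.pred = nn.Conv2d(in_channels, 4 + 1 + num_angles, kernel_size=1)

    def forward(self, feat: torch.Tensor):
        out = self.pred(feat)                                   # (B, 8, H, W)
        box, obj, angles = out.split([4, 1, self.num_angles], dim=1)
        # Squash angles to (0, 1); they can later be rescaled to degrees.
        return box, obj.sigmoid(), angles.sigmoid()

def joint_loss(pred, target, w_det: float = 1.0, w_pose: float = 0.1):
    """Shared-feature multi-task loss: detection terms plus angle regression."""
    box, obj, angles = pred
    det_loss = nn.functional.smooth_l1_loss(box, target["box"]) \
             + nn.functional.binary_cross_entropy(obj, target["obj"])
    pose_loss = nn.functional.mse_loss(angles, target["angles"])  # normalized angles
    return w_det * det_loss + w_pose * pose_loss
```

Because pose is only a few extra channels appended after the box and objectness outputs, swapping Euler angles for another representation changes just `num_angles` and the pose loss term, which is the flexibility the abstract refers to.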
Related papers
- EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs [17.864281586189392]
Egocentric human pose estimation (HPE) using wearable sensors is essential for VR/AR applications.
Most methods rely solely on either egocentric-view images or sparse Inertial Measurement Unit (IMU) signals.
We propose EMHI, a multimodal Egocentric human Motion dataset with Head-Mounted Display (HMD) and body-worn IMUs.
arXiv Detail & Related papers (2024-08-30T10:12:13Z) - Semi-Supervised Unconstrained Head Pose Estimation in the Wild [60.08319512840091]
We propose SemiUHPE, the first semi-supervised unconstrained head pose estimation method.
Our method is based on the observation that the aspect-ratio invariant cropping of wild heads is superior to the previous landmark-based affine alignment.
Experiments and ablation studies show that SemiUHPE greatly outperforms existing methods on public benchmarks.
arXiv Detail & Related papers (2024-04-03T08:01:00Z) - 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - HiFi: High-Information Attention Heads Hold for Parameter-Efficient
Model Adaptation [0.8409934249521909]
We propose HiFi, a parameter-efficient fine-tuning method in which only the attention heads that are highly informative and strongly correlated for the specific task are fine-tuned.
We first model the relationship between heads as a graph from the two perspectives of information richness and correlation, and then apply the PageRank algorithm to determine the relative importance of each head (an illustrative sketch of this ranking step appears after this list).
Experiments on the GLUE benchmark demonstrate the effectiveness of our method, and show that HiFi obtains state-of-the-art performance over the prior baselines.
arXiv Detail & Related papers (2023-05-08T09:31:13Z) - HS-Diffusion: Semantic-Mixing Diffusion for Head Swapping [150.06405071177048]
We propose a semantic-mixing diffusion model for head swapping (HS-Diffusion).
We blend the semantic layouts of the source head and source body, and then inpaint the transition region with the semantic layout generator.
We construct a new image-based head swapping benchmark and design two tailored metrics.
arXiv Detail & Related papers (2022-12-13T10:04:01Z) - End-to-end Weakly-supervised Single-stage Multiple 3D Hand Mesh
Reconstruction from a Single RGB Image [9.238322841389994]
We propose a single-stage pipeline for multi-hand reconstruction.
Specifically, we design a multi-head auto-encoder structure, where each head network shares the same feature map and outputs the hand center, pose and texture.
Our method outperforms state-of-the-art model-based methods in both weakly-supervised and fully-supervised settings.
arXiv Detail & Related papers (2022-04-18T03:57:14Z) - UET-Headpose: A sensor-based top-view head pose dataset [0.0]
We introduce a low-cost, easy-to-set-up approach to collecting head pose images.
This method uses an absolute orientation sensor instead of depth cameras, enabling quick setup at small cost.
We also introduce a full-range model called FSANet-Wide, which significantly outperforms prior head pose estimation results on the UET-Headpose dataset.
arXiv Detail & Related papers (2021-11-13T04:54:20Z) - Direct Multi-view Multi-person 3D Pose Estimation [138.48139701871213]
We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images.
MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks.
We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient.
arXiv Detail & Related papers (2021-11-07T13:09:20Z) - WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose [1.8275108630751844]
We present an end-to-end head pose estimation network designed to predict Euler angles across the full range of head yaw from a single RGB image.
Our network builds on multi-loss approaches with changes to loss functions and training strategies adapted to wide range estimation.
arXiv Detail & Related papers (2020-05-20T20:53:01Z) - A Mixture of $h-1$ Heads is Better than $h$ Heads [63.12336930345417]
We propose the mixture of attentive experts model (MAE).
Experiments on machine translation and language modeling show that MAE outperforms strong baselines on both tasks.
Our analysis shows that our model learns to specialize different experts to different inputs.
arXiv Detail & Related papers (2020-05-13T19:05:58Z) - FairMOT: On the Fairness of Detection and Re-Identification in Multiple
Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
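As noted in the HiFi entry above, attention heads can be ranked by building a head-relationship graph and running PageRank over it. The sketch below is an illustration only: the toy correlation matrix and the use of pairwise correlations as edge weights are assumptions made here, not the paper's exact graph construction (which combines information richness and correlation).

```python
# Illustrative sketch (referenced in the HiFi entry above): rank attention heads
# by running PageRank on a head-relationship graph. The graph here is a
# hypothetical pairwise-correlation matrix, used only to show the ranking step.
import numpy as np

def pagerank(adj: np.ndarray, damping: float = 0.85, iters: int = 100) -> np.ndarray:
    """Power-iteration PageRank over a row-normalized adjacency matrix."""
    n = adj.shape[0]
    trans = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * trans.T @ rank
    return rank

# Toy example: 4 attention heads, edge weights = hypothetical pairwise correlation.
corr = np.array([[1.0, 0.8, 0.2, 0.1],
                 [0.8, 1.0, 0.3, 0.2],
                 [0.2, 0.3, 1.0, 0.6],
                 [0.1, 0.2, 0.6, 1.0]])
scores = pagerank(corr)
top_heads = np.argsort(scores)[::-1]   # fine-tune only the highest-ranked heads
```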
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.