WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose
- URL: http://arxiv.org/abs/2005.10353v2
- Date: Tue, 22 Sep 2020 22:54:45 GMT
- Title: WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose
- Authors: Yijun Zhou, James Gregson
- Abstract summary: We present an end-to-end head-pose estimation network designed to predict Euler angles through the full range of head yaws from a single RGB image.
Our network builds on multi-loss approaches with changes to loss functions and training strategies adapted to wide range estimation.
- Score: 1.8275108630751844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an end-to-end head-pose estimation network designed to predict
Euler angles through the full range of head yaws from a single RGB image. Existing
methods perform well for frontal views but few target head pose from all
viewpoints. This has applications in autonomous driving and retail. Our network
builds on multi-loss approaches with changes to loss functions and training
strategies adapted to wide range estimation. Additionally, we extract ground
truth labelings of anterior views from a current panoptic dataset for the first
time. The resulting Wide Headpose Estimation Network (WHENet) is the first
fine-grained modern method applicable to the full range of head yaws (hence
wide) yet also meets or beats state-of-the-art methods for frontal head pose
estimation. Our network is compact and efficient for mobile devices and
applications.
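The "multi-loss" training the abstract refers to follows the binned classification-plus-regression pattern popularized by HopeNet, which WHENet extends to the full yaw range. A minimal sketch for the yaw branch, assuming 120 bins of 3 degrees covering 360 degrees and an illustrative regression weight (not necessarily WHENet's exact settings):

```python
import torch
import torch.nn.functional as F

def yaw_multi_loss(logits, gt_deg, num_bins=120, bin_width=3.0, alpha=1.0):
    """Combined classification + regression loss for one Euler angle.
    logits: (B, num_bins) scores over yaw bins spanning [-180, 180) degrees.
    gt_deg: (B,) ground-truth yaw in degrees. Settings are illustrative."""
    # Hard bin index for the cross-entropy (classification) term.
    bin_idx = torch.clamp(((gt_deg + 180.0) / bin_width).long(), 0, num_bins - 1)
    cls_loss = F.cross_entropy(logits, bin_idx)

    # Soft expected angle from the softmax, penalized by MSE (regression term).
    centers = (torch.arange(num_bins, dtype=torch.float32, device=logits.device)
               * bin_width - 180.0 + bin_width / 2)
    pred_deg = (F.softmax(logits, dim=1) * centers).sum(dim=1)
    reg_loss = F.mse_loss(pred_deg, gt_deg)

    return cls_loss + alpha * reg_loss
```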
Related papers
- SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network.
It can perform inference at 32 FPS without requiring inputs other than the RGB image.
It can accurately estimate the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
- FoVA-Depth: Field-of-View Agnostic Depth Estimation for Cross-Dataset Generalization [57.98448472585241]
We propose a method to train a stereo depth estimation model on widely available pinhole data.
We show the strong generalization ability of our approach on both indoor and outdoor datasets.
arXiv Detail & Related papers (2024-01-24T20:07:59Z)
- Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation [2.915868985330569]
We present a novel method for unconstrained end-to-end head pose estimation.
We propose a continuous 6D rotation matrix representation for efficient and robust direct regression.
Our method significantly outperforms other state-of-the-art methods in an efficient and robust manner.
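The continuous 6D representation mentioned above (introduced by Zhou et al., CVPR 2019) maps two predicted 3-vectors to a rotation matrix by Gram-Schmidt orthonormalization; a minimal sketch (the function name is mine):

```python
import torch
import torch.nn.functional as F

def rotation_6d_to_matrix(x6):
    """Map (B, 6) network outputs to (B, 3, 3) rotation matrices via
    Gram-Schmidt; the representation is continuous, avoiding the
    discontinuities of Euler angles and quaternions in direct regression."""
    a1, a2 = x6[:, :3], x6[:, 3:]
    b1 = F.normalize(a1, dim=1)                                   # first column
    b2 = F.normalize(a2 - (b1 * a2).sum(1, keepdim=True) * b1, dim=1)
    b3 = torch.cross(b1, b2, dim=1)                               # right-handed frame
    return torch.stack((b1, b2, b3), dim=2)                       # columns of R
```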
arXiv Detail & Related papers (2023-09-14T12:17:38Z)
- Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks.
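The "many-to-one" issue arises because several 3D points can land in the same range-image pixel; a minimal sketch of the standard spherical range-view projection (image size and vertical field of view are illustrative, roughly matching a 64-beam sensor):

```python
import numpy as np

def range_view_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project LiDAR points (N, 3) onto an H x W range image.
    Collisions ("many-to-one") are resolved by keeping the nearest point."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)

    yaw = np.arctan2(y, x)                              # [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))

    up, down = np.radians(fov_up), np.radians(fov_down)
    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(int) % W
    v = np.clip(((up - pitch) / (up - down) * H).astype(int), 0, H - 1)

    img = np.full((H, W), -1.0)                         # -1 marks empty pixels
    order = np.argsort(-r)                              # far points first, near overwrite
    img[v[order], u[order]] = r[order]
    return img
```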
arXiv Detail & Related papers (2023-03-09T16:13:27Z)
- A Simple Baseline for Direct 2D Multi-Person Head Pose Estimation with Full-range Angles [24.04477340811483]
Existing head pose estimation (HPE) mainly focuses on a single person with pre-detected frontal heads.
We argue that these single-person methods are fragile and inefficient for Multi-Person Head Pose Estimation (MPHPE).
In this paper, we focus on the full-range MPHPE problem, and propose a direct end-to-end simple baseline named DirectMHP.
arXiv Detail & Related papers (2023-02-02T14:08:49Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
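A minimal sketch of what such cross-view fusion can look like, with one camera's feature tokens attending to the tokens of the other views (shapes and module layout are my assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """One view's tokens (queries) attend to tokens gathered from the
    other surrounding cameras (keys/values); a simplified stand-in for
    the paper's cross-view transformer."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, this_view, other_views):
        # this_view: (B, N, C); other_views: (B, M, C) from all other cameras.
        fused, _ = self.attn(this_view, other_views, other_views)
        return self.norm(this_view + fused)              # residual fusion
```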
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Efficient Multi-Objective Optimization for Deep Learning [2.0305676256390934]
Multi-objective optimization (MOO) is a prevalent challenge for Deep Learning.
There exists no scalable MOO solution for truly deep neural networks.
arXiv Detail & Related papers (2021-03-24T17:59:42Z)
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta_1$ metric on the KITTI dataset.
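For reference, $\delta_1$ is the standard depth threshold metric: the fraction of valid pixels whose predicted-to-true depth ratio is within 1.25 in either direction.

```python
import numpy as np

def delta1(pred, gt, eps=1e-8):
    """Fraction of pixels where max(pred/gt, gt/pred) < 1.25.
    pred, gt: arrays of positive depths over valid pixels."""
    ratio = np.maximum(pred / (gt + eps), gt / (pred + eps))
    return float((ratio < 1.25).mean())
```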
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
- Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution.
We propose a new image restoration framework that is based on minimizing a loss function that includes a "projected version" of the Generalized Stein Unbiased Risk Estimator (GSURE) and parameterization of the latent image by a CNN.
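For context, and in simplified notation following Eldar's GSURE (details may differ from the paper): with $y = Hx + w$, $w \sim \mathcal{N}(0, \sigma^2 I)$, pseudo-inverse $H^\dagger$, projection $P = H^\dagger H$, $x_{\mathrm{ML}} = H^\dagger y$, sufficient statistic $u = H^\top y / \sigma^2$, and a CNN $f_\theta$ parameterizing the latent image, the projected-GSURE objective is, up to a constant independent of $\theta$, an unbiased estimate of the projected error $\mathbb{E}\|P(x - f_\theta(u))\|^2$:

$$\mathcal{L}(\theta) = \|P f_\theta(u)\|^2 - 2\, f_\theta(u)^\top P\, x_{\mathrm{ML}} + 2\, \nabla_u \cdot \big(P f_\theta(u)\big).$$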
arXiv Detail & Related papers (2021-02-04T08:52:46Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
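One common way a camera-configuration prior removes scale ambiguity is to rescale predictions so that the estimated camera height above the ground plane matches the known mounting height; a minimal sketch under that assumption (the ground-point selection and plane fit are deliberately simplified, and this is not necessarily the paper's exact procedure):

```python
import numpy as np

def calibrate_scale(depth, ground_points_cam, known_cam_height):
    """Rescale a relative depth map to metric units from a known camera
    mounting height. ground_points_cam: (N, 3) back-projected candidate
    ground pixels in camera coordinates."""
    # Least-squares plane through the ground points: the normal is the
    # smallest singular vector of the centered point cloud.
    centroid = ground_points_cam.mean(axis=0)
    _, _, vt = np.linalg.svd(ground_points_cam - centroid, full_matrices=False)
    n = vt[-1]                                    # unit plane normal
    est_height = abs(centroid @ n)                # camera(origin)-to-plane distance
    return depth * (known_cam_height / est_height)
```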
This list is automatically generated from the titles and abstracts of the papers in this site.