Location-guided Head Pose Estimation for Fisheye Image
- URL: http://arxiv.org/abs/2402.18320v2
- Date: Wed, 10 Apr 2024 15:09:22 GMT
- Title: Location-guided Head Pose Estimation for Fisheye Image
- Authors: Bing Li, Dong Zhang, Cheng Huang, Yun Xian, Ming Li, Dah-Jye Lee
- Abstract summary: A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by perspective projection.
Fisheye lens distortion in the peripheral region of the image leads to degraded performance of existing head pose estimation models.
This paper presents a new approach for head pose estimation that uses the knowledge of head location in the image to reduce the negative effect of fisheye distortion.
- Score: 15.22663220816984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by perspective projection. Serious fisheye lens distortion in the peripheral region of the image leads to degraded performance of existing head pose estimation models trained on undistorted images. This paper presents a new approach for head pose estimation that uses the knowledge of head location in the image to reduce the negative effect of fisheye distortion. We develop an end-to-end convolutional neural network to estimate the head pose with multi-task learning of head pose and head location. Our proposed network estimates the head pose directly from the fisheye image without rectification or calibration. We also created fisheye-distorted versions of three popular head pose estimation datasets, BIWI, 300W-LP, and AFLW2000, for our experiments. Experimental results show that our network remarkably improves the accuracy of head pose estimation compared with other state-of-the-art one-stage and two-stage methods.
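The claim that a fisheye image cannot be modeled by perspective projection can be illustrated with a minimal sketch. It assumes the equidistant fisheye model (r = f * theta), which is one common idealization; the abstract does not say which lens model the authors use, and the function names below are hypothetical.

```python
import math

def perspective_radius(theta, f=1.0):
    """Pinhole (perspective) model: image radius r = f * tan(theta)."""
    return f * math.tan(theta)

def equidistant_fisheye_radius(theta, f=1.0):
    """Equidistant fisheye model (an assumption here): r = f * theta."""
    return f * theta

def compression_ratio(theta, f=1.0):
    """Fisheye radius divided by perspective radius for the same ray.

    theta is the angle from the optical axis in radians, in (0, pi/2).
    A ratio below 1 means the fisheye image is radially compressed
    relative to the perspective image at that angle.
    """
    return equidistant_fisheye_radius(theta, f) / perspective_radius(theta, f)

if __name__ == "__main__":
    # The ratio shrinks toward the image periphery, which is where the
    # abstract reports degraded head pose accuracy.
    for degrees in (10, 45, 80):
        theta = math.radians(degrees)
        print(degrees, round(compression_ratio(theta), 3))
```

Under this assumed model the two projections nearly agree close to the optical axis and diverge rapidly toward the edge, and the perspective radius grows without bound as theta approaches 90 degrees.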
Related papers
- FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera [8.502741852406904]
We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras.
We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions.
We also incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network.
arXiv Detail & Related papers (2024-09-23T14:31:42Z)
- RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation [88.54817424560056]
We propose a distortion vector map (DVM) that measures the degree and direction of local distortion.
By learning the DVM, the model can independently identify local distortions at each pixel without relying on global distortion patterns.
In the pre-training stage, it predicts the distortion vector map and perceives the local distortion features of each pixel.
In the fine-tuning stage, it predicts a pixel-wise flow map for deviated fisheye image rectification.
arXiv Detail & Related papers (2024-06-27T06:38:56Z)
- Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement [65.08165593201437]
We explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion.
This task presents significant challenges due to the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion.
We propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction.
arXiv Detail & Related papers (2023-11-28T07:13:47Z)
- An Effective Deep Network for Head Pose Estimation without Keypoints [0.0]
We propose a lightweight model that effectively addresses the head pose estimation problem.
Our proposed model significantly improves the accuracy in comparison with the state-of-the-art head pose estimation methods.
Our model has the real-time speed of $\sim$300 FPS when inferring on Tesla V100.
arXiv Detail & Related papers (2022-10-25T01:57:04Z)
- TriHorn-Net: A Model for Accurate Depth-Based 3D Hand Pose Estimation [8.946655323517092]
TriHorn-Net is a novel model that uses specific innovations to improve hand pose estimation accuracy on depth images.
The first innovation is the decomposition of the 3D hand pose estimation into the estimation of 2D joint locations in the depth image space.
The second innovation is PixDropout, which is, to the best of our knowledge, the first appearance-based data augmentation method for hand depth images.
arXiv Detail & Related papers (2022-06-14T19:08:42Z)
- FisheyeEX: Polar Outpainting for Extending the FoV of Fisheye Lens [84.12722334460022]
The fisheye lens is finding increasing application in computational photography and assisted driving because of its wide field of view (FoV).
In this paper, we present a FisheyeEX method that extends the FoV of the fisheye lens by outpainting the invalid regions.
The results demonstrate that our approach significantly outperforms the state-of-the-art methods, gaining around 27% more content beyond the original fisheye image.
arXiv Detail & Related papers (2022-06-12T21:38:50Z)
- Estimating Egocentric 3D Human Pose in Global Space [70.7272154474722]
We present a new method for egocentric global 3D body pose estimation using a single-mounted fisheye camera.
Our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-04-27T20:01:57Z)
- SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured from downward looking fish-eye cameras installed on the rim of a head mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and not relying on any additional sensors.
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
- WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose [1.8275108630751844]
We present an end-to-end head-pose estimation network designed to predict Euler angles through the full range head yaws from a single RGB image.
Our network builds on multi-loss approaches with changes to loss functions and training strategies adapted to wide range estimation.
arXiv Detail & Related papers (2020-05-20T20:53:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.