Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images
- URL: http://arxiv.org/abs/2506.19747v1
- Date: Tue, 24 Jun 2025 16:05:36 GMT
- Title: Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images
- Authors: Stephanie Käs, Sven Peter, Henrik Thillmann, Anton Burenko, David Benjamin Adrian, Dennis Mack, Timm Linder, Bastian Leibe
- Abstract summary: We evaluate the impact of pinhole, equidistant and double sphere camera models, as well as cylindrical projection methods, on 3D human pose estimation accuracy. We find that in close-up scenarios, pinhole projection is inadequate, and that the optimal projection method varies with the FOV covered by the human pose. We propose a heuristic for selecting the appropriate projection model based on the detection bounding box to enhance prediction quality.
- Score: 8.719294151596705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fisheye cameras offer robots the ability to capture human movements across a wider field of view (FOV) than standard pinhole cameras, making them particularly useful for applications in human-robot interaction and automotive contexts. However, accurately detecting human poses in fisheye images is challenging due to the curved distortions inherent to fisheye optics. While various methods for undistorting fisheye images have been proposed, their effectiveness and limitations for poses that cover a wide FOV have not been systematically evaluated in the context of absolute human pose estimation from monocular fisheye images. To address this gap, we evaluate the impact of pinhole, equidistant and double sphere camera models, as well as cylindrical projection methods, on 3D human pose estimation accuracy. We find that in close-up scenarios, pinhole projection is inadequate, and the optimal projection method varies with the FOV covered by the human pose. Using advanced fisheye models such as the double sphere model significantly enhances 3D human pose estimation accuracy. We propose a heuristic for selecting the appropriate projection model based on the detection bounding box to enhance prediction quality. Additionally, we introduce and evaluate on our novel dataset FISHnCHIPS, which features 3D human skeleton annotations in fisheye images, including images from unconventional angles, such as extreme close-ups, ground-mounted cameras, and wide-FOV poses, available at: https://www.vision.rwth-aachen.de/fishnchips
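The camera models compared in the abstract are standard, published formulations. As a rough illustration of why they behave differently for wide-FOV poses, the Python sketch below implements the forward projection (3D point to pixel) of the pinhole, equidistant, and double sphere models; it is not code from the paper, and the intrinsics and double-sphere parameters (FX, FY, CX, CY, ALPHA, XI) are hypothetical placeholders chosen only for demonstration.

```python
# Minimal sketch of the three forward projection models discussed in the paper:
# pinhole, equidistant (ideal fisheye), and double sphere (Usenko et al., 2018).
# All intrinsic values below are illustrative placeholders, not paper values.
import numpy as np

FX, FY, CX, CY = 300.0, 300.0, 640.0, 480.0   # hypothetical intrinsics
ALPHA, XI = 0.58, -0.2                         # hypothetical double-sphere params

def project_pinhole(p):
    """Perspective projection; degrades for points far off the optical axis."""
    x, y, z = p
    return np.array([FX * x / z + CX, FY * y / z + CY])

def project_equidistant(p):
    """Ideal fisheye model: radial image distance proportional to incidence angle."""
    x, y, z = p
    r = np.hypot(x, y)
    theta = np.arctan2(r, z)                   # angle from the optical axis
    scale = theta / r if r > 1e-9 else 1.0 / z # small-angle limit matches pinhole
    return np.array([FX * scale * x + CX, FY * scale * y + CY])

def project_double_sphere(p, alpha=ALPHA, xi=XI):
    """Double sphere model: closed-form projection that fits fisheye lenses well."""
    x, y, z = p
    d1 = np.sqrt(x * x + y * y + z * z)
    zs = xi * d1 + z
    d2 = np.sqrt(x * x + y * y + zs * zs)
    denom = alpha * d2 + (1.0 - alpha) * zs
    return np.array([FX * x / denom + CX, FY * y / denom + CY])

if __name__ == "__main__":
    # A point roughly 80 degrees off the optical axis: both fisheye models keep it
    # inside a typical image, while pinhole projection pushes it far outside.
    point = np.array([2.0, 0.0, 0.35])
    for name, fn in [("pinhole", project_pinhole),
                     ("equidistant", project_equidistant),
                     ("double sphere", project_double_sphere)]:
        print(f"{name:13s} -> {fn(point)}")
```

Running the sketch shows off-axis points landing at plausible pixel coordinates under the equidistant and double sphere models while the pinhole projection places them far outside the image, which is consistent with the paper's finding that pinhole reprojection is inadequate for close-up, wide-FOV poses.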
Related papers
- Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision [0.3387808070669509]
Egocentric human body estimation allows for the inference of user body pose and shape from a wearable camera's first-person perspective. We introduce Fish2Mesh, a fisheye-aware transformer-based model designed for 3D egocentric human mesh recovery. Our experiments demonstrate that Fish2Mesh outperforms previous state-of-the-art 3D HMR models.
arXiv Detail & Related papers (2025-03-08T06:34:49Z) - FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera [8.502741852406904]
We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras. We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions. We also incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network.
arXiv Detail & Related papers (2024-09-23T14:31:42Z) - Location-guided Head Pose Estimation for Fisheye Image [15.22663220816984]
A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by perspective projection.
Fisheye lens distortion in the peripheral region of the image leads to degraded performance of existing head pose estimation models.
This paper presents a new approach for head pose estimation that uses the knowledge of head location in the image to reduce the negative effect of fisheye distortion.
arXiv Detail & Related papers (2024-02-28T13:33:43Z) - NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images [1.86413150130483]
Human pose estimation (HPE) in top-view fisheye images presents a promising and innovative application domain.
We leverage the capabilities of Neural Radiance Fields (NeRF) technique to establish a comprehensive pipeline for generating human pose datasets.
arXiv Detail & Related papers (2024-02-28T09:36:22Z) - Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement [65.08165593201437]
We explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion.
This task presents significant challenges due to the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion.
We propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction.
arXiv Detail & Related papers (2023-11-28T07:13:47Z) - Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z) - HULC: 3D Human Motion Capture with Pose Manifold Sampling and Dense Contact Guidance [82.09463058198546]
Marker-less monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant for extended reality, robotics and virtual avatar generation.
We propose HULC, a new approach for 3D human MoCap which is aware of the scene geometry.
arXiv Detail & Related papers (2022-05-11T17:59:31Z) - MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z) - Estimating Egocentric 3D Human Pose in Global Space [70.7272154474722]
We present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera.
Our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-04-27T20:01:57Z) - SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured by downward-looking fish-eye cameras installed on the rim of a head-mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)