SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras
- URL: http://arxiv.org/abs/2401.14785v1
- Date: Fri, 26 Jan 2024 11:19:13 GMT
- Title: SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras
- Authors: Hanz Cuevas-Velasquez, Charlie Hewitt, Sadegh Aliakbarian, Tadas Baltrušaitis
- Abstract summary: Egocentric human pose estimation is difficult from downwards-facing cameras on head-mounted devices (HMDs).
Previous solutions minimize this problem by using fish-eye camera lenses to capture a wider view, but these can present hardware design issues.
We predict pose from images captured with conventional rectilinear camera lenses. This resolves hardware design issues, but means body parts are often out of frame.
Our approach achieves state-of-the-art results for this challenging configuration, reducing mean per-joint position error by 23% overall and 58% for the lower body.
- Score: 6.476948781728137
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Our work addresses the problem of egocentric human pose estimation from
downwards-facing cameras on head-mounted devices (HMDs). This presents a
challenging scenario, as parts of the body often fall outside of the image or
are occluded. Previous solutions minimize this problem by using fish-eye camera
lenses to capture a wider view, but these can present hardware design issues.
They also predict 2D heat-maps per joint and lift them to 3D space to deal with
self-occlusions, but this requires large network architectures which are
impractical to deploy on resource-constrained HMDs. We predict pose from images
captured with conventional rectilinear camera lenses. This resolves hardware
design issues, but means body parts are often out of frame. As such, we
directly regress probabilistic joint rotations represented as matrix Fisher
distributions for a parameterized body model. This allows us to quantify pose
uncertainties and explain out-of-frame or occluded joints. This also removes
the need to compute 2D heat-maps and allows for simplified DNN architectures
which require less compute. Given the lack of egocentric datasets using
rectilinear camera lenses, we introduce the SynthEgo dataset, a synthetic
dataset with 60K stereo images containing high diversity of pose, shape,
clothing and skin tone. Our approach achieves state-of-the-art results for this
challenging configuration, reducing mean per-joint position error by 23%
overall and 58% for the lower body. Our architecture also has eight times fewer
parameters and runs twice as fast as the current state-of-the-art. Experiments
show that training on our synthetic dataset leads to good generalization to
real world images without fine-tuning.
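For intuition, here is a minimal sketch of the matrix Fisher parameterization described above (illustrative code, not the paper's implementation; function names are assumptions): a network regresses an unconstrained 3x3 matrix F per joint, the distribution p(R) ∝ exp(tr(FᵀR)) over SO(3) has a closed-form mode, and the singular values of F act as concentration parameters, so small values flag uncertain joints such as those out of frame or occluded.

```python
import numpy as np

def matrix_fisher_mode(F: np.ndarray) -> np.ndarray:
    """Most likely rotation under p(R) ∝ exp(tr(F^T R)) on SO(3).

    A 'proper' SVD keeps det(R) = +1; F is the unconstrained 3x3
    parameter matrix a network might regress per joint (illustrative).
    """
    U, _, Vt = np.linalg.svd(F)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt

def concentration(F: np.ndarray) -> np.ndarray:
    """Singular values of F: small values mean high rotational
    uncertainty about the corresponding axes."""
    return np.linalg.svd(F, compute_uv=False)

# A weakly concentrated prediction (small singular values) marks an
# uncertain joint, yet its mode is still a valid rotation matrix.
F = 0.1 * np.eye(3)
print(matrix_fisher_mode(F))  # identity rotation
print(concentration(F))       # [0.1 0.1 0.1] -> low confidence
```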
Related papers
- Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot [22.848563931757962]
We present Multi-HMR, a strong single-shot model for multi-person 3D human mesh recovery from a single RGB image.
Predictions encompass the whole body, including hands and facial expressions, using the SMPL-X parametric model.
We show that incorporating the synthetic CUFFS dataset into the training data further enhances predictions, particularly for hands.
arXiv Detail & Related papers (2024-02-22T16:05:13Z)
- Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views [9.476008200056082]
Ego3DPose is a highly accurate binocular egocentric 3D pose reconstruction system.
We propose a two-path network architecture with a path that estimates the pose of each limb independently from its binocular heatmaps.
We propose a new perspective-aware representation using trigonometry, enabling the network to estimate the 3D orientation of limbs.
arXiv Detail & Related papers (2023-09-21T10:34:35Z)
- Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models (a rough sketch follows this entry).
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z)
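As a hedged illustration of the canonical camera idea above (the constants and function names are assumptions, not Metric3D's actual code), rescaling depth by the ratio of a shared canonical focal length to the image's true focal length removes the focal-length component of the metric ambiguity:

```python
import numpy as np

F_CANONICAL = 1000.0  # shared canonical focal length in pixels (assumed)

def to_canonical_depth(depth: np.ndarray, focal_px: float) -> np.ndarray:
    """Map metric depth from a camera with focal length focal_px into a
    canonical camera space, so training data from different cameras agree."""
    return depth * (F_CANONICAL / focal_px)

def to_metric_depth(canonical_pred: np.ndarray, focal_px: float) -> np.ndarray:
    """Invert the transform at inference to recover metric depth for the
    actual camera."""
    return canonical_pred * (focal_px / F_CANONICAL)
```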
- Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction [66.10717041384625]
Zolly is the first 3DHMR method focusing on perspective-distorted images.
We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body.
We extend two real-world datasets tailored for this task, both containing perspective-distorted human images.
arXiv Detail & Related papers (2023-03-24T04:22:41Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation [9.569752078386006]
We leverage information from past frames to guide our self-attention-based 3D estimation procedure -- Ego-STAN.
Specifically, we build a spatio-temporal Transformer model that attends to semantically rich convolutional neural network-based feature maps.
We demonstrate Ego-STAN's superior performance on the xR-EgoPose dataset.
arXiv Detail & Related papers (2022-06-09T22:33:27Z)
- SPEC: Seeing People in the Wild with an Estimated Camera [64.85791231401684]
We introduce SPEC, the first in-the-wild 3D HPS method that estimates the perspective camera from a single image.
We train a neural network to estimate the field of view, camera pitch, and roll of an input image.
We then train a novel network that concatenates the camera calibration with the image features and uses these together to regress 3D body shape and pose.
arXiv Detail & Related papers (2021-10-01T19:05:18Z)
- SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured from downward-looking fish-eye cameras installed on the rim of a head-mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large differences in scale from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection that incorporates per-joint depth differences into the projection function to derive per-joint scale variants (a rough sketch follows this list).
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
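Below is a rough sketch of what a depth-to-scale projection could look like, based only on the summary above (names and the exact formulation are assumptions): each joint gets its own weak-perspective scale derived from its depth offset to the root joint, rather than one global scale.

```python
import numpy as np

def d2s_project(joints_3d: np.ndarray, focal: float, root_depth: float,
                trans_2d: np.ndarray) -> np.ndarray:
    """Project (J, 3) camera-space joints to 2D with per-joint scales.

    joints_3d[:, 2] holds each joint's depth offset from the root, so the
    scale focal / (root_depth + dz) varies per joint (illustrative only).
    """
    dz = joints_3d[:, 2]
    scale = focal / (root_depth + dz)        # (J,) per-joint scale variants
    return joints_3d[:, :2] * scale[:, None] + trans_2d

# Joints farther from the camera receive a smaller projected scale.
joints = np.array([[0.1, 0.2, 0.0], [0.1, 0.5, 0.3]])
print(d2s_project(joints, focal=1000.0, root_depth=5.0, trans_2d=np.zeros(2)))
```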