Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based
Motion Refinement
- URL: http://arxiv.org/abs/2311.16495v2
- Date: Sat, 2 Dec 2023 06:55:54 GMT
- Title: Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based
Motion Refinement
- Authors: Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar,
Danhang Tang, Thabo Beeler, Christian Theobalt
- Abstract summary: We explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion.
This task presents significant challenges due to the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion.
We propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction.
- Score: 65.08165593201437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we explore egocentric whole-body motion capture using a single
fisheye camera, which simultaneously estimates human body and hand motion. This
task presents significant challenges due to three factors: the lack of
high-quality datasets, fisheye camera distortion, and human body
self-occlusion. To address these challenges, we propose a novel approach that
leverages FisheyeViT to extract fisheye image features, which are subsequently
converted into pixel-aligned 3D heatmap representations for 3D human body pose
prediction. For hand tracking, we incorporate dedicated hand detection and hand
pose estimation networks for regressing 3D hand poses. Finally, we develop a
diffusion-based whole-body motion prior model to refine the estimated
whole-body motion while accounting for joint uncertainties. To train these
networks, we collect a large synthetic dataset, EgoWholeBody, comprising
840,000 high-quality egocentric images captured across a diverse range of
whole-body motion sequences. Quantitative and qualitative evaluations
demonstrate the effectiveness of our method in producing high-quality
whole-body motion estimates from a single egocentric camera.
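As a rough illustration of how a joint position can be read out of a 3D heatmap, the following minimal sketch applies a soft-argmax to a single-joint volume. It is not the paper's implementation: the function name `soft_argmax_3d`, the `beta` sharpness parameter, and the 8x8x8 grid are assumptions made for this example, and the fisheye-specific, pixel-aligned conversion described in the abstract is not modeled here.

```python
import numpy as np


def soft_argmax_3d(heatmap: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Return the expected (x, y, z) voxel index of a (D, H, W) heatmap.

    Hypothetical helper for illustration only; `beta` scales the logits
    before the softmax (larger values give a sharper distribution).
    """
    depth, height, width = heatmap.shape
    logits = heatmap.reshape(-1) * beta
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    probs = (probs / probs.sum()).reshape(depth, height, width)

    # Marginalize over the other two axes, then take the expected index.
    z = (probs.sum(axis=(1, 2)) * np.arange(depth)).sum()
    y = (probs.sum(axis=(0, 2)) * np.arange(height)).sum()
    x = (probs.sum(axis=(0, 1)) * np.arange(width)).sum()
    return np.array([x, y, z])


# Toy volume with one strong peak at z=2, y=3, x=4 (assumed values).
heatmap = np.zeros((8, 8, 8))
heatmap[2, 3, 4] = 50.0
joint = soft_argmax_3d(heatmap)
```

Because the readout is an expectation rather than a hard argmax, it stays differentiable and can return sub-voxel positions when the heatmap mass is spread over neighboring cells.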
Related papers
- Bundle Adjusted Gaussian Avatars Deblurring [31.718130377229482]
We propose a 3D-aware, physics-oriented model of blur formation attributable to human movement and a 3D human motion model to clarify ambiguities found in motion-induced blurry images.
We have established benchmarks for this task through a synthetic dataset derived from existing multi-view captures, alongside a real-captured dataset acquired through a 360-degree synchronous hybrid-exposure camera system.
arXiv Detail & Related papers (2024-11-24T10:03:24Z)
- FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera [8.502741852406904]
We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras.
We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions.
We also incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network.
arXiv Detail & Related papers (2024-09-23T14:31:42Z)
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects [89.95728475983263]
Holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation.
We design the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits.
Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks.
arXiv Detail & Related papers (2024-03-25T05:12:21Z)
- Scene-aware Egocentric 3D Human Pose Estimation [72.57527706631964]
Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality.
Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene.
We propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints.
arXiv Detail & Related papers (2022-12-20T21:35:39Z)
- Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning [70.75369367311897]
3D-aware global correspondences are reliable flows that jointly encode global semantic correlations, local deformations, and geometric priors of 3D human bodies.
An adversarial generator takes the garment warped by the 3D-aware flow, and the image of the target person as inputs, to synthesize the photo-realistic try-on result.
arXiv Detail & Related papers (2022-11-25T12:16:21Z)
- Estimating Egocentric 3D Human Pose in Global Space [70.7272154474722]
We present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera.
Our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-04-27T20:01:57Z)
- UNOC: Understanding Occlusion for Embodied Presence in Virtual Reality [12.349749717823736]
In this paper, we propose a new data-driven framework for inside-out body tracking.
We first collect a large-scale motion capture dataset with both body and finger motions.
We then simulate the occlusion patterns in head-mounted camera views on the captured ground truth using a ray casting algorithm and learn a deep neural network to infer the occluded body parts.
arXiv Detail & Related papers (2020-11-12T09:31:09Z)
- SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera [97.0162841635425]
We present a solution to egocentric 3D body pose estimation from monocular images captured by downward-looking fisheye cameras installed on the rim of a head-mounted VR device.
This unusual viewpoint leads to images with unique visual appearance, with severe self-occlusions and perspective distortions.
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.
arXiv Detail & Related papers (2020-11-02T16:18:06Z)