ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding
- URL: http://arxiv.org/abs/2309.12183v1
- Date: Thu, 21 Sep 2023 15:50:04 GMT
- Title: ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding
- Authors: Yu Cheng, Bo Wang, Robby T. Tan
- Abstract summary: In 3D human shape and pose estimation from a monocular video, models trained with limited labeled data cannot generalize well to videos with occlusion.
We introduce ORTexME, an occlusion-robust temporal method that utilizes temporal information from the input video to better regularize the occluded body parts.
Our method achieves a significant improvement on the challenging multi-person 3DPW dataset, with a 1.8 P-MPJPE error reduction.
- Score: 35.49066795648395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In 3D human shape and pose estimation from a monocular video,
models trained with limited labeled data cannot generalize well to videos
with occlusion, which is common in in-the-wild videos. Recent human neural
rendering approaches, which focus on novel view synthesis and are initialized
by off-the-shelf human shape and pose methods, have the potential to correct
the initial human shape. However, existing methods have drawbacks: erroneous
occlusion handling, sensitivity to inaccurate human segmentation, and
ineffective loss computation due to the non-regularized opacity field. To
address these problems, we introduce ORTexME, an occlusion-robust temporal
method that uses temporal information from the input video to better
regularize the occluded body parts. ORTexME is based on NeRF; to determine
reliable regions for NeRF ray sampling, we use a novel average-texture
learning approach that learns the average appearance of a person and infers
a mask from that average texture. In addition, to guide the opacity-field
updates in NeRF and suppress blur and noise, we propose using the human body
mesh. The quantitative evaluation demonstrates that our method achieves a
significant improvement on the challenging multi-person 3DPW dataset, with a
1.8 P-MPJPE error reduction, whereas the SOTA rendering-based methods fail
on the same dataset and enlarge the error by up to 5.6.
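To make the average-texture idea above concrete, here is a minimal sketch, assuming per-frame person textures have already been unwarped into a shared UV space; the function name, the threshold tau, and the array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def average_texture_mask(uv_textures: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Infer per-frame reliability masks from a temporal average texture.

    uv_textures: (T, H, W, 3) per-frame textures unwarped into a shared UV
    space. Pixels whose appearance stays close to the temporal average are
    treated as reliable (likely unoccluded); large deviations suggest
    occlusion, so rays through them can be down-weighted during NeRF sampling.
    """
    avg = uv_textures.mean(axis=0)                         # average texture
    residual = np.linalg.norm(uv_textures - avg, axis=-1)  # (T, H, W) deviation
    return residual < tau                                  # boolean masks
```

The P-MPJPE figures quoted above are mean per-joint position errors after a rigid Procrustes alignment of the prediction to the ground truth; the following is a standard, paper-agnostic implementation of the metric.

```python
def p_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """P-MPJPE: MPJPE after similarity (Procrustes) alignment.

    pred, gt: (J, 3) joint positions, typically in millimetres.
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    U, s, Vt = np.linalg.svd(p.T @ g)      # orthogonal Procrustes via SVD
    R = U @ Vt
    if np.linalg.det(R) < 0:               # fix a reflection: keep a rotation
        U[:, -1] *= -1
        s[-1] *= -1
        R = U @ Vt
    scale = s.sum() / (p ** 2).sum()
    aligned = scale * p @ R + mu_g
    return float(np.linalg.norm(aligned - gt, axis=-1).mean())
```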
Related papers
- Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion [13.938406073551844]
This paper introduces a Dual Transformer Fusion (DTF) algorithm, a novel approach to holistic 3D pose estimation.
To enable precise 3D Human Pose Estimation, our approach leverages the innovative DTF architecture, which first generates a pair of intermediate views.
Our approach outperforms existing state-of-the-art methods on both datasets, yielding substantial improvements.
arXiv Detail & Related papers (2024-10-06T18:15:27Z)
- DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery.
It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-04-01T18:59:13Z)
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe human joints in the image and encode fine-grained local features (a minimal token-probing sketch follows this entry).
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
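The token-probing pattern described in the AiOS entry above can be sketched as two cross-attention stages; this is a hypothetical PyTorch illustration of the general idea, not the AiOS architecture, and every module name and dimension here is an assumption.

```python
import torch
import torch.nn as nn

class TokenProbe(nn.Module):
    """Sketch: a learned human token pools instance-level context from image
    features; per-joint tokens then pool fine-grained local features."""

    def __init__(self, dim: int = 256, num_joints: int = 17, heads: int = 8):
        super().__init__()
        self.human_token = nn.Parameter(torch.randn(1, 1, dim))
        self.joint_tokens = nn.Parameter(torch.randn(1, num_joints, dim))
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats: torch.Tensor):
        # feats: (B, H*W, dim) flattened image features.
        B = feats.size(0)
        # The human token cross-attends to the whole feature map.
        g, _ = self.global_attn(self.human_token.expand(B, -1, -1), feats, feats)
        # Joint tokens, conditioned on the global context, probe local features.
        q = self.joint_tokens.expand(B, -1, -1) + g  # broadcast (B, 1, dim)
        j, _ = self.local_attn(q, feats, feats)
        return g, j  # (B, 1, dim) instance context, (B, J, dim) joint features
```

In AiOS these probes also drive localization, which removes the separate detection step; the sketch only shows the feature-pooling pattern.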
- Neural Capture of Animatable 3D Human from Monocular Video [38.974181971541846]
We present a novel paradigm for building an animatable 3D human representation from a monocular video input, such that it can be rendered under unseen poses and views.
Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model serving as a geometry proxy.
arXiv Detail & Related papers (2022-08-18T09:20:48Z)
- Direct Dense Pose Estimation [138.56533828316833]
Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies.
Prior dense pose estimation methods are all based on the Mask R-CNN framework and operate in a top-down manner, first identifying a bounding box for each person.
We propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP), illustrated by the per-pixel output-head sketch after this entry.
arXiv Detail & Related papers (2022-04-04T06:14:38Z)
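Since dense pose output is a per-pixel body-part label plus continuous (u, v) surface coordinates, a direct, detector-free approach can be illustrated with a simple convolutional head; this is a hypothetical sketch, not the DDP architecture, and the channel counts are assumptions.

```python
import torch
import torch.nn as nn

class DensePoseHead(nn.Module):
    """Sketch: per-pixel dense-pose prediction from a feature map."""

    def __init__(self, in_ch: int = 256, num_parts: int = 24):
        super().__init__()
        # Body-part classification, with one extra channel for background.
        self.part_logits = nn.Conv2d(in_ch, num_parts + 1, kernel_size=1)
        # Continuous (u, v) surface coordinates for every part.
        self.uv = nn.Conv2d(in_ch, 2 * num_parts, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        # feats: (B, in_ch, H, W) backbone features at output resolution.
        return self.part_logits(feats), self.uv(feats).sigmoid()
```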
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which combines a common deep network backbone with two output heads in diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both the pose and joint level (a dual-head sketch follows this entry).
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
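As a rough illustration of the dual-head idea, the hypothetical module below regresses a pose from two differently configured heads and uses their disagreement as an uncertainty proxy at the joint and pose level; the disagreement measure is an assumption, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn

class DualHeadPose(nn.Module):
    """Sketch: shared backbone features, two diverse regression heads."""

    def __init__(self, feat_dim: int = 2048, num_joints: int = 17):
        super().__init__()
        self.num_joints = num_joints
        self.head_a = nn.Linear(feat_dim, num_joints * 3)
        self.head_b = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, num_joints * 3),
        )

    def forward(self, feats: torch.Tensor):
        # feats: (B, feat_dim) pooled backbone features.
        pa = self.head_a(feats).view(-1, self.num_joints, 3)
        pb = self.head_b(feats).view(-1, self.num_joints, 3)
        joint_unc = (pa - pb).norm(dim=-1)   # (B, J): per-joint disagreement
        pose_unc = joint_unc.mean(dim=-1)    # (B,): pose-level uncertainty
        return (pa + pb) / 2, joint_unc, pose_unc
```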
- Occluded Human Mesh Recovery [23.63235079216075]
We present Occluded Human Mesh Recovery (OCHMR) - a novel top-down mesh recovery approach that incorporates image spatial context.
OCHMR achieves superior performance on challenging multi-person benchmarks like 3DPW, CrowdPose and OCHuman.
arXiv Detail & Related papers (2022-03-24T21:39:20Z)
- NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
Because no paired training data exist, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z)
- Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present a deep neural network methodology to reconstruct the 3D pose and shape of people from an input RGB image.
We rely on GHUM, a recently introduced, expressive full-body statistical 3D human model, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.