A-NeRF: Surface-free Human 3D Pose Refinement via Neural Rendering
- URL: http://arxiv.org/abs/2102.06199v1
- Date: Thu, 11 Feb 2021 18:58:31 GMT
- Title: A-NeRF: Surface-free Human 3D Pose Refinement via Neural Rendering
- Authors: Shih-Yang Su, Frank Yu, Michael Zollhoefer and Helge Rhodin
- Abstract summary: We propose a test-time optimization approach for monocular motion capture that learns a volumetric body model of the user in a self-supervised manner.
Our approach is self-supervised and does not require any additional ground truth labels for appearance, pose, or 3D shape.
We demonstrate that our novel combination of a discriminative pose estimation technique with surface-free analysis-by-synthesis outperforms purely discriminative monocular pose estimation approaches.
- Score: 13.219688351773422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep learning has reshaped the classical motion capture pipeline, generative, analysis-by-synthesis elements are still in use to recover fine details if a high-quality 3D model of the user is available. Unfortunately, obtaining such a model for every user a priori is challenging, time-consuming, and limits the application scenarios. We propose a novel test-time optimization approach for monocular motion capture that learns a volumetric body model of the user in a self-supervised manner. To this end, our approach combines the advantages of neural radiance fields with an articulated skeleton representation. Our proposed skeleton embedding serves as a common reference that links constraints across time, thereby reducing the number of required camera views from the traditional dozens of calibrated cameras down to a single uncalibrated one. As a starting point, we employ the output of an off-the-shelf model that predicts the 3D skeleton pose. The volumetric body shape and appearance are then learned from scratch, while jointly refining the initial pose estimate. Our approach is self-supervised and does not require any additional ground truth labels for appearance, pose, or 3D shape. We demonstrate that our novel combination of a discriminative pose estimation technique with surface-free analysis-by-synthesis outperforms purely discriminative monocular pose estimation approaches and generalizes well to multiple views.
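The approach above lends itself to a compact illustration. Below is a minimal, self-contained sketch (not the authors' released code) of the two core ideas: querying a NeRF-style body model in skeleton-relative coordinates, and jointly refining the initial pose estimate with a photometric loss at test time. All names, shapes, ray generation, and hyperparameters are illustrative assumptions; in particular, the pose is simplified to freely optimized joint positions rather than a full kinematic chain of per-joint rotations.

```python
# Minimal sketch of skeleton-conditioned neural rendering with test-time
# pose refinement. Names, shapes, and hyperparameters are assumptions; the
# real method articulates a kinematic chain of per-joint rotations, while
# this sketch optimizes joint positions directly for brevity.
import torch
import torch.nn as nn

J = 24  # assumed number of skeleton joints

class SkeletonEmbedding(nn.Module):
    """Expresses world-space samples relative to every joint, giving the MLP
    a pose-aware query that links observations across frames."""
    def forward(self, pts, joint_pos):
        # pts: (N, 3) ray samples; joint_pos: (J, 3) posed joint locations.
        rel = pts[:, None, :] - joint_pos[None, :, :]      # (N, J, 3)
        dist = rel.norm(dim=-1, keepdim=True)              # (N, J, 1)
        return torch.cat([rel, dist], dim=-1).flatten(1)   # (N, J*4)

class BodyField(nn.Module):
    """Tiny NeRF-style MLP: skeleton-relative query -> color and density."""
    def __init__(self, hidden=128):
        super().__init__()
        self.embed = SkeletonEmbedding()
        self.mlp = nn.Sequential(
            nn.Linear(J * 4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))                          # RGB + density

    def forward(self, pts, joint_pos):
        out = self.mlp(self.embed(pts, joint_pos))
        return torch.sigmoid(out[:, :3]), torch.relu(out[:, 3])

def render_rays(field, rays_o, rays_d, joint_pos, n_samples=64, near=0.5, far=3.0):
    """Standard volume rendering: sample along rays, query, alpha-composite."""
    t = torch.linspace(near, far, n_samples)
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]  # (R, S, 3)
    rgb, sigma = field(pts.reshape(-1, 3), joint_pos)
    rgb = rgb.reshape(*pts.shape[:2], 3)
    sigma = sigma.reshape(*pts.shape[:2])
    alpha = 1.0 - torch.exp(-sigma * (far - near) / n_samples)
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    return ((alpha * trans)[..., None] * rgb).sum(dim=1)   # (R, 3) pixel colors

# Test-time optimization: appearance is learned from scratch while the
# initial pose from an off-the-shelf estimator is refined end to end.
field = BodyField()
joint_pos = torch.randn(J, 3).requires_grad_()  # stand-in for the initial 3D pose
opt = torch.optim.Adam([{'params': field.parameters(), 'lr': 5e-4},
                        {'params': [joint_pos], 'lr': 1e-4}])
for step in range(200):
    rays_o = torch.zeros(1024, 3)                          # placeholder ray origins
    rays_d = nn.functional.normalize(torch.randn(1024, 3), dim=-1)
    target = torch.rand(1024, 3)                           # placeholder pixel colors
    pred = render_rays(field, rays_o, rays_d, joint_pos)
    loss = (pred - target).abs().mean()                    # photometric self-supervision
    opt.zero_grad(); loss.backward(); opt.step()
```

The skeleton-relative encoding is what lets a single uncalibrated camera suffice in this setup: every frame constrains the same pose-conditioned field, so observations are linked across time.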
Related papers
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
Our framework capitalizes on the speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - S3O: A Dual-Phase Approach for Reconstructing Dynamic Shape and Skeleton of Articulated Objects from Single Monocular Video [13.510513575340106]
Reconstructing dynamic articulated objects from a single monocular video is challenging, requiring joint estimation of shape, motion, and camera parameters from limited views.
We propose Synergistic Shape and Skeleton Optimization (S3O), a novel two-phase method that efficiently learns parametric models including visible shapes and underlying skeletons.
Our experimental evaluations on standard benchmarks and the PlanetZoo dataset affirm that S3O provides more accurate 3D reconstruction and plausible skeletons, and reduces training time by approximately 60% compared to the state of the art.
arXiv Detail & Related papers (2024-05-21T09:01:00Z)
- Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first to demonstrate the utility of diffusion-model-based novel-view synthesizers in enhancing RGB 6D pose estimation at the category level.
The method reduces data requirements, removes the need for depth information in the zero-shot category-level 6D pose estimation task, and increases performance, as demonstrated quantitatively through experiments on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z)
- Personalized 3D Human Pose and Shape Refinement [19.082329060985455]
Regression-based methods have dominated the field of 3D human pose and shape estimation.
We propose to construct dense correspondences between initial human model estimates and the corresponding images.
We show that our approach not only consistently leads to better image-model alignment, but also to improved 3D accuracy.
arXiv Detail & Related papers (2024-03-18T10:13:53Z) - Enhanced Spatio-Temporal Context for Temporally Consistent Robust 3D
Human Motion Recovery from Monocular Videos [5.258814754543826]
We propose a novel method for temporally consistent motion estimation from a monocular video.
Instead of using generic ResNet-like features, our method uses a body-aware feature representation and independent per-frame pose estimates.
Our method attains significantly lower acceleration error and outperforms the existing state-of-the-art methods.
arXiv Detail & Related papers (2023-11-20T10:53:59Z) - NPC: Neural Point Characters from Video [21.470471345454524]
High-fidelity human 3D models can now be learned directly from videos.
Previous methods avoid using a template but rely on a costly or ill-posed mapping from observation to canonical space.
We propose a hybrid point-based representation for reconstructing animatable characters.
arXiv Detail & Related papers (2023-04-04T17:59:22Z) - SAOR: Single-View Articulated Object Reconstruction [17.2716639564414]
We introduce SAOR, a novel approach for estimating the 3D shape, texture, and viewpoint of an articulated object from a single image captured in the wild.
Unlike prior approaches that rely on pre-defined category-specific 3D templates or tailored 3D skeletons, SAOR learns to articulate shapes from single-view image collections with a skeleton-free part-based model without requiring any 3D object shape priors.
arXiv Detail & Related papers (2023-03-23T17:59:35Z) - LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human
Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z) - PaMIR: Parametric Model-Conditioned Implicit Representation for
Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z) - Kinematic-Structure-Preserved Representation for Unsupervised 3D Human
Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations, namely forward-kinematics, camera-projection, and spatial-map transformations (see the sketch after this list).
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.