NPC: Neural Point Characters from Video
- URL: http://arxiv.org/abs/2304.02013v2
- Date: Fri, 1 Sep 2023 04:20:25 GMT
- Title: NPC: Neural Point Characters from Video
- Authors: Shih-Yang Su, Timur Bagautdinov, Helge Rhodin
- Abstract summary: High-fidelity human 3D models can now be learned directly from videos.
Previous methods avoid using a template but rely on a costly or ill-posed mapping from observation to canonical space.
We propose a hybrid point-based representation for reconstructing animatable characters.
- Score: 21.470471345454524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-fidelity human 3D models can now be learned directly from videos,
typically by combining a template-based surface model with neural
representations. However, obtaining a template surface requires expensive
multi-view capture systems, laser scans, or strictly controlled conditions.
Previous methods avoid using a template but rely on a costly or ill-posed
mapping from observation to canonical space. We propose a hybrid point-based
representation for reconstructing animatable characters that does not require
an explicit surface model, while being generalizable to novel poses. For a
given video, our method automatically produces an explicit set of 3D points
representing approximate canonical geometry, and learns an articulated
deformation model that produces pose-dependent point transformations. The
points serve both as a scaffold for high-frequency neural features and an
anchor for efficiently mapping between observation and canonical space. We
demonstrate on established benchmarks that our representation overcomes
limitations of prior work operating in either canonical or observation
space. Moreover, our automatic point extraction approach enables learning
models of human and animal characters alike, matching the performance of the
methods using rigged surface templates despite being more general. Project
website: https://lemonatsu.github.io/npc/
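To make the point-anchor idea concrete, here is a minimal NumPy sketch of mapping observation-space queries into canonical space via nearest-point anchors. It assumes a paired set of posed and canonical points; the function name, the inverse-distance weighting, and the toy data are illustrative, not NPC's actual implementation.

```python
import numpy as np

def observation_to_canonical(queries, posed_pts, canon_pts, k=4):
    """Map observation-space queries to canonical space by anchoring each
    query to its k nearest posed points and carrying the relative offset
    over to the matching canonical points (illustrative only)."""
    d = np.linalg.norm(queries[:, None, :] - posed_pts[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                     # (Q, k) anchor ids
    w = 1.0 / (np.take_along_axis(d, idx, axis=1) + 1e-8)  # inverse-distance
    w /= w.sum(axis=1, keepdims=True)                      # normalized weights
    offsets = queries[:, None, :] - posed_pts[idx]         # (Q, k, 3)
    candidates = canon_pts[idx] + offsets                  # anchor + offset
    return (w[..., None] * candidates).sum(axis=1)         # (Q, 3)

# toy usage: 500 paired points, 10 queries
posed = np.random.randn(500, 3)
canon = posed + 0.1                                        # fake pairing
print(observation_to_canonical(np.random.randn(10, 3), posed, canon).shape)
```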
Related papers
- SHIC: Shape-Image Correspondences with no Keypoint Supervision [106.99157362200867]
Canonical surface mapping generalizes keypoint detection by assigning each pixel of an object to a corresponding point in a 3D template.
Popularised by DensePose for the analysis of humans, the concept has since been applied to more categories.
We introduce SHIC, a method to learn canonical maps without manual supervision that achieves better results than supervised methods for most categories.
arXiv Detail & Related papers (2024-07-26T17:58:59Z)
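As a rough illustration of a canonical surface map, the sketch below regresses per-pixel 3D template coordinates with a tiny conv net; SHIC's actual architecture, training signal, and outputs differ.

```python
import torch

# A canonical surface map assigns every pixel a point on a 3D template.
# Minimal sketch: a small conv net regressing per-pixel template
# coordinates (not SHIC's real model).
cse_net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(64, 3, 1),           # (x, y, z) on the template surface
)
image = torch.rand(1, 3, 128, 128)        # dummy RGB input
template_xyz = cse_net(image)             # (1, 3, 128, 128) per-pixel points
```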
- Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos [26.65191922949358]
We present a method to build animatable dog avatars from monocular videos.
This is challenging as animals display a range of (unpredictable) non-rigid movements and have a variety of appearance details.
We develop an approach that links the video frames via a 4D solution that jointly solves for the animal's pose variation and its appearance.
arXiv Detail & Related papers (2024-03-25T18:41:43Z)
- Neural Capture of Animatable 3D Human from Monocular Video [38.974181971541846]
We present a novel paradigm for building an animatable 3D human representation from a monocular video input, such that it can be rendered in unseen poses and views.
Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model serving as a geometry proxy.
arXiv Detail & Related papers (2022-08-18T09:20:48Z)
- Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos [63.16888987770885]
This paper addresses the challenge of reconstructing an animatable human model from a multi-view video.
We introduce a pose-driven deformation field based on the linear blend skinning algorithm.
We show that our approach significantly outperforms recent human modeling methods.
arXiv Detail & Related papers (2022-03-15T17:56:59Z)
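The pose-driven deformation in the entry above builds on linear blend skinning; below is a textbook NumPy version of the LBS step. The weights and transforms here are arbitrary inputs, whereas the paper learns its deformation field.

```python
import numpy as np

def linear_blend_skinning(pts, weights, bone_T):
    """Classic LBS: each rest-pose point moves by a convex blend of
    per-bone rigid transforms. pts: (N, 3); weights: (N, B), rows summing
    to 1; bone_T: (B, 4, 4) homogeneous bone transforms."""
    homo = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)  # (N, 4)
    per_bone = np.einsum('bij,nj->bni', bone_T, homo)             # (B, N, 4)
    blended = np.einsum('nb,bni->ni', weights, per_bone)          # (N, 4)
    return blended[:, :3]
```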
- LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space [90.74976459491303]
We introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.
A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective.
We show that our approach leads to an expressive and effective prior, capturing facial dynamics and subtle expressions better.
arXiv Detail & Related papers (2022-03-15T13:22:57Z)
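For readers unfamiliar with the latent-likelihood objective mentioned above, the toy flow below shows the change-of-variables computation on an elementwise affine map; it is a generic normalizing-flow sketch, not LiP-Flow's model.

```python
import torch

class AffineFlow(torch.nn.Module):
    """Toy elementwise affine flow between two latent spaces, with the
    change-of-variables log-likelihood (illustrative only)."""
    def __init__(self, dim):
        super().__init__()
        self.log_s = torch.nn.Parameter(torch.zeros(dim))
        self.t = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, z):                            # source latent -> target
        return z * self.log_s.exp() + self.t

    def log_prob(self, z_target):
        base = torch.distributions.Normal(0.0, 1.0)
        z = (z_target - self.t) * (-self.log_s).exp()  # inverse map
        # log p(target) = log p_base(z) - sum log|scale|
        return base.log_prob(z).sum(-1) - self.log_s.sum()

flow = AffineFlow(16)
print(flow.log_prob(torch.randn(4, 16)))             # latent likelihood objective
```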
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z)
- imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose [42.4185273307021]
We present imGHUM, the first holistic generative model of 3D human shape and articulated pose.
We model the full human body implicitly, as the zero-level set of a function, without the use of an explicit template mesh.
arXiv Detail & Related papers (2021-08-24T17:08:28Z)
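The zero-level-set idea in the imGHUM entry can be illustrated with any signed distance function: the surface is exactly the set of points where the function vanishes. A toy sphere SDF in NumPy:

```python
import numpy as np

def sdf_sphere(x, radius=1.0):
    """Toy signed distance function; the shape is its zero-level set."""
    return np.linalg.norm(x, axis=-1) - radius

samples = np.random.uniform(-1.5, 1.5, size=(100000, 3))
surface = samples[np.abs(sdf_sphere(samples)) < 1e-2]   # ~on the surface
inside = samples[sdf_sphere(samples) < 0]               # negative = inside
```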
- Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration [67.69257782645789]
We propose piecewise transformation fields that learn 3D translation vectors to map any query point in posed space to its corresponding position in rest-pose space.
We show that fitting parametric models with poses predicted by our network results in much better registration quality, especially for extreme poses.
arXiv Detail & Related papers (2021-04-16T15:16:09Z)
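A minimal sketch of blending per-part translation vectors, as in the piecewise transformation fields above; here the weights and translations are arbitrary arrays, whereas the paper predicts them with learned networks.

```python
import numpy as np

def to_rest_pose(queries, part_weights, part_translations):
    """Carry posed-space queries to rest-pose space by blending per-part
    3D translation vectors. queries: (N, 3); part_weights: (N, P) soft
    part assignments; part_translations: (N, P, 3) per-part vectors."""
    return queries + np.einsum('np,npi->ni', part_weights, part_translations)

q = np.random.randn(8, 3)
w = np.full((8, 24), 1.0 / 24)                  # uniform over 24 parts
t = np.random.randn(8, 24, 3) * 0.01
print(to_rest_pose(q, w, t).shape)              # (8, 3)
```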
- A-NeRF: Surface-free Human 3D Pose Refinement via Neural Rendering [13.219688351773422]
We propose a test-time optimization approach for monocular motion capture that learns a volumetric body model of the user in a self-supervised manner.
Our approach is self-supervised and does not require any additional ground truth labels for appearance, pose, or 3D shape.
We demonstrate that our novel combination of a discriminative pose estimation technique with surface-free analysis-by-synthesis outperforms purely discriminative monocular pose estimation approaches.
arXiv Detail & Related papers (2021-02-11T18:58:31Z)
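The analysis-by-synthesis loop in the A-NeRF entry reduces to gradient descent on a photometric loss through a differentiable renderer. The sketch below uses a hypothetical `render_fn` placeholder; A-NeRF's actual renderer, losses, and pose parameterization are more involved.

```python
import torch

def refine_pose(pose_init, target, render_fn, steps=100, lr=1e-2):
    """Analysis-by-synthesis sketch: refine pose parameters by gradient
    descent on a photometric loss. `render_fn` stands in for any
    differentiable, pose-conditioned renderer."""
    pose = pose_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(render_fn(pose), target)
        loss.backward()
        opt.step()
    return pose.detach()

# toy stand-in renderer: any differentiable map from pose to an "image"
render_fn = lambda p: p.sum() * torch.ones(8, 8)
print(refine_pose(torch.zeros(72), torch.ones(8, 8), render_fn).sum())
```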
- Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction [123.62341095156611]
Implicit functions represented as deep learning approximations are powerful for reconstructing 3D surfaces.
Such representations are essential for building flexible models in both computer graphics and computer vision.
We present methodology that combines detail-rich implicit functions and parametric representations.
arXiv Detail & Related papers (2020-07-22T13:46:14Z)
- PolyGen: An Autoregressive Generative Model of 3D Meshes [22.860421649320287]
We present an approach which models the mesh directly using a Transformer-based architecture.
Our model can condition on a range of inputs, including object classes, voxels, and images.
We show that the model is capable of producing high-quality, usable meshes, and establish log-likelihood benchmarks for the mesh-modelling task.
arXiv Detail & Related papers (2020-02-23T17:16:34Z)
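The autoregressive mesh modelling above can be sketched as token-by-token sampling of quantized vertex coordinates; `next_token_logits` below is a hypothetical stand-in for PolyGen's Transformer decoder.

```python
import torch

def sample_vertices(next_token_logits, max_tokens=300, stop_token=0):
    """Autoregressive sampling sketch in the spirit of PolyGen's vertex
    model: quantized coordinate tokens are emitted one at a time until a
    stopping token. `next_token_logits` is a placeholder decoder."""
    tokens = []
    for _ in range(max_tokens):
        logits = next_token_logits(tokens)            # (vocab,) next-token logits
        nxt = torch.distributions.Categorical(logits=logits).sample().item()
        if nxt == stop_token:
            break
        tokens.append(nxt)
    coords = torch.tensor(tokens[: len(tokens) // 3 * 3], dtype=torch.float32)
    return coords.reshape(-1, 3)                      # (V, 3) quantized vertices

# toy stand-in: uniform logits over a 257-token vocabulary
verts = sample_vertices(lambda t: torch.zeros(257))
print(verts.shape)
```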