SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video
- URL: http://arxiv.org/abs/2201.12792v1
- Date: Sun, 30 Jan 2022 11:49:29 GMT
- Title: SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video
- Authors: Boyi Jiang, Yang Hong, Hujun Bao, Juyong Zhang
- Abstract summary: SelfRecon recovers space-time coherent geometries from a monocular self-rotating human video.
- Score: 48.23424267130425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose SelfRecon, a clothed human body reconstruction method that
combines implicit and explicit representations to recover space-time coherent
geometries from a monocular self-rotating human video. Explicit methods require
a predefined template mesh for a given sequence, while the template is hard to
acquire for a specific subject. Meanwhile, the fixed topology limits the
reconstruction accuracy and clothing types. Implicit methods support arbitrary
topology and have high quality due to continuous geometric representation.
However, it is difficult to integrate multi-frame information to produce a
consistent registration sequence for downstream applications. We propose to
combine the advantages of both representations. We utilize differential mask
loss of the explicit mesh to obtain the coherent overall shape, while the
details on the implicit surface are refined with the differentiable neural
rendering. Meanwhile, the explicit mesh is updated periodically to adjust its
topology changes, and a consistency loss is designed to match both
representations closely. Compared with existing methods, SelfRecon can produce
high-fidelity surfaces for arbitrary clothed humans with self-supervised
optimization. Extensive experimental results demonstrate its effectiveness on
real captured monocular videos.
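The abstract names a consistency loss that keeps the explicit mesh and the implicit surface matched, but gives no formula. As a minimal sketch under a common formulation (an assumption here, not the paper's exact loss), the explicit vertices can be penalized by their absolute signed distance under the implicit function, with a toy analytic sphere SDF standing in for the learned network:

```python
import numpy as np

def sdf_sphere(points, radius=1.0):
    """Toy analytic SDF standing in for the learned implicit
    network f(x) -> signed distance to the clothed-body surface."""
    return np.linalg.norm(points, axis=-1) - radius

def consistency_loss(mesh_vertices, sdf_fn):
    """Penalize explicit mesh vertices that drift off the implicit
    zero level set: L_cons = mean(|f(v)|) over vertices v."""
    return float(np.mean(np.abs(sdf_fn(mesh_vertices))))

# Vertices lying exactly on the unit sphere incur zero loss;
# a vertex 0.5 outside the surface contributes 0.5.
verts_on = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
verts_off = np.array([[1.5, 0.0, 0.0]])
```

Driving such a term toward zero, alongside the mask and neural-rendering losses, pulls the two representations onto the same surface, which is what lets the periodically re-extracted mesh stay registered with the implicit geometry.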
Related papers
- Ultron: Enabling Temporal Geometry Compression of 3D Mesh Sequences using Temporal Correspondence and Mesh Deformation [2.0914328542137346]
Existing 3D model compression methods primarily focus on static models and do not consider inter-frame information.
This paper proposes a method to compress mesh sequences with arbitrary topology using temporal correspondence and mesh deformation.
arXiv Detail & Related papers (2024-09-08T16:34:19Z)
- Learning Topology Uniformed Face Mesh by Volume Rendering for Multi-view Reconstruction [40.45683488053611]
Face meshes in consistent topology serve as the foundation for many face-related applications.
We propose a mesh volume rendering method that enables directly optimizing mesh geometry while preserving topology.
The key innovation lies in spreading sparse mesh features into the surrounding space to simulate the radiance field required for volume rendering.
arXiv Detail & Related papers (2024-04-08T15:25:50Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective, jointly with feature reconstruction, to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering [15.009484906668737]
We introduce a novel technique that reconstructs mesh-based blendshape rigs from single or sparse multi-view videos.
Experiments demonstrate that, with the flexible input of single or sparse multi-view videos, we reconstruct personalized high-fidelity blendshapes.
arXiv Detail & Related papers (2024-01-16T14:41:31Z)
- RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GANs to real images.
Existing methods invert video frames individually, often leading to undesired inconsistencies over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
- Self-supervised Cloth Reconstruction via Action-conditioned Cloth Tracking [18.288330275993328]
We propose a self-supervised method to finetune a mesh reconstruction model in the real world.
We show that we can improve the quality of the reconstructed mesh without requiring human annotations.
arXiv Detail & Related papers (2023-02-19T07:48:12Z)
- RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation [110.4255414234771]
Existing solutions require massive training data or lack generalizability to unknown rendering configurations.
We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem.
Our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
arXiv Detail & Related papers (2022-05-11T17:59:51Z)
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
- PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
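PaMIR's conditioning of a free-form implicit function on a parametric body model can be sketched as an occupancy network that consumes both image-aligned and body-model features; the feature extractors and single-layer network below are hypothetical stand-ins (the actual method uses a learned image encoder and a voxelized SMPL model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: random features used only to show the shapes
# and the conditioning pattern, not learned representations.
def image_feature(p):
    """Pixel-aligned 2D feature at the projection of point p (32-d)."""
    return rng.standard_normal(32)

def body_feature(p):
    """Feature sampled from a voxelized parametric body model (16-d)."""
    return rng.standard_normal(16)

def implicit_occupancy(p, weights):
    """Single-layer sketch: occupancy in (0, 1) from the point's depth
    plus image and body-model features, illustrating how a free-form
    implicit function can be conditioned on a parametric model."""
    x = np.concatenate([[p[2]], image_feature(p), body_feature(p)])  # 49-d
    logit = x @ weights
    return 1.0 / (1.0 + np.exp(-logit))

w = rng.standard_normal(49)
occ = implicit_occupancy(np.array([0.1, 0.2, 0.5]), w)  # a value in (0, 1)
```

The body-model feature acts as a pose-and-shape prior that regularizes the free-form field in occluded or ambiguous regions, which is the intuition behind the hybrid design.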
arXiv Detail & Related papers (2020-07-08T02:26:19Z)
- Unsupervised Video Decomposition using Spatio-temporal Iterative Inference [31.97227651679233]
Multi-object scene decomposition is a fast-emerging problem in learning.
We show that our model has a high accuracy even without color information.
We demonstrate the decomposition and segmentation prediction capabilities of our model and show that it outperforms the state of the art on several benchmark datasets.
arXiv Detail & Related papers (2020-06-25T22:57:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.