Monocular Real-Time Volumetric Performance Capture
- URL: http://arxiv.org/abs/2007.13988v1
- Date: Tue, 28 Jul 2020 04:45:13 GMT
- Title: Monocular Real-Time Volumetric Performance Capture
- Authors: Ruilong Li, Yuliang Xiu, Shunsuke Saito, Zeng Huang, Kyle Olszewski,
Hao Li
- Abstract summary: We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video.
Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu).
We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
- Score: 28.481131687883256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the first approach to volumetric performance capture and
novel-view rendering at real-time speed from monocular video, eliminating the
need for expensive multi-view systems or cumbersome pre-acquisition of a
personalized template model. Our system reconstructs a fully textured 3D human
from each frame by leveraging Pixel-Aligned Implicit Function (PIFu). While
PIFu achieves high-resolution reconstruction in a memory-efficient manner, its
computationally expensive inference prevents us from deploying such a system
for real-time applications. To this end, we propose a novel hierarchical
surface localization algorithm and a direct rendering method without explicitly
extracting surface meshes. By culling unnecessary regions for evaluation in a
coarse-to-fine manner, we successfully accelerate the reconstruction by two
orders of magnitude from the baseline without compromising the quality.
Furthermore, we introduce an Online Hard Example Mining (OHEM) technique that
effectively suppresses failure modes due to the rare occurrence of challenging
examples. We adaptively update the sampling probability of the training data
based on the current reconstruction accuracy, which effectively alleviates
reconstruction artifacts. Our experiments and evaluations demonstrate the
robustness of our system to various challenging angles, illuminations, poses,
and clothing styles. We also show that our approach compares favorably with the
state-of-the-art monocular performance capture. Our proposed approach removes
the need for multi-view studio settings and enables a consumer-accessible
solution for volumetric capture.
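The coarse-to-fine culling described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a dense voxel grid instead of whatever hierarchy the paper uses, and an analytic sphere (`sphere_occupancy`, a hypothetical stand-in) replaces the learned implicit function. All names and parameters below are illustrative assumptions. The idea shown is only that cells far from the surface inherit the coarse value, while cells near an inside/outside transition are re-evaluated at the finer resolution.

```python
import numpy as np

def sphere_occupancy(points, radius=0.6):
    """Hypothetical stand-in for a learned implicit function: 1 inside, 0 outside."""
    return (np.linalg.norm(points, axis=-1) < radius).astype(np.float32)

def grid_points(res):
    """Cell-centre coordinates of a res^3 grid over [-1, 1]^3, shape (res, res, res, 3)."""
    c = (np.arange(res) + 0.5) / res * 2.0 - 1.0
    return np.stack(np.meshgrid(c, c, c, indexing="ij"), axis=-1)

def near_boundary(occ):
    """Mark cells whose 3x3x3 neighbourhood holds both inside and outside cells."""
    inside = occ > 0.5
    padded = np.pad(inside, 1, mode="edge")
    any_in = np.zeros_like(inside)
    any_out = np.zeros_like(inside)
    r = occ.shape[0]  # the grid is assumed to be a cube
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                window = padded[dx:dx + r, dy:dy + r, dz:dz + r]
                any_in |= window
                any_out |= ~window
    return any_in & any_out

def coarse_to_fine(occupancy_fn, start_res=8, levels=3):
    """Evaluate densely at start_res, then refine only cells near the surface."""
    res = start_res
    occ = occupancy_fn(grid_points(res).reshape(-1, 3)).reshape(res, res, res)
    evaluations = occ.size
    for _ in range(levels):
        uncertain = near_boundary(occ)
        # Double the resolution: every child cell first copies its parent.
        occ = occ.repeat(2, 0).repeat(2, 1).repeat(2, 2)
        uncertain = uncertain.repeat(2, 0).repeat(2, 1).repeat(2, 2)
        res *= 2
        # Query the function only where the parent was near the surface.
        occ[uncertain] = occupancy_fn(grid_points(res)[uncertain])
        evaluations += int(uncertain.sum())
    return occ, evaluations

occ, n_evals = coarse_to_fine(sphere_occupancy)
print(f"final grid {occ.shape}, {n_evals} evaluations vs {occ.size} for a dense pass")
```

For this simple shape the culled grid matches a dense evaluation at the final resolution while querying the function at only a fraction of the cells. A real network output with thin structures could need a wider neighbourhood test or extra consistency checks.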
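The adaptive sampling behind the OHEM technique can be sketched in the same spirit. The abstract only states that the sampling probability of the training data is updated based on the current reconstruction accuracy. The running-average error, the proportional weighting, and every name below (`HardExampleSampler`, `momentum`, `floor`) are assumptions made for illustration, not the paper's formulation.

```python
import numpy as np

class HardExampleSampler:
    """Samples training examples with probability proportional to their recent error."""

    def __init__(self, num_examples, momentum=0.9, floor=1e-3, seed=0):
        # Start with equal errors so that early sampling is uniform.
        self.errors = np.ones(num_examples, dtype=np.float64)
        self.momentum = momentum  # assumed smoothing factor for the running error
        self.floor = floor        # keeps easy examples from vanishing entirely
        self.rng = np.random.default_rng(seed)

    def probabilities(self):
        weights = np.maximum(self.errors, self.floor)
        return weights / weights.sum()

    def sample(self, batch_size):
        """Draw example indices, favouring those with a high running error."""
        return self.rng.choice(len(self.errors), size=batch_size, p=self.probabilities())

    def update(self, indices, new_errors):
        """Blend freshly measured reconstruction errors into the running estimate."""
        for i, e in zip(indices, new_errors):
            self.errors[i] = self.momentum * self.errors[i] + (1.0 - self.momentum) * e

# Toy run: example 3 keeps reconstructing badly, so it is drawn more often.
sampler = HardExampleSampler(num_examples=4)
for _ in range(50):
    sampler.update([0, 1, 2, 3], [0.1, 0.1, 0.1, 0.9])
print(sampler.probabilities())
```

In a training loop, `sample` would pick the next batch and `update` would be called with the per-example error measured on that batch, so rare but difficult cases are revisited more often than under uniform sampling.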
Related papers
- DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes [81.56206845824572]
Novel-view synthesis (NVS) approaches play a critical role in vast scene reconstruction.
Few-shot methods often struggle with poor reconstruction quality in vast environments.
This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction for sparse-view vast scenes.
arXiv Detail & Related papers (2024-11-19T07:51:44Z)
- VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors [3.523208537466128]
We present a stereo-matching method for depth estimation from high-resolution images using visual hulls as priors.
Our method uses object masks extracted from supplementary views of the scene to guide the disparity estimation, effectively reducing the search space for matches.
This approach is specifically tailored to stereo rigs in volumetric capture systems, where an accurate depth plays a key role in the downstream reconstruction task.
arXiv Detail & Related papers (2024-06-04T17:59:57Z)
- VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations [25.88881764546414]
VQ-NeRF is an efficient pipeline for enhancing implicit neural representations via vector quantization.
We present an innovative multi-scale NeRF sampling scheme that concurrently optimizes the NeRF model at both compressed and original scales.
We incorporate a semantic loss function to improve the geometric fidelity and semantic coherence of our 3D reconstructions.
arXiv Detail & Related papers (2023-10-23T01:41:38Z)
- Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration [59.6021678234829]
We propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames.
With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy.
arXiv Detail & Related papers (2023-07-27T13:52:42Z)
- Enhancing Surface Neural Implicits with Curvature-Guided Sampling and Uncertainty-Augmented Representations [37.42624848693373]
We introduce a method that directly digests depth images for the task of high-fidelity 3D reconstruction.
A simple sampling strategy is proposed to generate highly effective training data.
Despite its simplicity, our method outperforms a range of both classical and learning-based baselines.
arXiv Detail & Related papers (2023-06-03T12:23:17Z)
- Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z)
- Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling [13.427887784558168]
We introduce a novel adaptive vision system for efficient action recognition processing.
Our system pre-scans the global scene context at low resolution and decides to skip or request high-resolution features at salient regions for further processing.
We validate the system on the EPIC-KITCHENS and UCF-101 datasets for action recognition, and show that our proposed approach can greatly speed up inference with a tolerable loss of accuracy compared with state-of-the-art baselines.
arXiv Detail & Related papers (2022-07-12T01:18:58Z)
- MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction [72.05649682685197]
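State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views.
Yet their performance drops significantly for larger, more complex scenes and for scenes captured from sparse viewpoints.
This is caused primarily by the inherent ambiguity in the RGB reconstruction loss that does not provide enough constraints.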
Motivated by recent advances in the area of monocular geometry prediction, we explore the utility these cues provide for improving neural implicit surface reconstruction.
arXiv Detail & Related papers (2022-06-01T17:58:15Z)
- Neural 3D Reconstruction in the Wild [86.6264706256377]
We introduce a new method that enables efficient and accurate surface reconstruction from Internet photo collections.
We present a new benchmark and protocol for evaluating reconstruction performance on such in-the-wild scenes.
arXiv Detail & Related papers (2022-05-25T17:59:53Z)
- RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation [110.4255414234771]
Existing solutions require massive training data or lack generalizability to unknown rendering configurations.
We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem.
Our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
arXiv Detail & Related papers (2022-05-11T17:59:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.