Monocular Real-Time Volumetric Performance Capture
- URL: http://arxiv.org/abs/2007.13988v1
- Date: Tue, 28 Jul 2020 04:45:13 GMT
- Title: Monocular Real-Time Volumetric Performance Capture
- Authors: Ruilong Li, Yuliang Xiu, Shunsuke Saito, Zeng Huang, Kyle Olszewski,
Hao Li
- Abstract summary: We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video.
Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu).
We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
- Score: 28.481131687883256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the first approach to volumetric performance capture and
novel-view rendering at real-time speed from monocular video, eliminating the
need for expensive multi-view systems or cumbersome pre-acquisition of a
personalized template model. Our system reconstructs a fully textured 3D human
from each frame by leveraging Pixel-Aligned Implicit Function (PIFu). While
PIFu achieves high-resolution reconstruction in a memory-efficient manner, its
computationally expensive inference prevents us from deploying such a system
for real-time applications. To this end, we propose a novel hierarchical
surface localization algorithm and a direct rendering method without explicitly
extracting surface meshes. By culling unnecessary regions for evaluation in a
coarse-to-fine manner, we successfully accelerate the reconstruction by two
orders of magnitude from the baseline without compromising the quality.
Furthermore, we introduce an Online Hard Example Mining (OHEM) technique that
effectively suppresses failure modes due to the rare occurrence of challenging
examples. We adaptively update the sampling probability of the training data
based on the current reconstruction accuracy, which effectively alleviates
reconstruction artifacts. Our experiments and evaluations demonstrate the
robustness of our system to various challenging angles, illuminations, poses,
and clothing styles. We also show that our approach compares favorably with
state-of-the-art monocular performance capture methods. Our proposed approach removes
the need for multi-view studio settings and enables a consumer-accessible
solution for volumetric capture.
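The abstract names three concrete mechanisms; the sketches below illustrate them in simplified form. First, the PIFu query itself: a 3D point is projected into the image, a convolutional feature is sampled at that pixel, and an MLP decodes the feature plus the point's depth into an occupancy value. This is a minimal sketch with assumed layer sizes and a hypothetical `calib` projection helper, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedOccupancy(nn.Module):
    """PIFu-style query: image feature at the projected pixel + depth -> occupancy."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, feat_map, points, calib):
        # feat_map: (B, C, H, W) features from a conv encoder (not shown).
        # points:   (B, N, 3) query points; calib projects them to
        #           normalized pixel coords xy in [-1, 1] and depth z.
        xy, z = calib(points)                            # (B, N, 2), (B, N, 1)
        feat = F.grid_sample(feat_map, xy.unsqueeze(2),  # bilinear pixel lookup
                             align_corners=True)
        feat = feat.squeeze(-1).transpose(1, 2)          # (B, N, C)
        return self.mlp(torch.cat([feat, z], dim=-1))    # (B, N, 1) occupancy
```

Second, the hierarchical surface localization: evaluate the network on a coarse grid, then at each finer level re-evaluate only cells whose occupancy sits near the 0.5 iso-surface, culling everything else. The nearest-neighbor upsampling and fixed band threshold below are simplifications of the paper's algorithm.

```python
import numpy as np

def coarse_to_fine_occupancy(query_fn, res0=32, levels=3, thresh=0.5, band=0.05):
    # query_fn: (M, 3) points in the unit cube -> (M,) occupancy in [0, 1].
    res = res0
    grid = np.stack(np.meshgrid(*[np.linspace(0, 1, res)] * 3, indexing="ij"), -1)
    occ = query_fn(grid.reshape(-1, 3)).reshape(res, res, res)
    for _ in range(levels):
        res *= 2
        # Upsample the previous level, then re-evaluate only the cells whose
        # interpolated occupancy is ambiguous (near the iso-surface).
        occ_up = occ.repeat(2, 0).repeat(2, 1).repeat(2, 2)
        near = np.abs(occ_up - thresh) < band
        pts = (np.argwhere(near) + 0.5) / res
        occ_up[near] = query_fn(pts)
        occ = occ_up
    return occ
```

With a smooth occupancy field (e.g. a sigmoid of signed distance), only a thin shell of cells is re-evaluated at each level, which is where the claimed speedup comes from.

Third, the OHEM scheme: track a per-example difficulty score and sample training data proportionally to it, so rare hard examples are revisited more often. The exponential-moving-average update below is an assumption; the paper defines its own probability update based on reconstruction accuracy.

```python
import numpy as np

class HardExampleSampler:
    def __init__(self, n_examples, alpha=0.9, seed=0):
        self.weights = np.ones(n_examples)  # running per-example loss estimate
        self.alpha = alpha
        self.rng = np.random.default_rng(seed)

    def sample(self, batch_size):
        p = self.weights / self.weights.sum()
        return self.rng.choice(len(self.weights), size=batch_size,
                               replace=False, p=p)

    def update(self, indices, losses):
        # Harder examples (higher loss) get a larger sampling probability.
        self.weights[indices] = (self.alpha * self.weights[indices]
                                 + (1 - self.alpha) * np.asarray(losses))
```

A training loop would call `sample(batch_size)`, compute per-example losses, and feed them back via `update(indices, losses)`.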
Related papers
- Simultaneous Map and Object Reconstruction [66.66729715211642]
We present a method for dynamic surface reconstruction of large-scale urban scenes from LiDAR.
We take inspiration from recent novel view synthesis methods and pose the reconstruction problem as a global optimization.
By careful modeling of continuous-time motion, our reconstructions can compensate for the rolling shutter effects of rotating LiDAR sensors.
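As a concrete illustration of the continuous-time motion modeling, de-skewing a rotating LiDAR scan amounts to interpolating the sensor pose at each point's own timestamp. The sketch below assumes a pose track given as scipy `Rotation` keyframes plus translations; the paper's global optimization is not reproduced here.

```python
import numpy as np
from scipy.spatial.transform import Slerp

def deskew_scan(points, t_points, pose_times, rotations, translations):
    # points: (N, 3) raw LiDAR points; t_points: (N,) per-point timestamps.
    # rotations: scipy Rotation sequence at pose_times (K,); translations: (K, 3).
    R = Slerp(pose_times, rotations)(t_points)          # per-point rotation
    T = np.stack([np.interp(t_points, pose_times, translations[:, i])
                  for i in range(3)], axis=1)           # per-point translation
    return R.apply(points) + T                          # motion-compensated points
```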
arXiv Detail & Related papers (2024-06-19T23:53:31Z)
- VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors [3.523208537466128]
We present a stereo-matching method for depth estimation from high-resolution images using visual hulls as priors.
Our method uses object masks extracted from supplementary views of the scene to guide the disparity estimation, effectively reducing the search space for matches.
This approach is specifically tailored to stereo rigs in volumetric capture systems, where an accurate depth plays a key role in the downstream reconstruction task.
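A minimal way to see how a mask prior shrinks the disparity search space: a disparity is admissible at a pixel only if both the left pixel and the correspondingly shifted right pixel are foreground. The toy validity volume below is an illustration only; VHS derives tighter per-pixel ranges from visual hulls built from the auxiliary views.

```python
import numpy as np

def masked_disparity_volume(mask_left, mask_right, d_max):
    # mask_left, mask_right: (H, W) boolean foreground masks.
    # Returns an (H, W, d_max) admissibility volume; a matcher skips False entries.
    H, W = mask_left.shape
    valid = np.zeros((H, W, d_max), dtype=bool)
    for d in range(d_max):
        shifted = np.zeros_like(mask_right)
        shifted[:, d:] = mask_right[:, :W - d]   # right-image pixel at x - d
        valid[:, :, d] = mask_left & shifted
    return valid
```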
arXiv Detail & Related papers (2024-06-04T17:59:57Z)
- VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations [25.88881764546414]
VQ-NeRF is an efficient pipeline for enhancing implicit neural representations via vector quantization.
We present an innovative multi-scale NeRF sampling scheme that concurrently optimizes the NeRF model at both compressed and original scales.
We incorporate a semantic loss function to improve the geometric fidelity and semantic coherence of our 3D reconstructions.
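The quantization step VQ-NeRF builds on is the standard VQ-VAE-style codebook lookup with a straight-through gradient; a generic sketch follows (the paper's multi-scale pipeline and losses are not reproduced).

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):
        # z: (N, dim) continuous features; snap each to its nearest code.
        idx = torch.cdist(z, self.codebook.weight).argmin(dim=1)
        q = self.codebook(idx)
        # Straight-through estimator: the argmin is non-differentiable,
        # so gradients are passed back to z unchanged.
        q_st = z + (q - z).detach()
        commit_loss = ((q.detach() - z) ** 2).mean()
        return q_st, idx, commit_loss
```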
arXiv Detail & Related papers (2023-10-23T01:41:38Z)
- Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration [59.6021678234829]
We propose a novel method to restore the intermediate features between two sparsely sampled, adjacent video frames.
With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy.
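One plausible reading of the restoration idea, sketched below: interpolate the backbone features of the two sampled frames and let a small trainable head correct the interpolation, so the heavy backbone never runs on the skipped frames. The architecture here is a placeholder, not the paper's.

```python
import torch
import torch.nn as nn

class FeatureRestorer(nn.Module):
    def __init__(self, dim=256, n_mid=3):
        super().__init__()
        self.n_mid = n_mid
        self.refine = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.ReLU(),
                                    nn.Linear(dim, dim))

    def forward(self, f_a, f_b):
        # f_a, f_b: (B, dim) backbone features of the two sampled frames.
        outs = []
        for k in range(1, self.n_mid + 1):
            w = k / (self.n_mid + 1)                 # temporal position
            base = (1 - w) * f_a + w * f_b           # linear interpolation
            w_t = f_a.new_full((f_a.shape[0], 1), w)
            outs.append(base + self.refine(torch.cat([f_a, f_b, w_t], -1)))
        return torch.stack(outs, dim=1)              # (B, n_mid, dim)
```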
arXiv Detail & Related papers (2023-07-27T13:52:42Z)
- Enhancing Surface Neural Implicits with Curvature-Guided Sampling and Uncertainty-Augmented Representations [37.42624848693373]
We introduce a method that directly digests depth images for the task of high-fidelity 3D reconstruction.
A simple sampling strategy is proposed to generate highly effective training data.
Despite its simplicity, our method outperforms a range of both classical and learning-based baselines.
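The summary does not spell the strategy out, but the title points to curvature-guided sampling; one guess at what that could look like is to draw training pixels with probability proportional to a curvature proxy, such as the absolute Laplacian of the depth image, so detailed regions are over-represented. The paper's exact strategy may differ.

```python
import numpy as np

def curvature_weighted_samples(depth, n_samples, eps=1e-3, seed=0):
    # depth: (H, W) depth image; returns (rows, cols) of sampled pixels.
    lap = np.abs(4 * depth
                 - np.roll(depth, 1, 0) - np.roll(depth, -1, 0)
                 - np.roll(depth, 1, 1) - np.roll(depth, -1, 1))
    p = (lap + eps).ravel()   # eps keeps flat regions sampleable
    p /= p.sum()
    flat = np.random.default_rng(seed).choice(depth.size, size=n_samples,
                                              replace=False, p=p)
    return np.unravel_index(flat, depth.shape)
```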
arXiv Detail & Related papers (2023-06-03T12:23:17Z)
- Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
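The conditioning idea can be illustrated with a toy module: a trainable encoder takes the degraded image together with the diffusion timestep and emits features that modulate a frozen pre-trained denoiser, so the synthesis model itself is never fine-tuned. Everything below (sizes, modulation rule) is a stand-in for the paper's actual time-aware encoder.

```python
import math
import torch
import torch.nn as nn

class TimeAwareEncoder(nn.Module):
    def __init__(self, ch=64, t_dim=128):
        super().__init__()
        self.t_dim = t_dim
        self.t_proj = nn.Sequential(nn.Linear(t_dim, ch), nn.SiLU(),
                                    nn.Linear(ch, ch))
        self.conv = nn.Conv2d(3, ch, 3, padding=1)

    def forward(self, lr_img, t):
        # Standard sinusoidal embedding of the diffusion step index t: (B,).
        half = self.t_dim // 2
        freqs = torch.exp(-torch.arange(half, device=t.device).float()
                          * (math.log(10000.0) / half))
        ang = t.float()[:, None] * freqs[None]
        emb = self.t_proj(torch.cat([ang.sin(), ang.cos()], dim=-1))  # (B, ch)
        feat = self.conv(lr_img)                                      # (B, ch, H, W)
        # Timestep-dependent feature modulation, injected into the frozen model.
        return feat * (1 + emb[:, :, None, None])
```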
arXiv Detail & Related papers (2023-05-11T17:55:25Z)
- Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling [13.427887784558168]
We introduce a novel adaptive vision system for efficient action recognition processing.
Our system pre-scans the global scene context at low resolution and decides to skip or request high-resolution features at salient regions for further processing.
We validate the system on the EPIC-KITCHENS and UCF-101 datasets for action recognition, and show that our approach can greatly speed up inference with a tolerable loss of accuracy compared with state-of-the-art baselines.
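A simplified two-stage version of that sampling policy: run a cheap saliency network on a downsampled frame, then crop only the top-k most salient full-resolution patches for the expensive model. The paper learns its skip/request decision; the top-k rule and `saliency_net` below are stand-ins.

```python
import torch
import torch.nn.functional as F

def select_salient_patches(frames, saliency_net, k=4, patch=64):
    # frames: (B, C, H, W); saliency_net: low-res image -> (B, 1, h, w) map.
    B, C, H, W = frames.shape
    low = F.interpolate(frames, scale_factor=0.25, mode="bilinear",
                        align_corners=False)
    sal = F.interpolate(saliency_net(low), size=(H, W), mode="bilinear",
                        align_corners=False)
    crops = []
    for b in range(B):
        idx = sal[b, 0].flatten().topk(k).indices     # k most salient pixels
        ys = torch.div(idx, W, rounding_mode="floor")
        xs = idx % W
        for y, x in zip(ys.tolist(), xs.tolist()):
            y0 = min(max(y - patch // 2, 0), H - patch)
            x0 = min(max(x - patch // 2, 0), W - patch)
            crops.append(frames[b, :, y0:y0 + patch, x0:x0 + patch])
    return torch.stack(crops)                          # (B * k, C, patch, patch)
```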
arXiv Detail & Related papers (2022-07-12T01:18:58Z)
- MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction [72.05649682685197]
State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views, but their performance drops significantly for larger, more complex scenes and sparser viewpoints.
This is caused primarily by the inherent ambiguity in the RGB reconstruction loss, which does not provide enough constraints.
Motivated by recent advances in the area of monocular geometry prediction, we explore the utility these cues provide for improving neural implicit surface reconstruction.
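The cues serve as auxiliary supervision: depth and normals rendered from the implicit surface are pushed toward the outputs of an off-the-shelf monocular predictor, with monocular depth aligned up to an unknown scale and shift. The sketch below is one least-squares formulation of that idea, not MonoSDF's exact losses.

```python
import torch
import torch.nn.functional as F

def monocular_cue_loss(pred_depth, pred_normal, mono_depth, mono_normal,
                       w_d=0.1, w_n=0.05):
    # pred_*: rendered from the implicit surface; mono_*: predictor outputs.
    x = pred_depth.flatten()
    A = torch.stack([x, torch.ones_like(x)], dim=1)
    with torch.no_grad():                      # closed-form scale/shift fit
        sol = torch.linalg.lstsq(A, mono_depth.flatten()[:, None]).solution
    depth_loss = ((A @ sol).squeeze(1) - mono_depth.flatten()).pow(2).mean()
    n1 = F.normalize(pred_normal, dim=-1)
    n2 = F.normalize(mono_normal, dim=-1)
    # L1 plus angular consistency between rendered and monocular normals.
    normal_loss = (n1 - n2).abs().sum(-1).mean() + (1 - (n1 * n2).sum(-1)).mean()
    return w_d * depth_loss + w_n * normal_loss
```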
arXiv Detail & Related papers (2022-06-01T17:58:15Z)
- Neural 3D Reconstruction in the Wild [86.6264706256377]
We introduce a new method that enables efficient and accurate surface reconstruction from Internet photo collections.
We present a new benchmark and protocol for evaluating reconstruction performance on such in-the-wild scenes.
arXiv Detail & Related papers (2022-05-25T17:59:53Z)
- RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation [110.4255414234771]
Existing solutions require massive training data or lack generalizability to unknown rendering configurations.
We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem.
Our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
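The differentiable-rendering half of the idea can be shown with a toy: an unknown state parameter is fit by gradient descent through a differentiable renderer while the rendering configuration (here a gain/bias pair) is randomized, so the fit cannot latch onto any single configuration. RISP instead trains a rendering-invariant predictor network; this sketch only illustrates the gradient path.

```python
import torch

def render(state, cfg):
    # Stand-in differentiable "renderer": a pattern generated from `state`,
    # viewed under a configuration-dependent gain and bias.
    x = torch.linspace(0.0, 1.0, 64)
    return cfg["gain"] * torch.sin(state * x * 6.2832) + cfg["bias"]

def estimate_state(target_imgs, target_cfgs, steps=500, lr=0.05):
    # target_imgs: observations rendered under the (randomized) target_cfgs.
    state = torch.tensor(1.0, requires_grad=True)
    opt = torch.optim.Adam([state], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(((render(state, c) - y) ** 2).mean()
                   for y, c in zip(target_imgs, target_cfgs))
        loss.backward()
        opt.step()
    return state.detach()
```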
arXiv Detail & Related papers (2022-05-11T17:59:51Z)