DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras
- URL: http://arxiv.org/abs/2105.00261v1
- Date: Sat, 1 May 2021 14:32:13 GMT
- Title: DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras
- Authors: Yang Zheng, Ruizhi Shao, Yuxiang Zhang, Tao Yu, Zerong Zheng, Qionghai
Dai, Yebin Liu
- Abstract summary: DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
- Score: 63.186486240525554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose DeepMultiCap, a novel method for multi-person performance capture
using sparse multi-view cameras. Our method can capture time-varying surface
details without the need for pre-scanned template models. To tackle the severe
occlusion challenge in closely interacting scenes, we combine a recently
proposed pixel-aligned implicit function with a parametric model for robust
reconstruction of the invisible surface areas. An effective attention-aware
module is designed to obtain fine-grained geometry details from multi-view
images, producing high-fidelity results. In addition to the spatial attention
method, for video inputs we further propose a novel temporal fusion method to
alleviate noise and temporal inconsistencies in moving character
reconstruction. For quantitative evaluation, we contribute a high-quality
multi-person dataset, MultiHuman, which consists of 150 static scenes with
different levels of occlusion and ground-truth 3D human models. Experimental
results demonstrate the state-of-the-art performance of our method and its
strong generalization to real multi-view video data, outperforming prior works
by a large margin.
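The abstract names an attention-aware module that fuses pixel-aligned features from the sparse views, but gives no implementation details. Below is a minimal PyTorch sketch of what attention-based multi-view feature fusion could look like; the class name, feature dimensions, and the mean aggregation over views are illustrative assumptions, not the authors' actual architecture.

    # Minimal sketch of attention-weighted fusion of pixel-aligned features from
    # several camera views. Shapes and module names are illustrative assumptions,
    # not the DeepMultiCap implementation.
    import torch
    import torch.nn as nn

    class MultiViewAttentionFusion(nn.Module):
        """Fuse per-point, per-view features with self-attention over the view axis."""

        def __init__(self, feat_dim: int = 256, num_heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(feat_dim)

        def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
            # view_feats: (num_points, num_views, feat_dim); each query point
            # carries one pixel-aligned feature per camera view.
            attended, _ = self.attn(view_feats, view_feats, view_feats)
            fused = self.norm(view_feats + attended)   # residual connection + norm
            return fused.mean(dim=1)                   # aggregate over the view axis

    if __name__ == "__main__":
        points, views, dim = 1024, 6, 256              # e.g. six sparse cameras
        feats = torch.randn(points, views, dim)        # toy pixel-aligned features
        fused = MultiViewAttentionFusion(dim)(feats)
        print(fused.shape)                             # torch.Size([1024, 256])

In the full method, such fused features would presumably condition an implicit surface decoder together with the parametric body model, and the temporal fusion for video would operate on top of per-frame reconstructions, but the abstract does not specify those details.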
Related papers
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- GenS: Generalizable Neural Surface Reconstruction from Multi-View Images [20.184657468900852]
GenS is an end-to-end generalizable neural surface reconstruction model.
Our representation is more powerful, recovering high-frequency details while maintaining global smoothness.
Experiments on popular benchmarks show that our model can generalize well to new scenes.
arXiv Detail & Related papers (2024-06-04T17:13:10Z)
- Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention [87.02613021058484]
We introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image.
Era3D generates high-quality multiview images at up to 512×512 resolution while reducing computational complexity by 12x.
arXiv Detail & Related papers (2024-05-19T17:13:16Z)
- Multi-Plane Neural Radiance Fields for Novel View Synthesis [5.478764356647437]
Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints.
In this work, we examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields.
We propose a new multiplane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range.
arXiv Detail & Related papers (2023-03-03T06:32:55Z)
- Neural Pixel Composition: 3D-4D View Synthesis from Multi-Views [12.386462516398469]
We present a novel approach for continuous 3D-4D view synthesis given only a discrete set of multi-view observations as input.
The proposed formulation reliably operates on sparse and wide-baseline multi-view imagery.
It can be trained efficiently within a few seconds to 10 minutes for hi-res (12MP) content.
arXiv Detail & Related papers (2022-07-21T17:58:02Z)
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo (MVPS) problem.
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
- End-to-end Multi-modal Video Temporal Grounding [105.36814858748285]
We propose a multi-modal framework to extract complementary information from videos.
We adopt RGB images for appearance, optical flow for motion, and depth maps for image structure (a toy sketch of such a three-stream setup follows the list below).
We conduct experiments on the Charades-STA and ActivityNet Captions datasets, and show that the proposed method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-12T17:58:10Z)
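The last entry above extracts complementary features from RGB, optical flow, and depth streams before fusing them. As a rough illustration only, here is a toy three-stream encoder with late fusion; every module name, channel count, and the simple concatenation-plus-linear fusion step are assumptions for the sketch, not the paper's actual networks.

    # Minimal sketch of a three-stream encoder (RGB, optical flow, depth) with
    # late fusion, loosely following the temporal-grounding entry above.
    # All module names and sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    def make_stream(in_channels: int, out_dim: int = 128) -> nn.Module:
        """Tiny per-modality CNN; a real system would use a pretrained backbone."""
        return nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_dim, kernel_size=3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),                               # -> (batch, out_dim)
        )

    class ThreeStreamFusion(nn.Module):
        def __init__(self, out_dim: int = 128):
            super().__init__()
            self.rgb = make_stream(3, out_dim)          # appearance
            self.flow = make_stream(2, out_dim)         # motion (u, v components)
            self.depth = make_stream(1, out_dim)        # image structure
            self.fuse = nn.Linear(3 * out_dim, out_dim) # simple late fusion

        def forward(self, rgb, flow, depth):
            feats = torch.cat(
                [self.rgb(rgb), self.flow(flow), self.depth(depth)], dim=1)
            return self.fuse(feats)                     # (batch, out_dim)

    if __name__ == "__main__":
        b, h, w = 2, 112, 112
        out = ThreeStreamFusion()(torch.randn(b, 3, h, w),
                                  torch.randn(b, 2, h, w),
                                  torch.randn(b, 1, h, w))
        print(out.shape)                                # torch.Size([2, 128])

A real system would likely replace the tiny per-stream CNNs with pretrained backbones and fuse features over time as well as across modalities; this sketch only shows the modality-level structure.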
This list is automatically generated from the titles and abstracts of the papers on this site.