HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular
Video
- URL: http://arxiv.org/abs/2201.04127v1
- Date: Tue, 11 Jan 2022 18:51:21 GMT
- Title: HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular
Video
- Authors: Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron
and Ira Kemelmacher-Shlizerman
- Abstract summary: We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions.
Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints.
- Score: 44.58519508310171
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on
a given monocular video of a human performing complex body motions, e.g. a
video from YouTube. Our method enables pausing the video at any frame and
rendering the subject from arbitrary new camera viewpoints or even a full
360-degree camera path for that particular frame and body pose. This task is
particularly challenging, as it requires synthesizing photorealistic details of
the body, as seen from various camera angles that may not exist in the input
video, as well as synthesizing fine details such as cloth folds and facial
appearance. Our method optimizes for a volumetric representation of the person
in a canonical T-pose, in concert with a motion field that maps the estimated
canonical representation to every frame of the video via backward warps. The
motion field is decomposed into skeletal rigid and non-rigid motions, produced
by deep networks. We show significant performance improvements over prior work,
and compelling examples of free-viewpoint renderings from monocular video of
moving humans in challenging uncontrolled capture scenarios.
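
To make the described pipeline concrete, here is a minimal sketch in PyTorch of the backward-warp structure the abstract outlines. All module and parameter names below (the CanonicalNeRF placeholder, NonRigidMLP, the skinning-weight field, the per-frame bone transforms) are assumptions for illustration, not the authors' released code: a sample point in observation space is mapped toward the canonical T-pose by blending inverse per-bone rigid transforms, a small network adds a pose-conditioned non-rigid offset, and the canonical volume is then queried for color and density.

```python
# Minimal sketch (not the authors' code): canonical volume + backward warp.
# Assumptions: PyTorch; hypothetical modules CanonicalNeRF and NonRigidMLP;
# per-frame bone transforms (canonical -> observation) come from an
# off-the-shelf pose estimator; pose_dim=69 assumes an SMPL-style body pose.

import torch
import torch.nn as nn


class NonRigidMLP(nn.Module):
    """Small MLP predicting a per-point offset conditioned on body pose."""
    def __init__(self, pose_dim=69, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x_skel, pose):
        # x_skel: (N, 3), pose: (pose_dim,) broadcast to every point.
        pose = pose.expand(x_skel.shape[0], -1)
        return self.net(torch.cat([x_skel, pose], dim=-1))


class HumanNeRFSketch(nn.Module):
    def __init__(self, canonical_nerf, num_bones=24, pose_dim=69):
        super().__init__()
        self.canonical_nerf = canonical_nerf       # maps canonical xyz -> (rgb, sigma)
        self.non_rigid = NonRigidMLP(pose_dim)
        # Placeholder for a learned per-bone skinning-weight field queried at x.
        self.weight_field = nn.Linear(3, num_bones)

    def backward_warp(self, x_obs, bone_transforms, pose):
        """Map observation-space points (N, 3) to the canonical T-pose."""
        # Skeletal rigid part: blend the inverses of the per-bone transforms.
        w = torch.softmax(self.weight_field(x_obs), dim=-1)          # (N, K)
        inv = torch.inverse(bone_transforms)                         # (K, 4, 4)
        x_h = torch.cat([x_obs, torch.ones_like(x_obs[:, :1])], -1)  # homogeneous
        per_bone = torch.einsum('kij,nj->nki', inv, x_h)[..., :3]    # (N, K, 3)
        x_skel = (w.unsqueeze(-1) * per_bone).sum(dim=1)             # (N, 3)
        # Non-rigid residual produced by a deep network.
        return x_skel + self.non_rigid(x_skel, pose)

    def forward(self, x_obs, bone_transforms, pose):
        x_can = self.backward_warp(x_obs, bone_transforms, pose)
        return self.canonical_nerf(x_can)  # (rgb, sigma) for volume rendering
```

The returned color and density would then be composited along each camera ray with standard volume rendering, as in a static NeRF; only the backward warp makes the representation depend on the frame's body pose.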
Related papers
- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis [43.02778060969546]
We propose a controllable monocular dynamic view synthesis pipeline.
Our model does not require depth as input, and does not explicitly model 3D scene geometry.
We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
arXiv Detail & Related papers (2024-05-23T17:59:52Z)
- Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z)
- Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis [76.72505510632904]
We present Total-Recon, the first method to reconstruct deformable scenes from long monocular RGBD videos.
Our method hierarchically decomposes the scene into the background and objects, whose motion is decomposed into root-body motion and local articulations.
arXiv Detail & Related papers (2023-04-24T17:59:52Z)
- Decoupling Human and Camera Motion from Videos in the Wild [67.39432972193929]
We propose a method to reconstruct global human trajectories from videos in the wild.
Our method decouples the camera and human motion, which allows us to place people in the same world coordinate frame.
arXiv Detail & Related papers (2023-02-24T18:59:15Z)
- Image Animation with Keypoint Mask [0.0]
Motion transfer is the task of synthesizing video frames that animate a single source image according to the motion of a given driving video.
In this work, we extract the structure from a keypoint heatmap, without an explicit motion representation.
The structures extracted from the image and the video are then used by a deep generator to warp the image according to the video (see the sketch after this list).
arXiv Detail & Related papers (2021-12-20T11:35:06Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline requires only low-frame-rate videos and unpaired human motion data for training; no high-frame-rate videos are needed.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- Motion Capture from Internet Videos [47.943209721329474]
Recent advances in image-based human pose estimation make it possible to capture 3D human motion from a single RGB video.
While multi-view videos are not common, videos of a celebrity performing a specific action are usually abundant on the Internet.
We propose a novel optimization-based framework and experimentally demonstrate its ability to recover much more precise and detailed motion from multiple videos.
arXiv Detail & Related papers (2020-08-18T13:48:37Z)
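
As a side note on the Image Animation with Keypoint Mask entry above, the idea of recovering structure from keypoint heatmaps and warping the source image toward the driving frame can be sketched as follows. This is an assumption-laden illustration, not that paper's implementation: keypoints are read off heatmaps with a soft-argmax, and the displacement between source and driving keypoints defines a coarse warp field that a deep generator would normally refine (the generator is omitted here).

```python
# Illustrative sketch only (not the paper's code): recover keypoints from
# heatmaps via soft-argmax and build a coarse warp field from the displacement
# between source-image and driving-frame keypoints.

import torch
import torch.nn.functional as F


def soft_argmax(heatmaps):
    """heatmaps: (B, K, H, W) -> keypoint coordinates in [-1, 1]^2, shape (B, K, 2)."""
    b, k, h, w = heatmaps.shape
    probs = torch.softmax(heatmaps.view(b, k, -1), dim=-1).view(b, k, h, w)
    ys = torch.linspace(-1.0, 1.0, h, device=heatmaps.device)
    xs = torch.linspace(-1.0, 1.0, w, device=heatmaps.device)
    y = (probs.sum(dim=3) * ys).sum(dim=2)   # expectation over rows
    x = (probs.sum(dim=2) * xs).sum(dim=2)   # expectation over columns
    return torch.stack([x, y], dim=-1)


def coarse_warp(source_img, kp_source, kp_driving, sigma=0.1):
    """Warp source_img (B, C, H, W) so its keypoints move toward the driving ones."""
    b, c, h, w = source_img.shape
    ys = torch.linspace(-1.0, 1.0, h, device=source_img.device)
    xs = torch.linspace(-1.0, 1.0, w, device=source_img.device)
    gy, gx = torch.meshgrid(ys, xs, indexing='ij')
    grid = torch.stack([gx, gy], dim=-1).expand(b, h, w, 2)           # identity sampling grid
    # Gaussian-weighted displacement toward each driving keypoint.
    disp = kp_source - kp_driving                                     # (B, K, 2)
    d2 = ((grid.unsqueeze(3) - kp_driving.view(b, 1, 1, -1, 2)) ** 2).sum(-1)  # (B, H, W, K)
    weights = torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)
    flow = (weights.unsqueeze(-1) * disp.view(b, 1, 1, -1, 2)).sum(dim=3)
    return F.grid_sample(source_img, grid + flow, align_corners=True)
```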
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.