Synthesizing Moving People with 3D Control
- URL: http://arxiv.org/abs/2401.10889v1
- Date: Fri, 19 Jan 2024 18:59:11 GMT
- Title: Synthesizing Moving People with 3D Control
- Authors: Boyi Li, Jathushan Rajasegaran, Yossi Gandelsman, Alexei A. Efros,
Jitendra Malik
- Abstract summary: We present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence.
For the first part, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image.
Second, we develop a diffusion-based rendering pipeline, which is controlled by 3D human poses.
- Score: 88.68284137105654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a diffusion model-based framework for animating
people from a single image for a given target 3D motion sequence. Our approach
has two core components: a) learning priors about invisible parts of the human
body and clothing, and b) rendering novel body poses with proper clothing and
texture. For the first part, we learn an in-filling diffusion model to
hallucinate unseen parts of a person given a single image. We train this model
in texture map space, which makes it more sample-efficient since it is
invariant to pose and viewpoint. Second, we develop a diffusion-based rendering
pipeline, which is controlled by 3D human poses. This produces realistic
renderings of novel poses of the person, including clothing, hair, and
plausible in-filling of unseen regions. This disentangled approach allows our
method to generate a sequence of images that is faithful to the target 3D
motion and visually similar to the input image. In addition, the 3D control
allows the person to be rendered along various synthetic camera trajectories.
Our experiments show that, compared to prior methods, our approach is more
resilient when generating prolonged motions and varied, challenging, and
complex poses. Please check our website for more details:
https://boyiliee.github.io/3DHM.github.io/.
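The abstract describes a disentangled, two-stage pipeline: an in-filling diffusion model first completes a partial UV texture map extracted from the single input image, and a pose-conditioned diffusion renderer then produces one frame per target 3D pose. The sketch below is a minimal PyTorch illustration of that control flow only; the module names, tensor shapes, and tiny convolutional stand-ins are assumptions for illustration, not the authors' architecture or code.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract.
# All names, shapes, and modules are illustrative assumptions.
import torch
import torch.nn as nn


class TextureInpaintingDiffusion(nn.Module):
    """Stand-in for the in-filling model that completes a partial UV texture
    map extracted from the single input image."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, partial_texture, visibility_mask):
        # Concatenate the visible-texel mask so the model knows what to hallucinate.
        x = torch.cat([partial_texture, visibility_mask], dim=1)
        return self.net(x)  # completed texture map


class PoseConditionedRenderer(nn.Module):
    """Stand-in for the rendering model conditioned on a 3D pose
    (approximated here by a rasterized pose map per frame)."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels * 2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, completed_texture, pose_map):
        x = torch.cat([completed_texture, pose_map], dim=1)
        return self.net(x)  # rendered frame


def animate(partial_texture, visibility_mask, pose_maps):
    """Complete the texture once, then render one frame per target 3D pose."""
    inpainter = TextureInpaintingDiffusion()
    renderer = PoseConditionedRenderer()
    texture = inpainter(partial_texture, visibility_mask)
    return [renderer(texture, pose) for pose in pose_maps]


if __name__ == "__main__":
    B, H, W = 1, 64, 64
    partial = torch.rand(B, 3, H, W)
    mask = (torch.rand(B, 1, H, W) > 0.5).float()
    poses = [torch.rand(B, 3, H, W) for _ in range(4)]  # 4-frame motion sequence
    frames = animate(partial, mask, poses)
    print(len(frames), frames[0].shape)
```

The point of the sketch is the disentanglement emphasized in the abstract: the texture completion runs once per subject, while the pose-conditioned renderer is invoked once per frame of the target motion sequence.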
Related papers
- Dynamic Appearance Modeling of Clothed 3D Human Avatars using a Single
Camera [8.308263758475938]
We introduce a method for high-quality modeling of clothed 3D human avatars using a video of a person with dynamic movements.
For explicit modeling, a neural network learns to generate point-wise shape residuals and appearance features of a 3D body model.
For implicit modeling, an implicit network combines the appearance and 3D motion features to decode high-fidelity clothed 3D human avatars.
arXiv Detail & Related papers (2023-12-28T06:04:39Z) - Single-Image 3D Human Digitization with Shape-Guided Diffusion [31.99621159464388]
NeRF and its variants typically require videos or images from different viewpoints.
We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a single input image.
arXiv Detail & Related papers (2023-11-15T18:59:56Z) - ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image
Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
arXiv Detail & Related papers (2023-06-07T17:47:50Z) - 3D Cinemagraphy from a Single Image [73.09720823592092]
We present 3D Cinemagraphy, a new technique that marries 2D image animation with 3D photography.
Given a single still image as input, our goal is to generate a video that contains both visual content animation and camera motion.
arXiv Detail & Related papers (2023-03-10T06:08:23Z) - Capturing and Animation of Body and Clothing from Monocular Video [105.87228128022804]
We present SCARF, a hybrid model combining a mesh-based body with a neural radiance field.
Integrating the mesh into the rendering enables us to optimize SCARF directly from monocular videos.
We demonstrate that SCARF reconstructs clothing with higher visual quality than existing methods, that the clothing deforms with changing body pose and body shape, and that clothing can be successfully transferred between avatars of different subjects.
arXiv Detail & Related papers (2022-10-04T19:34:05Z) - Neural Novel Actor: Learning a Generalized Animatable Neural
Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z) - Creating and Reenacting Controllable 3D Humans with Differentiable
Rendering [3.079885946230076]
This paper proposes a new end-to-end neural rendering architecture to transfer appearance and reenact human actors.
Our method leverages a carefully designed graph convolutional network (GCN) to model the human body manifold structure.
By taking advantage of both differentiable rendering and the 3D parametric model, our method is fully controllable.
arXiv Detail & Related papers (2021-10-22T12:40:09Z) - Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D
Shape, Pose, and Appearance Consistency [55.94908688207493]
We propose a self-supervised framework named SPICE that closes the image quality gap with supervised methods.
The key insight enabling self-supervision is to exploit 3D information about the human body in several ways.
SPICE achieves state-of-the-art performance on the DeepFashion dataset.
arXiv Detail & Related papers (2021-10-11T17:48:50Z) - Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the
Wild [22.881898195409885]
Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video.
The output model can be rendered in any body pose to any camera view, via the learned controls, without explicit 3D mesh reconstruction.
arXiv Detail & Related papers (2020-12-23T18:50:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.