RIN: Textured Human Model Recovery and Imitation with a Single Image
- URL: http://arxiv.org/abs/2011.12024v4
- Date: Sat, 14 Aug 2021 12:19:46 GMT
- Title: RIN: Textured Human Model Recovery and Imitation with a Single Image
- Authors: Haoxi Ran, Guangfu Wang, Li Lu
- Abstract summary: We propose a novel volume-based framework for reconstructing a textured 3D model from a single picture.
Specifically, to estimate most of the human texture, we propose a U-Net-like front-to-back translation network.
Our experiments demonstrate that our volume-based model is adequate for human imitation, and the back view can be estimated reliably using our network.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human imitation has recently become a popular topic, driven by the ability of GANs to disentangle human pose from body content. However, the latest methods rarely exploit 3D information and need a massive number of input images to avoid self-occlusion. In this paper, we propose RIN, a novel volume-based framework for reconstructing a textured 3D model from a single picture and imitating a subject with the generated model. Specifically, to estimate most of the human texture, we propose a U-Net-like front-to-back translation network. With both the front and back images as input, the textured volume recovery module allows us to color a volumetric human. A sequence of 3D poses then guides the colored volume via Flowable Disentangle Networks, cast as a volume-to-volume translation task. To project volumes onto a 2D plane during training, we design a differentiable depth-aware renderer. Our experiments demonstrate that our volume-based model is adequate for human imitation and that the back view can be estimated reliably by our network. While prior works based on either 2D poses or semantic maps often fail under the unstable appearance of a human, our framework still produces concrete results that are competitive with those generated from multi-view input.
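The front-to-back translation step is an image-to-image mapping: given the front view, hallucinate the occluded back view. Below is a minimal sketch of the kind of U-Net-style encoder-decoder with skip connections the abstract describes; the layer widths, depth, and the name FrontToBackNet are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of a U-Net-style front-to-back translation network.
# Layer sizes and names are assumptions for illustration, not RIN's design.
import torch
import torch.nn as nn

class FrontToBackNet(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # Encoder: downsample the front-view image twice.
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 4, 2, 1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.ReLU(inplace=True))
        # Decoder: upsample back to input resolution.
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.ReLU(inplace=True))
        self.dec1 = nn.ConvTranspose2d(ch * 2, 3, 4, 2, 1)  # skip concat doubles channels

    def forward(self, front):                       # front: (B, 3, H, W)
        e1 = self.enc1(front)                       # (B, ch, H/2, W/2)
        e2 = self.enc2(e1)                          # (B, 2ch, H/4, W/4)
        d2 = self.dec2(e2)                          # (B, ch, H/2, W/2)
        back = self.dec1(torch.cat([d2, e1], 1))    # U-Net skip connection
        return torch.tanh(back)                     # back view in [-1, 1]

back = FrontToBackNet()(torch.randn(1, 3, 256, 256))  # -> (1, 3, 256, 256)
```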
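The differentiable depth-aware renderer is what lets 2D image supervision reach the colored volume during training. One common way to realize such a renderer, sketched below, is to composite voxel colors along the depth axis with transmittance weights derived from a soft per-voxel occupancy; this illustrates the general idea under that assumption, not RIN's exact formulation.

```python
# Hedged sketch of depth-aware volume-to-image compositing; the soft
# occupancy formulation is an assumption, not RIN's published renderer.
import torch

def render_volume(occupancy, color):
    """occupancy: (B, D, H, W) in [0, 1]; color: (B, 3, D, H, W)."""
    d = occupancy.shape[1]
    # Transmittance: chance a ray reaches depth slice i without being blocked.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(occupancy[:, :1]),
                   1.0 - occupancy[:, :-1]], dim=1), dim=1)
    weights = occupancy * trans                        # per-slice contribution
    image = (weights.unsqueeze(1) * color).sum(dim=2)  # (B, 3, H, W)
    # Expected depth, usable as a depth-aware training signal.
    idx = torch.arange(d, dtype=occupancy.dtype, device=occupancy.device)
    depth = (weights * idx.view(1, -1, 1, 1)).sum(dim=1)  # (B, H, W)
    return image, depth

img, depth = render_volume(torch.rand(1, 32, 64, 64), torch.rand(1, 3, 32, 64, 64))
```

Every step (cumprod, products, sums) is differentiable, so gradients from an image loss flow back to the voxel colors and occupancies.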
Related papers
- COSMU: Complete 3D human shape from monocular unconstrained images
We present a novel framework to reconstruct complete 3D human shapes from a given target image by leveraging monocular unconstrained images.
The objective of this work is to reproduce high-quality details in regions of the reconstructed human body that are not visible in the input target.
arXiv Detail & Related papers (2024-07-15T10:06:59Z)
- Synthesizing Moving People with 3D Control
We present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence.
For the first part, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image.
Second, we develop a diffusion-based rendering pipeline, which is controlled by 3D human poses.
arXiv Detail & Related papers (2024-01-19T18:59:11Z)
- SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
SiTH is a novel pipeline that integrates an image-conditioned diffusion model into a 3D mesh reconstruction workflow.
We employ a powerful generative diffusion model to hallucinate unseen back-view appearance based on the input images.
For the latter, we leverage skinned body meshes as guidance to recover full-body texture meshes from the input and back-view images.
arXiv Detail & Related papers (2023-11-27T14:22:07Z)
- Refining 3D Human Texture Estimation from a Single Image
Estimating 3D human texture from a single image is essential in graphics and vision.
We propose a framework that adaptively samples the input via a deformable convolution whose offsets are learned by a deep neural network (see the sketch after this list).
arXiv Detail & Related papers (2023-03-06T19:53:50Z)
- NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To deal with the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z)
- Detailed Avatar Recovery from Single Image
This paper presents a novel framework to recover a detailed avatar from a single image.
We use deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation framework.
Our method can restore detailed human body shapes with complete textures beyond skinned models.
arXiv Detail & Related papers (2021-08-06T03:51:26Z)
- Neural Re-Rendering of Humans from a Single Image
We propose a new method for neural re-rendering of a human under a novel user-defined pose and viewpoint.
Our algorithm represents body pose and shape as a parametric mesh which can be reconstructed from a single image.
arXiv Detail & Related papers (2021-01-11T18:53:47Z)
- SMPLpix: Neural Avatars from 3D Human Models
We bridge the gap between classic rendering and the latest generative networks operating in pixel space.
We train a network that directly converts a sparse set of 3D mesh vertices into photorealistic images.
We show the advantage over conventional differentiable renderers both in terms of the level of photorealism and rendering efficiency.
arXiv Detail & Related papers (2020-08-16T10:22:00Z)
- Coherent Reconstruction of Multiple Humans from a Single Image
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently, which can yield overlapping or depth-inconsistent people.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)
- Reposing Humans by Warping 3D Features
We propose to implicitly learn a dense feature volume from human images.
The volume is mapped back to RGB space by a convolutional decoder.
Our state-of-the-art results on the DeepFashion and the iPER benchmarks indicate that dense volumetric human representations are worth investigating.
arXiv Detail & Related papers (2020-06-08T19:31:02Z)
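As noted in the "Refining 3D Human Texture Estimation from a Single Image" entry above, here is a minimal sketch of offset-based adaptive sampling: a small network predicts per-pixel offsets that deform where the input is sampled, emulated here with grid_sample rather than a true deformable convolution. The offset predictor and the offset scale are illustrative assumptions, not that paper's network.

```python
# Hedged sketch of adaptive input sampling with learned offsets.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSampler(nn.Module):
    def __init__(self):
        super().__init__()
        # Predict a 2D sampling offset per output pixel from the input itself.
        self.offset_net = nn.Conv2d(3, 2, kernel_size=3, padding=1)

    def forward(self, x):                              # x: (B, 3, H, W)
        b, _, h, w = x.shape
        # Regular sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
        # Learned offsets (bounded by tanh) deform the grid adaptively.
        offsets = 0.1 * torch.tanh(self.offset_net(x)).permute(0, 2, 3, 1)
        return F.grid_sample(x, base + offsets, align_corners=True)

y = AdaptiveSampler()(torch.randn(1, 3, 64, 64))  # resampled, same shape as input
```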