ICON: Implicit Clothed humans Obtained from Normals
- URL: http://arxiv.org/abs/2112.09127v1
- Date: Thu, 16 Dec 2021 18:59:41 GMT
- Title: ICON: Implicit Clothed humans Obtained from Normals
- Authors: Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas and Michael J. Black
- Abstract summary: Implicit functions are well suited to the first task, as they can capture details like hair or clothes.
ICON infers detailed clothed-human normals conditioned on the SMPL(-X) normals.
ICON takes a step towards robust 3D clothed human reconstruction from in-the-wild images.
- Score: 49.5397825300977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current methods for learning realistic and animatable 3D clothed avatars need
either posed 3D scans or 2D images with carefully controlled user poses. In
contrast, our goal is to learn the avatar from only 2D images of people in
unconstrained poses. Given a set of images, our method estimates a detailed 3D
surface from each image and then combines these into an animatable avatar.
Implicit functions are well suited to the first task, as they can capture
details like hair or clothes. Current methods, however, are not robust to
varied human poses and often produce 3D surfaces with broken or disembodied
limbs, missing details, or non-human shapes. The problem is that these methods
use global feature encoders that are sensitive to global pose. To address this,
we propose ICON ("Implicit Clothed humans Obtained from Normals"), which uses
local features, instead. ICON has two main modules, both of which exploit the
SMPL(-X) body model. First, ICON infers detailed clothed-human normals
(front/back) conditioned on the SMPL(-X) normals. Second, a visibility-aware
implicit surface regressor produces an iso-surface of a human occupancy field.
Importantly, at inference time, a feedback loop alternates between refining the
SMPL(-X) mesh using the inferred clothed normals and then refining the normals.
Given multiple reconstructed frames of a subject in varied poses, we use
SCANimate to produce an animatable avatar from them. Evaluation on the AGORA
and CAPE datasets shows that ICON outperforms the state of the art in
reconstruction, even with heavily limited training data. Additionally, it is
much more robust to out-of-distribution samples, e.g., in-the-wild poses/images
and out-of-frame cropping. ICON takes a step towards robust 3D clothed human
reconstruction from in-the-wild images. This enables creating avatars directly
from video with personalized and natural pose-dependent cloth deformation.
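The inference-time feedback loop described above can be sketched as follows. This is a minimal toy illustration, not ICON's actual implementation: `render_smpl_normals`, `infer_clothed_normals`, and `refine_smpl` are hypothetical scalar stand-ins for the real rendering, normal-prediction, and SMPL(-X) optimization steps, chosen only to show how the alternation drives the body fit and the clothed normals toward agreement.

```python
def render_smpl_normals(smpl_params):
    """Toy stand-in: 'render' body normals from SMPL(-X) parameters.

    In ICON this is a differentiable render of the posed body mesh;
    here a scalar proxy suffices to illustrate the loop.
    """
    return 0.9 * smpl_params


def infer_clothed_normals(image_feat, body_normals):
    """Toy stand-in for ICON's normal network: clothed-human normals
    (front/back) conditioned on the SMPL(-X) body normals."""
    return 0.5 * image_feat + 0.5 * body_normals


def refine_smpl(smpl_params, clothed_normals, step=0.3):
    """Toy stand-in: nudge the SMPL(-X) fit toward agreement with the
    inferred clothed normals (ICON minimizes a normal-map loss here)."""
    residual = clothed_normals - render_smpl_normals(smpl_params)
    return smpl_params + step * residual


def icon_feedback_loop(image_feat, smpl_params, n_iters=5):
    """Alternate between refining the SMPL(-X) fit using the inferred
    clothed normals and re-inferring the normals from the refined fit."""
    clothed_normals = infer_clothed_normals(
        image_feat, render_smpl_normals(smpl_params)
    )
    for _ in range(n_iters):
        smpl_params = refine_smpl(smpl_params, clothed_normals)
        clothed_normals = infer_clothed_normals(
            image_feat, render_smpl_normals(smpl_params)
        )
    return smpl_params, clothed_normals
```

In this toy setup the residual between the rendered body normals and the inferred clothed normals shrinks each iteration, mirroring how ICON's alternation improves both the body fit and the normal maps before the visibility-aware implicit surface regressor extracts the final iso-surface.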
Related papers
- AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars
Using 2D Diffusion [34.609403685504944]
We present AvatarFusion, a framework for zero-shot text-to-avatar generation.
We use a latent diffusion model to provide pixel-level guidance for generating human-realistic avatars.
We also introduce a novel optimization method, called Pixel-Semantics Difference-Sampling (PS-DS), which semantically separates the generation of body and clothes.
arXiv Detail & Related papers (2023-07-13T02:19:56Z) - MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling [59.74064212110042]
MPM can handle multiple tasks, including 3D human pose estimation, 3D pose estimation from occluded 2D poses, and 3D pose completion, in a single framework.
We conduct extensive experiments and ablation studies on several widely used human pose datasets and achieve state-of-the-art performance on MPI-INF-3DHP.
arXiv Detail & Related papers (2023-06-29T10:30:00Z) - ECON: Explicit Clothed humans Optimized via Normal integration [54.51948104460489]
We present ECON, a method for creating 3D humans in loose clothes.
It infers detailed 2D maps for the front and back side of a clothed person.
It "inpaints" the missing geometry between d-BiNI surfaces.
arXiv Detail & Related papers (2022-12-14T18:59:19Z) - Capturing and Animation of Body and Clothing from Monocular Video [105.87228128022804]
We present SCARF, a hybrid model combining a mesh-based body with a neural radiance field.
Integrating the mesh into the rendering enables us to optimize SCARF directly from monocular videos.
We demonstrate that SCARF reconstructs clothing with higher visual quality than existing methods, that the clothing deforms with changing body pose and body shape, and that clothing can be successfully transferred between avatars of different subjects.
arXiv Detail & Related papers (2022-10-04T19:34:05Z) - AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints.
To model non-rigid dynamics, it introduces a deformation network to learn pose-dependent deformations in the canonical space.
Our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs.
arXiv Detail & Related papers (2022-08-01T01:27:02Z) - The Power of Points for Modeling Humans in Clothing [60.00557674969284]
Creating 3D human avatars with realistic clothing that moves naturally currently requires an artist.
We show that a point-based 3D representation can capture varied topology at high resolution and can be learned from data.
We train a neural network with a novel local clothing geometric feature to represent the shape of different outfits.
arXiv Detail & Related papers (2021-09-02T17:58:45Z) - AGORA: Avatars in Geography Optimized for Regression Analysis [35.22486186509372]
AGORA is a synthetic dataset with high realism and highly accurate ground truth.
We create reference 3D poses and body shapes by fitting the SMPL-X body model (with face and hands) to the 3D scans.
We evaluate existing state-of-the-art methods for 3D human pose estimation on this dataset and find that most methods perform poorly on images of children.
arXiv Detail & Related papers (2021-04-29T20:33:25Z) - ARCH: Animatable Reconstruction of Clothed Humans [27.849315613277724]
ARCH (Animatable Reconstruction of Clothed Humans) is an end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image.
ARCH is a learned pose-aware model that produces detailed 3D rigged full-body human avatars from a single unconstrained RGB image.
arXiv Detail & Related papers (2020-04-08T14:23:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.