GALA: Generating Animatable Layered Assets from a Single Scan
- URL: http://arxiv.org/abs/2401.12979v1
- Date: Tue, 23 Jan 2024 18:59:59 GMT
- Title: GALA: Generating Animatable Layered Assets from a Single Scan
- Authors: Taeksoo Kim, Byungjun Kim, Shunsuke Saito, Hanbyul Joo
- Abstract summary: We present GALA, a framework that takes as input a single-layer clothed 3D human mesh and decomposes it into complete multi-layered 3D assets.
The outputs can then be combined with other assets to create novel clothed human avatars with any pose.
- Score: 20.310367593475508
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present GALA, a framework that takes as input a single-layer clothed 3D
human mesh and decomposes it into complete multi-layered 3D assets. The outputs
can then be combined with other assets to create novel clothed human avatars
with any pose. Existing reconstruction approaches often treat clothed humans as
a single layer of geometry and overlook the inherent compositionality of humans
with hairstyles, clothing, and accessories, thereby limiting the utility of the
meshes for downstream applications. Decomposing a single-layer mesh into
separate layers is a challenging task because it requires the synthesis of
plausible geometry and texture for the severely occluded regions. Moreover,
even with successful decomposition, meshes are not normalized in terms of poses
and body shapes, which prevents coherent composition with novel identities and poses.
To address these challenges, we propose to leverage the general knowledge of a
pretrained 2D diffusion model as a geometry and appearance prior for humans and
other assets. We first separate the input mesh using the 3D surface
segmentation extracted from multi-view 2D segmentations. Then we synthesize the
missing geometry of different layers in both posed and canonical spaces using a
novel pose-guided Score Distillation Sampling (SDS) loss. Once we complete
inpainting high-fidelity 3D geometry, we also apply the same SDS loss to its
texture to obtain the complete appearance including the initially occluded
regions. Through a series of decomposition steps, we obtain multiple layers of
3D assets in a shared canonical space normalized in terms of poses and human
shapes, hence supporting effortless composition to novel identities and
reanimation with novel poses. Our experiments demonstrate the effectiveness of
our approach for decomposition, canonicalization, and composition tasks
compared to existing solutions.
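The segmentation step described in the abstract, where a 3D surface segmentation is extracted from multi-view 2D segmentations, could be sketched as a per-face majority vote across rendered views. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `vote_face_labels`, the input layout (a per-pixel face-index map and a per-pixel class map for each view, as a rasterizer and a 2D segmenter might produce), and the use of -1 for background and unseen faces are all hypothetical.

```python
import numpy as np

def vote_face_labels(face_ids, labels_2d, num_faces, num_classes):
    """Aggregate per-view 2D labels into one label per mesh face by majority vote.

    face_ids:  list of (H, W) int arrays, index of the face visible at each
               pixel, or -1 where the pixel shows background (assumed layout)
    labels_2d: list of (H, W) int arrays, 2D segmentation class per pixel
    """
    votes = np.zeros((num_faces, num_classes), dtype=np.int64)
    for fid, lab in zip(face_ids, labels_2d):
        visible = fid >= 0
        # Unbuffered accumulation so repeated (face, class) pairs all count.
        np.add.at(votes, (fid[visible], lab[visible]), 1)
    face_labels = votes.argmax(axis=1)
    # Faces never observed in any view get the placeholder label -1.
    face_labels[votes.sum(axis=1) == 0] = -1
    return face_labels
```

Faces that no view observes are left unlabeled here; in a decomposition setting those are the occluded regions that a generative prior would have to complete.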
Related papers
- ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling [96.87575334960258]
ID-to-3D is a method to generate identity- and text-guided 3D human heads with disentangled expressions.
Results achieve an unprecedented level of identity-consistent and high-quality texture and geometry generation.
arXiv Detail & Related papers (2024-05-26T13:36:45Z)
- Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction.
Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition.
We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv Detail & Related papers (2024-03-28T11:12:33Z)
- 3D Reconstruction of Interacting Multi-Person in Clothing from a Single Image [8.900009931200955]
This paper introduces a novel pipeline to reconstruct the geometry of interacting multi-person in clothing on a globally coherent scene space from a single image.
We overcome this challenge by utilizing two human priors for complete 3D geometry and surface contacts.
The results demonstrate that our method is complete, globally coherent, and physically plausible compared to existing methods.
arXiv Detail & Related papers (2024-01-12T07:23:02Z)
- Efficient 3D Articulated Human Generation with Layered Surface Volumes [131.3802971483426]
We introduce layered surface volumes (LSVs) as a new 3D object representation for articulated digital humans.
LSVs represent a human body using multiple textured layers around a conventional template.
They exhibit exceptional efficiency in GAN settings, where a 2D generator learns to synthesize the RGBA textures for the individual layers.
arXiv Detail & Related papers (2023-07-11T17:50:02Z)
- MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling [59.74064212110042]
MPM can handle multiple tasks including 3D human pose estimation, 3D pose estimation from occluded 2D pose, and 3D pose completion in a single framework.
We conduct extensive experiments and ablation studies on several widely used human pose datasets and achieve state-of-the-art performance on MPI-INF-3DHP.
arXiv Detail & Related papers (2023-06-29T10:30:00Z)
- USR: Unsupervised Separated 3D Garment and Human Reconstruction via Geometry and Semantic Consistency [41.89803177312638]
We propose an unsupervised separated 3D garments and human reconstruction model (USR), which reconstructs the human body and authentic textured clothes in layers without 3D models.
Our method proposes a generalized surface-aware neural radiance field to learn the mapping between sparse multi-view images and geometries of the dressed people.
arXiv Detail & Related papers (2023-02-21T08:48:27Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Multi-person Implicit Reconstruction from a Single Image [37.6877421030774]
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
Existing multi-person methods suffer from two main drawbacks: they are often model-based and cannot capture accurate 3D models of people with loose clothing and hair.
arXiv Detail & Related papers (2021-04-19T13:21:55Z)
- SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation [46.85865451812981]
We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm.
Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
arXiv Detail & Related papers (2020-08-26T09:56:07Z)
- Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that allows us to jointly recover the geometry of a 3D object as a set of primitives.
Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives.
Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.