Disentangled Clothed Avatar Generation with Layered Representation
- URL: http://arxiv.org/abs/2501.04631v1
- Date: Wed, 08 Jan 2025 17:27:27 GMT
- Title: Disentangled Clothed Avatar Generation with Layered Representation
- Authors: Weitian Zhang, Sijing Wu, Manwen Liao, Yichao Yan,
- Abstract summary: Clothed avatar generation has wide applications in virtual and augmented reality, filmmaking, and more.
Previous methods have achieved success in generating diverse digital avatars; however, generating avatars with disentangled components has long been a challenge.
We propose LayerAvatar, the first feed-forward diffusion-based method for generating component-disentangled clothed avatars.
- Score: 5.775559930050691
- Abstract: Clothed avatar generation has wide applications in virtual and augmented reality, filmmaking, and more. Previous methods have achieved success in generating diverse digital avatars; however, generating avatars with disentangled components (e.g., body, hair, and clothes) has long been a challenge. In this paper, we propose LayerAvatar, the first feed-forward diffusion-based method for generating component-disentangled clothed avatars. To achieve this, we first propose a layered UV feature plane representation, where components are distributed in different layers of the Gaussian-based UV feature plane with corresponding semantic labels. This representation supports high-resolution and real-time rendering, as well as expressive animation including controllable gestures and facial expressions. Based on the well-designed representation, we train a single-stage diffusion model and introduce constraint terms to address the severe occlusion problem of the innermost human body layer. Extensive experiments demonstrate the impressive performance of our method in generating disentangled clothed avatars, and we further explore its applications in component transfer. The project page is available at: https://olivia23333.github.io/LayerAvatar/
Related papers
- Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance [69.9745497000557]
We introduce Arc2Avatar, the first SDS-based method utilizing a human face foundation model as guidance with just a single image as input.
Our avatars maintain a dense correspondence with a human face mesh template, allowing blendshape-based expression generation.
arXiv Detail & Related papers (2025-01-09T17:04:33Z)
- Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
- DivAvatar: Diverse 3D Avatar Generation with a Single Prompt [95.9978722953278]
DivAvatar is a framework that generates diverse avatars from a single text prompt.
It has two key designs that help achieve generation diversity and visual quality.
Extensive experiments show that DivAvatar is highly versatile in generating avatars of diverse appearances.
arXiv Detail & Related papers (2024-02-27T08:10:31Z)
- AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text [71.09533176800707]
AvatarStudio is a coarse-to-fine generative model that generates explicit textured 3D meshes for animatable human avatars.
By effectively leveraging the synergy between the articulated mesh representation and the DensePose-conditional diffusion model, AvatarStudio can create high-quality avatars.
arXiv Detail & Related papers (2023-11-29T18:59:32Z)
- HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images [33.298962236215964]
We study the reconstruction of human avatars from a few-shot unconstrained photo album.
For handling dynamic data, we integrate a skinning mechanism with deep marching tetrahedra.
Our framework, called HaveFun, can undertake avatar reconstruction, rendering, and animation.
arXiv Detail & Related papers (2023-11-27T10:01:31Z)
- MagicAvatar: Multimodal Avatar Generation and Animation [70.55750617502696]
MagicAvatar is a framework for multimodal video generation and animation of human avatars.
It disentangles avatar video generation into two stages: multimodal-to-motion and motion-to-video generation.
We demonstrate the flexibility of MagicAvatar through various applications, including text-guided and video-guided avatar generation.
arXiv Detail & Related papers (2023-08-28T17:56:18Z)
- AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion [34.609403685504944]
We present AvatarFusion, a framework for zero-shot text-to-avatar generation.
We use a latent diffusion model to provide pixel-level guidance for generating human-realistic avatars.
We also introduce a novel optimization method, called Pixel-Semantics Difference-Sampling (PS-DS), which semantically separates the generation of body and clothes.
arXiv Detail & Related papers (2023-07-13T02:19:56Z)
- AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712]
AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images.
Our key contribution is precise control over avatar generation using dual fine-tuned diffusion models.
We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
arXiv Detail & Related papers (2023-06-16T14:18:51Z)
- PointAvatar: Deformable Point-based Head Avatars from Videos [103.43941945044294]
PointAvatar is a deformable point-based representation that disentangles the source color into intrinsic albedo and normal-dependent shading.
We show that our method is able to generate animatable 3D avatars using monocular videos from multiple sources.
arXiv Detail & Related papers (2022-12-16T10:05:31Z)
- Explicit Clothing Modeling for an Animatable Full-Body Avatar [21.451440299450592]
We build an animatable clothed body avatar with an explicit representation of the clothing on the upper body from multi-view captured videos.
To learn the interaction between the body dynamics and clothing states, we use a temporal convolution network to predict the clothing latent code.
We show photorealistic animation output for three different actors, and demonstrate the advantage of our clothed-body avatars over single-layer avatars.
arXiv Detail & Related papers (2021-06-28T17:58:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.