Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
- URL: http://arxiv.org/abs/2409.17280v1
- Date: Wed, 25 Sep 2024 18:46:06 GMT
- Title: Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
- Authors: Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu
- Abstract summary: Disco4D is a novel framework for 4D human generation and animation from a single image.
It disentangles clothing (with Gaussian models) from the human body (with the SMPL-X model).
It supports 4D human animation with vivid dynamics.
- Score: 49.188657545633475
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present Disco4D, a novel Gaussian Splatting framework for 4D human generation and animation from a single image. Different from existing methods, Disco4D distinctively disentangles clothing (with Gaussian models) from the human body (with the SMPL-X model), significantly enhancing generation detail and flexibility. It has the following technical innovations. 1) Disco4D learns to efficiently fit the clothing Gaussians over the SMPL-X Gaussians. 2) It adopts diffusion models to enhance the 3D generation process, e.g., modeling occluded parts not visible in the input image. 3) It learns an identity encoding for each clothing Gaussian to facilitate the separation and extraction of clothing assets. Furthermore, Disco4D naturally supports 4D human animation with vivid dynamics. Extensive experiments demonstrate the superiority of Disco4D on 4D human generation and animation tasks. Our visualizations can be found at https://disco-4d.github.io/.
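As a rough illustration of the disentangled representation described in the abstract, the sketch below keeps body Gaussians (anchored to the SMPL-X surface) and clothing Gaussians (each carrying an identity code) in separate containers, so that a clothing asset can be extracted by its identity. All class and field names are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (not the authors' code) of a disentangled avatar:
# body Gaussians tied to SMPL-X, clothing Gaussians carrying an
# identity code so individual assets can be separated out.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSet:
    means: np.ndarray       # (N, 3) centers
    scales: np.ndarray      # (N, 3) per-axis extents
    rotations: np.ndarray   # (N, 4) quaternions
    colors: np.ndarray      # (N, 3) RGB
    opacities: np.ndarray   # (N,)

@dataclass
class DisentangledAvatar:
    body: GaussianSet       # fitted over the SMPL-X surface
    clothing: GaussianSet   # free Gaussians layered on the body
    identity: np.ndarray    # (N_clothing,) asset id per clothing Gaussian

    def extract_asset(self, asset_id: int) -> GaussianSet:
        """Pull out one clothing item by its identity code."""
        m = self.identity == asset_id
        c = self.clothing
        return GaussianSet(c.means[m], c.scales[m], c.rotations[m],
                           c.colors[m], c.opacities[m])
```

Keeping the two sets separate is what makes per-garment extraction straightforward: body Gaussians follow the SMPL-X parameters while clothing Gaussians are filtered by identity.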
Related papers
- 4-LEGS: 4D Language Embedded Gaussian Splatting [12.699978393733309]
We show how to lift spatio-temporal features to a 4D representation based on 3D Gaussian Splatting.
This enables an interactive interface where the user can temporally localize events in the video from text prompts.
We demonstrate our system on public 3D video datasets of people and animals performing various actions.
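A minimal sketch of how such text-driven temporal localization could work, assuming per-frame, per-Gaussian language features and a text embedding in the same space (all names and the threshold are placeholders, not the 4-LEGS implementation):

```python
# Hypothetical sketch of text-driven temporal localization over a
# language-embedded 4D Gaussian scene (not the 4-LEGS implementation).
import numpy as np

def localize_event(gauss_feats, text_feat, threshold=0.3):
    """gauss_feats: (T, N, D) per-frame, per-Gaussian language features;
    text_feat: (D,) embedded prompt. Returns matching frame indices."""
    g = gauss_feats / np.linalg.norm(gauss_feats, axis=-1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sim = g @ t                       # (T, N) cosine similarities
    best_per_frame = sim.max(axis=1)  # strongest match in each frame
    return np.nonzero(best_per_frame > threshold)[0]

# Toy usage with random features in place of real embeddings.
frames = localize_event(np.random.randn(10, 100, 64), np.random.randn(64))
```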
arXiv Detail & Related papers (2024-10-14T17:00:53Z)
- GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers [23.96688843662126]
We base our work on 3D Gaussian Splatting (3DGS), a scene representation composed of a mixture of Gaussians.
We show that this combination can achieve fast inference of 3D human models from a single image without test-time optimization.
We also show that it can improve 3D pose estimation by better fitting human models that account for clothes and other variations.
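A hedged skeleton of what a feed-forward, optimization-free predictor of this kind might look like: learned queries attend to image patch tokens and are decoded into Gaussian parameters in a single pass. The architecture details below are assumptions, not the GST model:

```python
# Illustrative skeleton (assumed architecture, not GST): a transformer
# maps image patch tokens to a fixed set of Gaussian parameters in one
# forward pass, with no per-image test-time optimization.
import torch
import torch.nn as nn

class SingleImageGaussianHead(nn.Module):
    def __init__(self, dim=256, n_gaussians=1024):
        super().__init__()
        # One learned query per output Gaussian (hypothetical choice).
        self.queries = nn.Parameter(torch.randn(n_gaussians, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # 3 mean + 3 scale + 4 rotation + 3 color + 1 opacity = 14 values.
        self.head = nn.Linear(dim, 14)

    def forward(self, image_tokens):                 # (B, L, dim) patch features
        b = image_tokens.shape[0]
        q = self.queries.expand(b, -1, -1)           # broadcast queries per batch
        x = self.encoder(torch.cat([q, image_tokens], dim=1))
        return self.head(x[:, :self.queries.shape[0]])  # (B, n_gaussians, 14)

tokens = torch.randn(1, 196, 256)             # e.g. ViT-style patch features
gaussians = SingleImageGaussianHead()(tokens) # (1, 1024, 14)
```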
arXiv Detail & Related papers (2024-09-06T11:34:24Z)
- iHuman: Instant Animatable Digital Humans From Monocular Videos [16.98924995658091]
We present a fast, simple, yet effective method for creating animatable 3D digital humans from monocular videos.
This work achieves, and illustrates the need for, accurate 3D mesh-based modelling of the human body.
Our method is faster by an order of magnitude (in terms of training time) than its closest competitor.
arXiv Detail & Related papers (2024-07-15T18:51:51Z)
- Segment Any 4D Gaussians [69.53172192552508]
We propose Segment Any 4D Gaussians (SA4D) to segment anything in the 4D digital world based on 4D Gaussians.
SA4D achieves precise, high-quality segmentation within seconds in 4D Gaussians, and shows the ability to remove, recolor, compose, and render high-quality masks of anything in the scene.
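Conceptually, once each Gaussian carries an identity, the listed edits reduce to boolean masks over the Gaussian set; the sketch below illustrates this under assumed per-Gaussian class logits (not the SA4D code):

```python
# Hedged sketch: with per-Gaussian identity features, segmentation,
# removal, and recoloring become boolean masks over the Gaussian set
# (assumed interfaces, not the SA4D code).
import numpy as np

def segment(identity_logits):
    """identity_logits: (N, K) class logits -> (N,) object id per Gaussian."""
    return identity_logits.argmax(axis=1)

def remove_object(attrs, labels, obj_id):
    """Drop all Gaussians of one object from a dict of (N, ...) arrays."""
    keep = labels != obj_id
    return {k: v[keep] for k, v in attrs.items()}

def recolor_object(colors, labels, obj_id, rgb):
    """Paint one object's Gaussians a flat color."""
    colors = colors.copy()
    colors[labels == obj_id] = rgb
    return colors
```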
arXiv Detail & Related papers (2024-07-05T13:44:15Z)
- UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling [71.87807614875497]
We propose UV Gaussians, which model the 3D human body by jointly learning mesh deformations and 2D UV-space Gaussian textures.
We collect and process a new dataset of human motion, which includes multi-view images, scanned models, parametric model registration, and corresponding texture maps. Experimental results demonstrate that our method achieves state-of-the-art synthesis of novel views and novel poses.
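As a rough sketch of the UV-space idea (assumed layout, not the paper's implementation), each Gaussian anchored to the mesh surface can read its attributes from a 2D texture at that point's UV coordinates:

```python
# Rough sketch of a UV-space layout (assumed, not the paper's code):
# Gaussian attributes live in a 2D texture; each surface-anchored
# Gaussian reads its attributes at its UV coordinates.
import numpy as np

def sample_uv_texture(texture, uv):
    """texture: (H, W, C) attribute map; uv: (N, 2) coords in [0, 1]."""
    H, W, _ = texture.shape
    px = np.clip((uv[:, 0] * (W - 1)).round().astype(int), 0, W - 1)
    py = np.clip((uv[:, 1] * (H - 1)).round().astype(int), 0, H - 1)
    return texture[py, px]  # (N, C) nearest-neighbor lookup

# E.g. color (3) + scale (3) + opacity (1) = 7 channels per Gaussian.
attrs = sample_uv_texture(np.random.rand(512, 512, 7), np.random.rand(1000, 2))
```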
arXiv Detail & Related papers (2024-03-18T09:03:56Z)
- DreamGaussian4D: Generative 4D Gaussian Splatting [56.49043443452339]
We introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS).
Our key insight is that combining explicit modeling of spatial transformations with static GS makes an efficient and powerful representation for 4D generation.
Video generation methods have the potential to offer valuable spatio-temporal priors, enhancing high-quality 4D generation.
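The stated insight can be illustrated with a toy example: a fixed set of static Gaussians plus an explicit, time-conditioned transformation of their centers yields a 4D sequence. The deformation below is a hand-written placeholder; a real method would learn this field:

```python
# Toy illustration (assumptions throughout): 4D content as static
# Gaussians plus an explicit, time-conditioned transform of their
# centers; a real method would learn this deformation field.
import numpy as np

def deform(means, t):
    """Placeholder deformation: rotate centers about the y-axis by an
    angle that grows with time t."""
    a = 0.5 * t
    R = np.array([[np.cos(a),  0.0, np.sin(a)],
                  [0.0,        1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
    return means @ R.T

static_means = np.random.randn(2048, 3)  # one static Gaussian cloud
frames = [deform(static_means, t) for t in np.linspace(0.0, 1.0, 24)]
```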
arXiv Detail & Related papers (2023-12-28T17:16:44Z)
- Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [94.07744207257653]
We focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects.
We combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization.
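The summary describes combining feedback from several diffusion priors during 4D optimization; a toy sketch of such composition (placeholder gradient functions and weights, not the paper's actual models) is:

```python
# Abstract sketch of composing feedback from several diffusion priors
# during optimization (placeholder gradient functions and weights,
# not the paper's actual models).
import numpy as np

def composed_update(params, grad_fns, weights, lr=1e-2):
    """One optimization step driven by a weighted sum of prior gradients."""
    grad = sum(w * fn(params) for fn, w in zip(grad_fns, weights))
    return params - lr * grad

# Toy stand-ins for image / video / multiview guidance signals.
image_prior = lambda p: p          # pulls parameters toward zero
video_prior = lambda p: np.sin(p)  # arbitrary placeholder
params = composed_update(np.random.randn(16),
                         [image_prior, video_prior], [1.0, 0.5])
```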
arXiv Detail & Related papers (2023-12-21T11:41:02Z)
- 3D-Aware Semantic-Guided Generative Model for Human Synthesis [67.86621343494998]
This paper proposes a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis.
Our experiments on the DeepFashion dataset show that 3D-SGAN significantly outperforms the most recent baselines.
arXiv Detail & Related papers (2021-12-02T17:10:53Z)