JADE: Joint-aware Latent Diffusion for 3D Human Generative Modeling
- URL: http://arxiv.org/abs/2412.20470v1
- Date: Sun, 29 Dec 2024 14:18:35 GMT
- Title: JADE: Joint-aware Latent Diffusion for 3D Human Generative Modeling
- Authors: Haorui Ji, Rong Wang, Taojun Lin, Hongdong Li
- Abstract summary: We introduce JADE, a generative framework that learns the variations of human shapes with fine-grained control.
Our key insight is a joint-aware latent representation that decomposes human bodies into skeleton structures and local surface geometries.
To generate coherent and plausible human shapes under our proposed decomposition, we also present a cascaded pipeline.
- Score: 62.77347895550087
- Abstract: Generative modeling of 3D human bodies has been studied extensively in computer vision. The core challenge is to design a compact latent representation that is both expressive and semantically interpretable, yet existing approaches struggle to achieve both requirements. In this work, we introduce JADE, a generative framework that learns the variations of human shapes with fine-grained control. Our key insight is a joint-aware latent representation that decomposes human bodies into skeleton structures, modeled by joint positions, and local surface geometries, characterized by features attached to each joint. This disentangled latent space design enables geometric and semantic interpretation, providing users with flexible controllability. To generate coherent and plausible human shapes under our proposed decomposition, we also present a cascaded pipeline in which two diffusion models are employed to model the distributions of skeleton structures and local surface geometries, respectively. Extensive experiments on public datasets demonstrate the effectiveness of the JADE framework across multiple tasks, in terms of autoencoding reconstruction accuracy, editing controllability, and generation quality, compared with existing methods.
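To make the decomposition concrete, below is a minimal sketch of what a joint-aware latent and a cascaded two-stage sampler might look like. All tensor shapes, module names, the toy denoisers, and the simplified update rule are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a joint-aware latent and a cascaded two-stage sampler.
# Shapes, architectures, and the simplified denoising update are assumptions
# for illustration, not the actual JADE implementation.
import torch
import torch.nn as nn

NUM_JOINTS, FEAT_DIM, STEPS = 24, 32, 50  # hypothetical sizes

class SkeletonDenoiser(nn.Module):
    """Stage 1: predicts noise on joint positions (the skeleton structure)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * 3 + 1, 256), nn.SiLU(),
            nn.Linear(256, NUM_JOINTS * 3))

    def forward(self, joints, t):
        x = torch.cat([joints.flatten(1), t[:, None]], dim=1)
        return self.net(x).view(-1, NUM_JOINTS, 3)

class GeometryDenoiser(nn.Module):
    """Stage 2: predicts noise on per-joint surface features, given the skeleton."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * (3 + FEAT_DIM) + 1, 512), nn.SiLU(),
            nn.Linear(512, NUM_JOINTS * FEAT_DIM))

    def forward(self, feats, joints, t):
        x = torch.cat([feats.flatten(1), joints.flatten(1), t[:, None]], dim=1)
        return self.net(x).view(-1, NUM_JOINTS, FEAT_DIM)

@torch.no_grad()
def cascaded_sample(skel_model, geom_model, n=4):
    """Sample a skeleton first, then surface features conditioned on it."""
    joints = torch.randn(n, NUM_JOINTS, 3)
    for step in reversed(range(STEPS)):          # crude denoising loop
        t = torch.full((n,), step / STEPS)
        joints = joints - skel_model(joints, t) / STEPS
    feats = torch.randn(n, NUM_JOINTS, FEAT_DIM)
    for step in reversed(range(STEPS)):
        t = torch.full((n,), step / STEPS)
        feats = feats - geom_model(feats, joints, t) / STEPS
    return joints, feats  # the joint-aware latent: skeleton + per-joint features

joints, feats = cascaded_sample(SkeletonDenoiser(), GeometryDenoiser())
```

Under such a decomposition, an edit like moving a joint would only require resampling or re-decoding the second stage, which is one plausible reading of the claimed editing controllability.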
Related papers
- GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z)
- DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.
Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior. It requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
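As a rough illustration of combining the two priors, the sketch below uses a feed-forward model for an initial shape code and then refines it with a score-distillation-style loss against a 2D diffusion model. The module names, the toy denoiser, and the loss form are assumptions; DiHuR's actual guidance mechanism may differ.

```python
# Illustrative sketch: fuse a feed-forward prior with 2D diffusion guidance.
# All modules here are toy stand-ins; the real DiHuR design may differ.
import torch
import torch.nn as nn

class FeedForwardPrior(nn.Module):
    """Stand-in for a generalizable feed-forward reconstructor."""
    def __init__(self, img_dim=128, shape_dim=64):
        super().__init__()
        self.enc = nn.Linear(img_dim, shape_dim)

    def forward(self, views):                 # views: (B, V, img_dim) sparse views
        return self.enc(views).mean(dim=1)    # fuse views into one shape code

def sds_style_loss(render, denoiser, t):
    """Nudge renders toward the 2D diffusion model's learned manifold."""
    noise = torch.randn_like(render)
    eps_pred = denoiser(render + t * noise, t)  # simplified forward process
    return ((eps_pred - noise) ** 2).mean()

def refine(shape_code, render_fn, denoiser, steps=100, lr=1e-2):
    shape_code = shape_code.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([shape_code], lr=lr)
    for _ in range(steps):
        loss = sds_style_loss(render_fn(shape_code), denoiser, torch.rand(()))
        opt.zero_grad(); loss.backward(); opt.step()
    return shape_code.detach()

views = torch.randn(1, 3, 128)                # three sparse input views (toy)
render_fn = nn.Linear(64, 128)                # toy differentiable "renderer"
denoiser = lambda x, t: 0.1 * x               # toy pretrained 2D diffusion model
code = refine(FeedForwardPrior()(views), render_fn, denoiser)
```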
- StructLDM: Structured Latent Diffusion for 3D Human Generation [45.51684124904457]
StructLDM is a diffusion-based unconditional 3D human generative model learned from 2D images.
It enables different levels of controllable 3D human generation and editing, including pose/view/shape control, and high-level tasks including compositional generations, part-aware clothing editing, 3D virtual try-on, etc.
arXiv Detail & Related papers (2024-04-01T17:00:18Z)
- An End-to-End Deep Learning Generative Framework for Refinable Shape Matching and Generation [45.820901263103806]
Generative modelling for shapes is a prerequisite for In-Silico Clinical Trials (ISCTs).
We develop a novel unsupervised geometric deep-learning model to establish refinable shape correspondences in a latent space.
We extend our proposed base model to a joint shape generative-clustering multi-atlas framework to incorporate further variability.
arXiv Detail & Related papers (2024-03-10T21:33:53Z)
- SPAMs: Structured Implicit Parametric Models [30.19414242608965]
We learn Structured-implicit PArametric Models (SPAMs) as a deformable object representation that structurally decomposes non-rigid object motion into part-based disentangled representations of shape and pose.
Experiments demonstrate that our part-aware shape and pose understanding lead to state-of-the-art performance in reconstruction and tracking of depth sequences of complex deforming object motion.
arXiv Detail & Related papers (2022-01-20T12:33:46Z)
- 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces [12.114711258010367]
We propose a self-supervised approach to train a 3D shape variational autoencoder which encourages a disentangled latent representation of identity features.
Experimental results conducted on 3D meshes show that state-of-the-art methods for latent disentanglement are not able to disentangle identity features of faces and bodies.
arXiv Detail & Related papers (2021-11-24T11:53:33Z)
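A minimal sketch of the mini-batch feature-swapping idea is below: partition each latent code into identity and residual parts, swap the identity sub-vectors across mini-batch elements, and decode the result. The split point, the toy autoencoder, and the omission of the variational machinery are all simplifying assumptions.

```python
# Sketch of mini-batch latent feature swapping for disentanglement.
# The latent split, toy autoencoder, and losses are illustrative assumptions
# (the variational machinery of the actual VAE is omitted).
import torch
import torch.nn as nn

N_VERTS, LATENT, ID_DIM = 100, 64, 32  # first ID_DIM dims = "identity" features

class ShapeAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(N_VERTS * 3, LATENT)
        self.dec = nn.Linear(LATENT, N_VERTS * 3)

    def forward(self, x):
        z = self.enc(x.flatten(1))
        return z, self.dec(z)

def swap_identity(z):
    """Swap identity sub-vectors between consecutive mini-batch elements."""
    return torch.cat([z[:, :ID_DIM].roll(1, dims=0), z[:, ID_DIM:]], dim=1)

model = ShapeAE()
x = torch.randn(8, N_VERTS, 3)              # toy mini-batch of mesh vertices
z, recon = model(x)
recon_loss = ((recon - x.flatten(1)) ** 2).mean()
swapped = model.dec(swap_identity(z))       # shapes carrying swapped identities
```

Training would add a loss encouraging the swapped decodings to preserve the swapped-in identity, which is what pushes identity information into the designated sub-vector.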
- THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [67.8628917474705]
THUNDR is a transformer-based deep neural network method for reconstructing the 3D pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe robust 3D reconstruction performance for difficult human poses collected in the wild.
arXiv Detail & Related papers (2021-06-17T09:09:24Z)
- What and Where: Modeling Skeletons from Semantic and Spatial Perspectives for Action Recognition [46.836815779215456]
We propose to model skeletons from a novel spatial perspective, in which the model takes spatial location as prior knowledge to group human joints.
From the semantic perspective, we propose a Transformer-like network that is expert in modeling joint correlations.
From the spatial perspective, we transform the skeleton data into a sparse format for efficient feature extraction.
arXiv Detail & Related papers (2020-04-07T10:53:45Z)
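The two perspectives might be sketched as follows: a spatial-location prior groups joints into body parts, while a Transformer-style encoder models joint correlations. The COCO-style joint indices and the grouping itself are assumptions for illustration, not the paper's exact design.

```python
# Sketch of the two perspectives: a spatial prior that groups joints into
# parts, and a Transformer-style encoder over joint tokens for semantic
# correlations. Joint indices and grouping are illustrative assumptions.
import torch
import torch.nn as nn

# hypothetical grouping of 17 COCO-style joints by spatial location
PART_GROUPS = {
    "head":      [0, 1, 2, 3, 4],
    "left_arm":  [5, 7, 9],
    "right_arm": [6, 8, 10],
    "left_leg":  [11, 13, 15],
    "right_leg": [12, 14, 16],
}

def pool_parts(joints):
    """joints: (B, 17, C) -> (B, num_parts, C), pooled per spatial group."""
    return torch.stack(
        [joints[:, idx].mean(dim=1) for idx in PART_GROUPS.values()], dim=1)

# semantic perspective: joint tokens attend to one another
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2)

joints = torch.randn(2, 17, 64)     # toy joint features
semantic = encoder(joints)          # joint-correlation features
spatial = pool_parts(semantic)      # part-grouped features: (2, 5, 64)
```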
- Learning 3D Human Shape and Pose from Dense Body Parts [117.46290013548533]
We propose a Decompose-and-aggregate Network (DaNet) to learn 3D human shape and pose from dense correspondences of body parts.
Messages from the local streams are aggregated to enhance robust prediction of rotation-based poses.
Our method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW.
arXiv Detail & Related papers (2019-12-31T15:09:51Z)
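The decompose-and-aggregate pattern might look like the sketch below: one local stream per body part emits a message, and the aggregated messages drive a rotation-based pose head. The stream sizes, the 6D rotation output, and the concatenation-based aggregation are assumptions, not DaNet's exact design.

```python
# Sketch of a decompose-and-aggregate pattern: per-part local streams produce
# messages that are aggregated before predicting rotation-based poses.
# Stream sizes and the aggregation rule are illustrative assumptions.
import torch
import torch.nn as nn

NUM_PARTS, PART_DIM, MSG_DIM, NUM_JOINTS = 14, 128, 64, 24

class DecomposeAggregate(nn.Module):
    def __init__(self):
        super().__init__()
        # one local stream per body part, run on part-level features
        # (e.g., derived from dense body-part correspondences)
        self.local_streams = nn.ModuleList(
            [nn.Linear(PART_DIM, MSG_DIM) for _ in range(NUM_PARTS)])
        # aggregated messages -> rotation prediction (6D rotation per joint)
        self.pose_head = nn.Linear(NUM_PARTS * MSG_DIM, NUM_JOINTS * 6)

    def forward(self, part_feats):            # (B, NUM_PARTS, PART_DIM)
        msgs = [f(part_feats[:, i]) for i, f in enumerate(self.local_streams)]
        agg = torch.cat(msgs, dim=1)          # aggregate local messages
        return self.pose_head(agg).view(-1, NUM_JOINTS, 6)

rots = DecomposeAggregate()(torch.randn(2, NUM_PARTS, PART_DIM))  # (2, 24, 6)
```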