LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
- URL: http://arxiv.org/abs/2503.10625v1
- Date: Thu, 13 Mar 2025 17:59:21 GMT
- Title: LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
- Authors: Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, Guanying Chen, Zilong Dong, Liefeng Bo
- Abstract summary: We propose LHM (Large Animatable Human Reconstruction Model) to infer high-fidelity avatars represented as 3D Gaussian splatting in a feed-forward pass. Our model leverages a multimodal transformer architecture to effectively encode the human body positional features and image features with an attention mechanism. Our LHM generates plausible animatable humans in seconds without post-processing for face and hands, outperforming existing methods in both reconstruction accuracy and generalization ability.
- Score: 21.99354901986186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation. Recent advances in 3D human reconstruction mainly focus on static human modeling, and their reliance on synthetic 3D scans for training limits their generalization ability. Conversely, optimization-based video methods achieve higher fidelity but demand controlled capture conditions and computationally intensive refinement processes. Motivated by the emergence of large reconstruction models for efficient static reconstruction, we propose LHM (Large Animatable Human Reconstruction Model) to infer high-fidelity avatars represented as 3D Gaussian splatting in a feed-forward pass. Our model leverages a multimodal transformer architecture to effectively encode the human body positional features and image features with an attention mechanism, enabling detailed preservation of clothing geometry and texture. To further boost face identity preservation and fine detail recovery, we propose a head feature pyramid encoding scheme to aggregate multi-scale features of the head regions. Extensive experiments demonstrate that our LHM generates plausible animatable humans in seconds without post-processing for face and hands, outperforming existing methods in both reconstruction accuracy and generalization ability.
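The abstract describes a two-stage idea: body positional tokens attend to image patch tokens to gather clothing and texture evidence, and the resulting 3D Gaussian avatar is animated by warping Gaussian centers with skinning. The following NumPy sketch illustrates those two steps in miniature; every name here (`cross_attention`, `lbs_animate`, the projection matrices, the single-head, centers-only setup) is an illustrative assumption, not LHM's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(body_tokens, image_tokens, Wq, Wk, Wv):
    """Body positional tokens attend to image patch tokens, so each
    body point gathers appearance evidence from the input picture."""
    Q, K, V = body_tokens @ Wq, image_tokens @ Wk, image_tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V                      # (N_body, d)

def lbs_animate(centers, weights, R, t):
    """Linear blend skinning of Gaussian centers.
    centers: (N,3); weights: (N,J); R: (J,3,3); t: (J,3)."""
    posed = np.einsum('jab,nb->nja', R, centers) + t[None]  # (N,J,3)
    return np.einsum('nj,nja->na', weights, posed)

rng = np.random.default_rng(0)
d, n_body, n_img, J = 16, 32, 64, 4
body = rng.normal(size=(n_body, d))                 # body positional tokens
img = rng.normal(size=(n_img, d))                   # image patch tokens
Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
Wout = 0.1 * rng.normal(size=(d, 3))                # token -> Gaussian center head

fused = cross_attention(body, img, Wq, Wk, Wv)      # (32, 16)
centers = fused @ Wout                              # one Gaussian center per token
w = rng.random(size=(n_body, J)); w /= w.sum(1, keepdims=True)
R = np.stack([np.eye(3)] * J)                       # identity pose: no rotation
t = np.zeros((J, 3))
animated = lbs_animate(centers, w, R, t)
print(animated.shape)                               # (32, 3)
```

A real feed-forward model would predict the full Gaussian parameters (scales, rotations, opacity, color) per token and learn the skinning weights; the sketch keeps only the centers to show the data flow.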
Related papers
- SIGMAN:Scaling 3D Human Gaussian Generation with Millions of Assets [72.26350984924129]
We propose a latent space generation paradigm for 3D human digitization.
We transform the ill-posed low-to-high-dimensional mapping problem into a learnable distribution shift.
We employ the multi-view optimization approach combined with synthetic data to construct the HGS-1M dataset.
arXiv Detail & Related papers (2025-04-09T15:38:18Z) - HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration [29.03216532351979]
We introduce HumanDreamer-X, a novel framework that integrates multi-view human generation and reconstruction into a unified pipeline.
In this framework, 3D Gaussian Splatting serves as an explicit 3D representation to provide an initial geometry and appearance prior.
We also propose an attention modulation strategy that effectively enhances geometric detail and identity consistency across multiple views.
arXiv Detail & Related papers (2025-04-04T15:35:14Z) - MVD-HuGaS: Human Gaussians from a Single Image via 3D Human Multi-view Diffusion Prior [35.704591162502375]
We present MVD-HuGaS, enabling free-view 3D human rendering from a single image via a multi-view human diffusion model. Experiments on Thuman2.0 and 2K2K datasets show that the proposed MVD-HuGaS achieves state-of-the-art performance on single-view 3D human rendering.
arXiv Detail & Related papers (2025-03-11T09:37:15Z) - MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement [23.707586182294932]
Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data, or from 3D inconsistencies due to a lack of comprehensive multi-view knowledge.
We introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image.
arXiv Detail & Related papers (2024-08-26T12:10:52Z) - Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM [29.13412037370585]
We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image.
Our method is able to capture humans without any template prior, e.g., SMPL, and effectively enhances occluded parts with rich and realistic details.
arXiv Detail & Related papers (2024-01-22T18:08:22Z) - GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting [14.937297984020821]
We propose a novel clothed human reconstruction method called GaussianBody, based on 3D Gaussian Splatting.
Applying the static 3D Gaussian Splatting model to the dynamic human reconstruction problem is non-trivial due to complicated non-rigid deformations and rich cloth details.
We show that our method can achieve state-of-the-art photorealistic novel-view rendering results with high-quality details for dynamic clothed human bodies.
arXiv Detail & Related papers (2024-01-18T04:48:13Z) - H4D: Human 4D Modeling by Learning Neural Compositional Representation [75.34798886466311]
This work presents a novel framework that can effectively learn a compact and compositional representation for dynamic humans.
A simple yet effective linear motion model is proposed to provide a rough and regularized motion estimation.
Experiments demonstrate that our method is not only effective in recovering dynamic humans with accurate motion and detailed geometry, but also amenable to various 4D human-related tasks.
arXiv Detail & Related papers (2022-03-02T17:10:49Z) - THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [67.8628917474705]
THUNDR is a transformer-based deep neural network methodology to reconstruct the 3D pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe very solid 3D reconstruction performance for difficult human poses collected in the wild.
arXiv Detail & Related papers (2021-06-17T09:09:24Z) - Fast-GANFIT: Generative Adversarial Network for High Fidelity 3D Face Reconstruction [76.1612334630256]
We harness the power of Generative Adversarial Networks (GANs) and Deep Convolutional Neural Networks (DCNNs) to reconstruct the facial texture and shape from single images.
We demonstrate excellent results in photorealistic and identity-preserving 3D face reconstruction, and achieve, for the first time, facial texture reconstruction with high-frequency details.
arXiv Detail & Related papers (2021-05-16T16:35:44Z) - Multi-person Implicit Reconstruction from a Single Image [37.6877421030774]
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
Existing multi-person methods are often model-based, and so cannot capture accurate 3D models of people with loose clothing and hair.
arXiv Detail & Related papers (2021-04-19T13:21:55Z) - Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present a deep neural network methodology to reconstruct the 3D pose and shape of people, given an input RGB image.
We rely on a recently introduced, expressive full-body statistical 3D human model, GHUM, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z) - PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z)
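The PaMIR entry above combines a parametric body model with a free-form deep implicit function. A minimal sketch of that conditioning idea: for each 3D query point, look up the nearest vertex of the parametric body and feed the point plus its offset to an MLP that predicts occupancy. The nearest-vertex lookup and the tiny MLP are simplifying assumptions (PaMIR actually uses volumetric SMPL features and learned weights), used here only to show the data flow.

```python
import numpy as np

def pamir_style_occupancy(query, body_verts, mlp):
    """Condition a free-form implicit function on a parametric body:
    feed [point, offset-to-nearest-body-vertex] to an MLP that
    outputs occupancy in [0, 1]."""
    d2 = ((query[:, None, :] - body_verts[None]) ** 2).sum(-1)  # (Q, V)
    nearest = body_verts[d2.argmin(1)]                          # (Q, 3)
    feats = np.concatenate([query, query - nearest], axis=1)    # (Q, 6)
    return mlp(feats)                                           # (Q,)

rng = np.random.default_rng(2)
W1, b1 = 0.5 * rng.normal(size=(6, 8)), np.zeros(8)
W2 = 0.5 * rng.normal(size=(8,))
sigmoid = lambda x: 1 / (1 + np.exp(-x))
mlp = lambda f: sigmoid(np.maximum(f @ W1 + b1, 0) @ W2)        # untrained toy MLP

body = rng.normal(size=(50, 3))     # stand-in for parametric body vertices
query = rng.normal(size=(10, 3))    # 3D points to classify
occ = pamir_style_occupancy(query, body, mlp)
print(occ.shape)                    # (10,)
```

The body-conditioned offset is what regularizes the implicit field under challenging poses: points far from the body surface get large offsets, which a trained network can map to empty space.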
This list is automatically generated from the titles and abstracts of the papers in this site.