HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
- URL: http://arxiv.org/abs/2406.12459v2
- Date: Wed, 30 Oct 2024 12:50:27 GMT
- Authors: Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu
- Abstract summary: HumanSplat predicts the 3D Gaussian Splatting properties of any human from a single input image.
HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.
- Score: 47.62426718293504
- Abstract: Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.
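The summary says HumanSplat predicts "3D Gaussian Splatting properties" but does not list them. As a minimal sketch, assuming the standard 3DGS parameterization of Kerbl et al. (2023) (a center, per-axis scales, a rotation quaternion, an opacity, and a color), the snippet below shows one such primitive and how its anisotropic covariance is assembled as Σ = R S Sᵀ Rᵀ. The class and function names (`Gaussian3D`, `quat_to_rotmat`, `covariance`) are hypothetical illustrations, not HumanSplat's code, and color is simplified to plain RGB instead of spherical-harmonic coefficients.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian3D:
    """One splat under the standard 3DGS parameterization (assumed, not from HumanSplat)."""
    mean: Vec3                                    # 3D center of the Gaussian
    scale: Vec3                                   # per-axis standard deviations
    rotation: Tuple[float, float, float, float]   # quaternion (w, x, y, z)
    opacity: float                                # alpha in [0, 1]
    color: Vec3                                   # plain RGB; real 3DGS stores SH coefficients


def quat_to_rotmat(q: Tuple[float, float, float, float]) -> List[List[float]]:
    """Convert a (possibly unnormalized) quaternion to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g: Gaussian3D) -> List[List[float]]:
    """Sigma = R S S^T R^T with S = diag(scale); symmetric positive semi-definite by construction."""
    r = quat_to_rotmat(g.rotation)
    # M = R S: column j of R is multiplied by scale[j]
    m = [[r[i][j] * g.scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


# Usage: a splat with its long axes along local y and z, rotated 90 degrees about z.
half = math.sqrt(0.5)
g = Gaussian3D(mean=(0.0, 1.0, 0.0), scale=(1.0, 2.0, 3.0),
               rotation=(half, 0.0, 0.0, half), opacity=0.8, color=(0.9, 0.7, 0.6))
sigma = covariance(g)  # local y now points along global x, so Sigma is about diag(4, 1, 9)
```

Parameterizing each Gaussian by scale and rotation rather than by a raw 3x3 matrix keeps the covariance valid no matter what values a network regresses, which is one reason feed-forward predictors of 3DGS properties commonly adopt this form.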
Related papers
- HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features [23.321087432786605]
We present a novel approach called HFGaussian that can estimate novel views and human features, such as the 3D skeleton, 3D key points, and dense pose, from sparse input images in real time at 25 FPS.
We thoroughly evaluate our HFGaussian method against the latest state-of-the-art techniques in human Gaussian splatting and pose estimation, demonstrating its real-time, state-of-the-art performance.
(2024-11-05T13:31:04Z)
- PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion [43.850899288337025]
PSHuman is a novel framework that explicitly reconstructs human meshes using priors from a multiview diffusion model.
We find that directly applying multiview diffusion to single-view human images leads to severe geometric distortions.
To enhance cross-view body-shape consistency across varied human poses, we condition the generative model on parametric models such as SMPL-X.
(2024-09-16T10:13:06Z)
- Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation [24.49857926071974]
Vanilla text-to-image diffusion models struggle with generating accurate human images.
Existing methods mostly address this issue by fine-tuning the model on extra images or by adding extra controls.
This paper explores the integration of human-centric priors directly into the model fine-tuning stage.
(2024-03-08T11:59:32Z)
- Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM [29.13412037370585]
We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image.
Our method captures humans without any template prior (e.g., SMPL) and effectively enhances occluded parts with rich, realistic details.
(2024-01-22T18:08:22Z)
- InceptionHuman: Controllable Prompt-to-NeRF for Photorealistic 3D Human Generation [61.62346472443454]
InceptionHuman is a prompt-to-NeRF framework that allows easy control via a combination of prompts in different modalities to generate photorealistic 3D humans.
InceptionHuman achieves consistent 3D human generation within a progressively refined NeRF space.
(2023-11-27T15:49:41Z)
- HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts.
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network.
Our framework yields state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
(2023-10-12T17:59:34Z)
- Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.
Inspired by data augmentation strategies, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under common corruptions.
(2023-09-02T13:32:14Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically generalize poorly to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
(2022-12-10T06:28:29Z)
- HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation [54.23770284299979]
This paper introduces a novel form of supervision: Hierarchical Multi-person Ordinal Relations (HMOR).
HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically.
An integrated top-down model is designed to leverage these ordinal relations in the learning process.
The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets.
(2020-08-01T07:53:27Z)