Related papers: HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

URL: http://arxiv.org/abs/2406.12459v2
Date: Wed, 30 Oct 2024 12:50:27 GMT
Title: HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors
Authors: Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu,
Abstract summary: HumanSplat predicts the 3D Gaussian Splatting properties of any human from a single input image. HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.
Score: 47.62426718293504
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.

Related papers

HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers [60.86393841247567]
HumanRAM is a novel feed-forward approach for generalizable human reconstruction and animation from monocular or sparse human images.<n>Our approach integrates human reconstruction and animation into a unified framework by introducing explicit pose conditions.<n> Experiments show that HumanRAM significantly surpasses previous methods in terms of reconstruction accuracy, animation fidelity, and generalization performance on real-world datasets.
arXiv Detail & Related papers (2025-06-03T17:50:05Z)
SIGMAN:Scaling 3D Human Gaussian Generation with Millions of Assets [72.26350984924129]
We propose a latent space generation paradigm for 3D human digitization. We transform the ill-posed low-to-high-dimensional mapping problem into a learnable distribution shift. We employ the multi-view optimization approach combined with synthetic data to construct the HGS-1M dataset.
arXiv Detail & Related papers (2025-04-09T15:38:18Z)
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration [29.03216532351979]
We introduce textbfHumanDreamer-X, a novel framework that integrates multi-view human generation and reconstruction into a unified pipeline. In this framework, 3D Gaussian Splatting serves as an explicit 3D representation to provide initial geometry and appearance priority. We also propose an attention modulation strategy that effectively enhances geometric details identity consistency across multi-view.
arXiv Detail & Related papers (2025-04-04T15:35:14Z)
HumanGif: Single-View Human Diffusion with Generative Prior [25.516544735593087]
We propose HumanGif, a single-view human diffusion model with generative priors. Specifically, we formulate the single-view-based 3D human novel view and pose synthesis as a single-view-conditioned human diffusion process. We show that HumanGif achieves the best perceptual performance, with better generalizability for novel view and pose synthesis.
arXiv Detail & Related papers (2025-02-17T17:55:27Z)
GAS: Generative Avatar Synthesis from a Single Image [54.95198111659466]
We introduce a generalizable and unified framework to synthesize view-consistent and temporally coherent avatars from a single image. Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model.
arXiv Detail & Related papers (2025-02-10T19:00:39Z)
GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data [61.05815629606135]
Given a single in-the-wild human photo, it remains a challenging task to reconstruct a high-fidelity 3D human model. GeneMAN builds upon a comprehensive collection of high-quality human data. GeneMAN could generate high-quality 3D human models from a single image input, outperforming prior state-of-the-art methods.
arXiv Detail & Related papers (2024-11-27T18:59:54Z)
DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images. Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features [23.321087432786605]
We present a novel approach called HFGaussian that can estimate novel views and human features, such as the 3D skeleton, 3D key points, and dense pose, from sparse input images in real time at 25 FPS. We thoroughly evaluate our HFGaussian method against the latest state-of-the-art techniques in human Gaussian splatting and pose estimation, demonstrating its real-time, state-of-the-art performance.
arXiv Detail & Related papers (2024-11-05T13:31:04Z)
PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion [43.850899288337025]
PSHuman is a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model. It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions. To enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models like SMPL-X.
arXiv Detail & Related papers (2024-09-16T10:13:06Z)
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation [24.49857926071974]
Vanilla text-to-image diffusion models struggle with generating accurate human images. Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls. This paper explores the integration of human-centric priors directly into the model fine-tuning stage.
arXiv Detail & Related papers (2024-03-08T11:59:32Z)
Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM [29.13412037370585]
We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image. Our method is able to capture human without any template prior, e.g., SMPL, and effectively enhance occluded parts with rich and realistic details.
arXiv Detail & Related papers (2024-01-22T18:08:22Z)
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts. Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network. Our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z)
Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models. Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
arXiv Detail & Related papers (2023-09-02T13:32:14Z)
Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings. We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation [54.23770284299979]
This paper introduces a novel form of supervision - Hierarchical Multi-person Ordinal Relations (HMOR) HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically. An integrated top-down model is designed to leverage these ordinal relations in the learning process. The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets.
arXiv Detail & Related papers (2020-08-01T07:53:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.