HuPrior3R: Incorporating Human Priors for Better 3D Dynamic Reconstruction from Monocular Videos
- URL: http://arxiv.org/abs/2512.06368v2
- Date: Tue, 09 Dec 2025 08:25:59 GMT
- Title: HuPrior3R: Incorporating Human Priors for Better 3D Dynamic Reconstruction from Monocular Videos
- Authors: Weitao Xiong, Zhiyuan Yuan, Jiahao Lu, Chengfeng Zhao, Peng Li, Yuan Liu,
- Abstract summary: We propose to incorporate hybrid geometric priors that combine SMPL human body models with monocular depth estimation. HuPrior3R features a hierarchical pipeline with refinement components that applies strategic cropping and cross-attention fusion for human-specific detail enhancement. Experiments on TUM Dynamics and GTA-IM datasets demonstrate superior performance in dynamic human reconstruction.
- Score: 20.256869569776118
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular dynamic video reconstruction faces significant challenges in dynamic human scenes due to geometric inconsistencies and resolution degradation issues. Existing methods lack 3D human structural understanding, producing geometrically inconsistent results with distorted limb proportions and unnatural human-object fusion, while memory-constrained downsampling causes human boundary drift toward background geometry. To address these limitations, we propose to incorporate hybrid geometric priors that combine SMPL human body models with monocular depth estimation. Our approach leverages structured human priors to maintain surface consistency while capturing fine-grained geometric details in human regions. We introduce HuPrior3R, featuring a hierarchical pipeline with refinement components that processes full-resolution images for overall scene geometry, then applies strategic cropping and cross-attention fusion for human-specific detail enhancement. The method integrates SMPL priors through a Feature Fusion Module to ensure geometrically plausible reconstruction while preserving fine-grained human boundaries. Extensive experiments on TUM Dynamics and GTA-IM datasets demonstrate superior performance in dynamic human reconstruction.
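The cross-attention fusion mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `cross_attention_fuse`, the single-head design, the token shapes, the random projection matrices, and the choice of scene tokens as queries with human-crop tokens as keys and values are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(scene_tokens, human_tokens, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention fusion.

    scene_tokens: (N, D) features from the full image (queries).
    human_tokens: (M, D) features from a human crop (keys and values).
    Returns the fused (N, D) features and the (N, M) attention weights.
    """
    q = scene_tokens @ w_q
    k = human_tokens @ w_k
    v = human_tokens @ w_v
    # Scaled dot-product attention: each scene token attends over crop tokens.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Residual connection keeps the original scene features.
    return scene_tokens + attn @ v, attn


# Toy inputs; all sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
D = 16
scene_tokens = rng.normal(size=(64, D))
human_tokens = rng.normal(size=(24, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

fused, attn = cross_attention_fuse(scene_tokens, human_tokens, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In this sketch the residual connection leaves the scene features intact where the crop contributes little, which is one plausible way to add human-specific detail without overwriting the overall scene geometry.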
Related papers
- Dynamic Avatar-Scene Rendering from Human-centric Context [75.95641456716373]
We propose a Separate-then-Map (StM) strategy to bridge separately defined and optimized models. StM significantly outperforms existing state-of-the-art methods in both visual quality and rendering accuracy.
arXiv Detail & Related papers (2025-11-13T17:39:06Z)
- HumanCrafter: Synergizing Generalizable Human Reconstruction and Semantic 3D Segmentation [51.27178551863772]
We propose a unified framework that enables the joint modeling of appearance and human-part semantics from a single image. HumanCrafter surpasses existing state-of-the-art methods in both 3D human-part segmentation and 3D human reconstruction from a single image.
arXiv Detail & Related papers (2025-11-01T09:29:36Z)
- HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics [60.737929335600015]
We present HumanGenesis, a framework that integrates geometric and generative modeling through four collaborative agents. HumanGenesis achieves state-of-the-art performance on tasks including text-guided synthesis, video reenactment, and novel-pose generalization.
arXiv Detail & Related papers (2025-08-13T14:50:19Z)
- HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers [60.86393841247567]
HumanRAM is a novel feed-forward approach for generalizable human reconstruction and animation from monocular or sparse human images. Our approach integrates human reconstruction and animation into a unified framework by introducing explicit pose conditions. Experiments show that HumanRAM significantly surpasses previous methods in terms of reconstruction accuracy, animation fidelity, and generalization performance on real-world datasets.
arXiv Detail & Related papers (2025-06-03T17:50:05Z)
- GRACE: Estimating Geometry-level 3D Human-Scene Contact from 2D Images [54.602947113980655]
Estimating the geometry level of human-scene contact aims to ground specific contact surface points at 3D human geometries. GRACE (Geometry-level Reasoning for 3D Human-scene Contact Estimation) is a new paradigm for 3D human contact estimation. It incorporates a point cloud encoder-decoder architecture along with a hierarchical feature extraction and fusion module.
arXiv Detail & Related papers (2025-05-10T09:25:46Z)
- HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration [29.03216532351979]
We introduce HumanDreamer-X, a novel framework that integrates multi-view human generation and reconstruction into a unified pipeline. In this framework, 3D Gaussian Splatting serves as an explicit 3D representation to provide initial geometry and appearance priority. We also propose an attention modulation strategy that effectively enhances geometric detail and identity consistency across views.
arXiv Detail & Related papers (2025-04-04T15:35:14Z)
- GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data [61.05815629606135]
Given a single in-the-wild human photo, it remains a challenging task to reconstruct a high-fidelity 3D human model. GeneMAN builds upon a comprehensive collection of high-quality human data. GeneMAN could generate high-quality 3D human models from a single image input, outperforming prior state-of-the-art methods.
arXiv Detail & Related papers (2024-11-27T18:59:54Z)
- PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing [47.191113407993015]
PSHuman is a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model. It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions. To enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models like SMPL-X.
arXiv Detail & Related papers (2024-09-16T10:13:06Z)
- HumanRecon: Neural Reconstruction of Dynamic Human Using Geometric Cues and Physical Priors [31.15329654138382]
We consider the geometric constraints of estimated depth and normals in the learning of neural implicit representation for dynamic human reconstruction.
We also exploit several beneficial physical priors, such as adding noise into view direction and maximizing the density on the human surface.
arXiv Detail & Related papers (2023-11-26T03:06:59Z)
- CrossHuman: Learning Cross-Guidance from Multi-Frame Images for Human Reconstruction [6.450579406495884]
CrossHuman is a novel method that learns cross-guidance from a parametric human model and multi-frame RGB images.
We design a reconstruction pipeline combined with tracking-based methods and tracking-free methods.
Compared with previous works, our CrossHuman enables high-fidelity geometry details and texture in both visible and invisible regions.
arXiv Detail & Related papers (2022-07-20T08:25:20Z)