AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors
- URL: http://arxiv.org/abs/2602.23951v1
- Date: Fri, 27 Feb 2026 11:53:45 GMT
- Title: AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors
- Authors: Xiaozhen Qiao, Wenjia Wang, Zhiyuan Zhao, Jiacheng Sun, Ping Luo, Hongyuan Zhang, Xuelong Li,
- Abstract summary: We present AHAP, a feed-forward framework for reconstructing arbitrary humans from arbitrary camera perspectives. At its core is the effective fusion of multi-view geometry to assist human association, reconstruction, and localization. A Human Head module fuses cross-view features and scene context for SMPL prediction, guided by cross-view reprojection losses to enforce body pose consistency.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconstructing 3D humans from images captured at multiple perspectives typically requires pre-calibration, e.g., with checkerboards or MVS algorithms, which limits scalability and applicability in diverse real-world scenarios. In this work, we present \textbf{AHAP} (Reconstructing \textbf{A}rbitrary \textbf{H}umans from \textbf{A}rbitrary \textbf{P}erspectives), a feed-forward framework for reconstructing arbitrary humans from arbitrary camera perspectives without requiring camera calibration. At its core is the effective fusion of multi-view geometry to assist human association, reconstruction, and localization. Specifically, we use a Cross-View Identity Association module with learnable person queries and soft assignment, supervised by contrastive learning to resolve cross-view human identity association. A Human Head module fuses cross-view features and scene context for SMPL prediction, guided by cross-view reprojection losses to enforce body pose consistency. Additionally, multi-view geometry eliminates the depth ambiguity inherent in monocular methods, providing more precise 3D human localization through multi-view triangulation. Experiments on EgoHumans and EgoExo4D demonstrate that AHAP achieves competitive performance on both world-space human reconstruction and camera pose estimation, while being 180$\times$ faster than optimization-based approaches.
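The localization step the abstract describes relies on standard multi-view triangulation. A minimal sketch using the Direct Linear Transform is shown below; this is not the authors' implementation, and the function name and noise-free two-camera test setup are illustrative:

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from two or more views via the DLT.

    projections: list of 3x4 camera projection matrices P_i
    points_2d:   list of (u, v) pixel observations of the same point
    Returns the 3D point X minimizing the algebraic reprojection error.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])  # from u = (p1^T X) / (p3^T X)
        rows.append(v * P[2] - P[1])  # from v = (p2^T X) / (p3^T X)
    A = np.stack(rows)
    # Homogeneous least squares: X is the right singular vector of A
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Illustrative setup: identity intrinsics, second camera shifted on x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([1.0, 2.0, 5.0])

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_hat = triangulate_dlt([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With more than two views the same linear system simply gains two rows per observation, which is what makes triangulation well suited to the arbitrary-view setting.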
Related papers
- InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting [64.42884719282323]
InpaintHuman is a novel method for generating high-fidelity, complete, and animatable avatars from occluded monocular videos. Our approach employs direct pixel-level supervision to ensure identity fidelity.
arXiv Detail & Related papers (2026-01-05T13:26:02Z) - HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction [15.368018463074058]
HAMSt3R is an extension of MASt3R for joint human and scene 3D reconstruction from sparse, uncalibrated images. Our method incorporates additional network heads to segment people, estimate dense correspondences via DensePose, and predict depth in human-centric environments.
arXiv Detail & Related papers (2025-08-22T14:43:18Z) - PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images [23.745241278910946]
PF-LHM is a large human reconstruction model that generates high-quality 3D avatars in seconds from one or multiple casually captured pose-free images. Our method unifies single- and multi-image 3D human reconstruction, achieving high-fidelity and animatable 3D human avatars without requiring camera and human pose annotations.
arXiv Detail & Related papers (2025-06-16T17:59:56Z) - CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image [37.16845070245751]
We present a novel pipeline to reconstruct 3D humans with multiview consistency from a single occluded image. A 3D reconstruction model is then trained to predict a set of 3D Gaussians conditioned on both the occluded input and synthesized views. The method achieves significant improvements in both novel view synthesis (up to 3 dB PSNR) and geometric reconstruction under challenging conditions.
arXiv Detail & Related papers (2025-03-19T19:56:18Z) - Reconstructing People, Places, and Cameras [57.81696692335401]
"Humans and Structure from Motion" (HSfM) is a method for jointly reconstructing multiple human meshes, scene point clouds, and camera parameters in a metric world coordinate system. Our results show that incorporating human data into the SfM pipeline improves camera pose estimation.
arXiv Detail & Related papers (2024-12-23T18:58:34Z) - RCR: Robust Crowd Reconstruction with Upright Space from a Single Large-scene Image [55.77397543011443]
This paper focuses on spatially consistent reconstruction of the pose and shape of hundreds of humans from a single large-scene image. We first propose a new concept, Human-scene Virtual Interaction Point (HVIP), to convert the complex 3D human localization into 2D-pixel localization. We then extend it to RCR (Robust Crowd Reconstruction), which achieves globally consistent reconstruction and stable generalization on different camera FoVs.
arXiv Detail & Related papers (2024-11-09T16:49:59Z) - Body Size and Depth Disambiguation in Multi-Person Reconstruction from Single Images [44.96633481495911]
We address the problem of multi-person 3D body pose and shape estimation from a single image.
We devise a novel optimization scheme that learns the appropriate body scale and relative camera pose, by enforcing the feet of all people to remain on the ground floor.
A thorough evaluation on MuPoTS-3D and 3DPW datasets demonstrates that our approach is able to robustly estimate the body translation and shape of multiple people while retrieving their spatial arrangement.
arXiv Detail & Related papers (2021-11-02T20:42:41Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
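The plane sweep idea above can be reduced to a minimal form: sweep a set of candidate depths along a reference-view ray and score each by how closely its reprojection lands on the detection in another view. This is a single-joint, two-view sketch under assumed pinhole intrinsics, not the paper's one-shot heatmap fusion; `sweep_depth` and the toy camera setup are illustrative:

```python
import numpy as np

def sweep_depth(K_ref, K_src, R, t, joint_ref, joint_src, depths):
    """Score candidate depths for a joint detected in a reference view.

    For each depth d, the reference-view pixel is backprojected to a 3D
    point, transformed into the source camera frame (R, t), reprojected,
    and scored by distance to the source-view 2D detection. The depth
    whose reprojection lands closest wins.
    """
    u, v = joint_ref
    ray = np.linalg.inv(K_ref) @ np.array([u, v, 1.0])  # unit-depth ray
    best_d, best_err = None, np.inf
    for d in depths:
        X = d * ray                       # 3D point in reference frame
        X_src = R @ X + t                 # into source camera frame
        x = K_src @ X_src
        uv = x[:2] / x[2]                 # reproject to source pixels
        err = np.linalg.norm(uv - np.asarray(joint_src, dtype=float))
        if err < best_err:
            best_d, best_err = d, err
    return best_d

# Illustrative setup: identity intrinsics, source camera shifted on x.
K = np.eye(3)
R, t = np.eye(3), np.array([-1.0, 0.0, 0.0])
# A point at depth 5 on the (0.2, 0.4, 1) ray projects to (0.2, 0.4)
# in the reference view and (0.0, 0.4) in the source view.
best = sweep_depth(K, K, R, t, (0.2, 0.4), (0.0, 0.4),
                   depths=[2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
```

In the full method this scoring is done densely over pose heatmaps rather than per detected joint, which is what lets fusion and reconstruction happen in a single shot.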
arXiv Detail & Related papers (2021-04-06T03:49:35Z) - HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation [54.23770284299979]
This paper introduces a novel form of supervision: Hierarchical Multi-person Ordinal Relations (HMOR).
HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically.
An integrated top-down model is designed to leverage these ordinal relations in the learning process.
The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets.
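HMOR's full formulation is hierarchical, covering depth and angle relations at instance, body-part, and joint levels. A minimal person-level ordinal depth term conveys the core idea; the function name, logistic penalty, and margin are illustrative choices, not the paper's exact loss:

```python
import numpy as np

def ordinal_depth_loss(pred_depths, gt_depths, margin=0.0):
    """Pairwise ordinal supervision on predicted per-person depths.

    For every person pair (i, j), the ground truth only dictates which
    one is closer to the camera; a logistic penalty on the signed depth
    difference punishes predictions that flip that order.
    """
    pred = np.asarray(pred_depths, dtype=float)
    gt = np.asarray(gt_depths, dtype=float)
    total, n_pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(i + 1, len(pred)):
            sign = np.sign(gt[i] - gt[j])  # +1 if i is farther in GT
            if sign == 0:
                continue                   # equal depths: no constraint
            # A violated ordering makes the argument large and positive,
            # so the loss grows; a respected ordering keeps it small.
            total += np.log1p(np.exp(-sign * (pred[i] - pred[j]) + margin))
            n_pairs += 1
    return total / max(n_pairs, 1)

# Predictions that respect the ground-truth ordering score lower
# than predictions with the order reversed.
gt = [1.0, 2.0, 3.0]
loss_good = ordinal_depth_loss([1.0, 2.0, 3.0], gt)
loss_bad = ordinal_depth_loss([3.0, 2.0, 1.0], gt)
```

Ordinal terms like this are attractive as auxiliary supervision because they only require relative depth annotations, which are far cheaper to collect than metric ones.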
arXiv Detail & Related papers (2020-08-01T07:53:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.