DyCrowd: Towards Dynamic Crowd Reconstruction from a Large-scene Video
- URL: http://arxiv.org/abs/2508.12644v1
- Date: Mon, 18 Aug 2025 06:09:38 GMT
- Title: DyCrowd: Towards Dynamic Crowd Reconstruction from a Large-scene Video
- Authors: Hao Wen, Hongbo Kang, Jian Ma, Jing Huang, Yuanwang Yang, Haozhe Lin, Yu-Kun Lai, Kun Li,
- Abstract summary: 3D dynamic reconstruction of crowds in large scenes has become increasingly important for applications such as city surveillance and crowd analysis.<n>We propose a framework for consistent 3D reconstruction of hundreds of individuals' poses, positions and rhythmic motion from large-scene video.<n> Experimental results demonstrate that the proposed method achieves state-of-the-art performance in the large-scene dynamic crowd reconstruction task.
- Score: 39.23308196170323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D reconstruction of dynamic crowds in large scenes has become increasingly important for applications such as city surveillance and crowd analysis. However, current works attempt to reconstruct 3D crowds from a static image, causing a lack of temporal consistency and inability to alleviate the typical impact caused by occlusions. In this paper, we propose DyCrowd, the first framework for spatio-temporally consistent 3D reconstruction of hundreds of individuals' poses, positions and shapes from a large-scene video. We design a coarse-to-fine group-guided motion optimization strategy for occlusion-robust crowd reconstruction in large scenes. To address temporal instability and severe occlusions, we further incorporate a VAE (Variational Autoencoder)-based human motion prior along with a segment-level group-guided optimization. The core of our strategy leverages collective crowd behavior to address long-term dynamic occlusions. By jointly optimizing the motion sequences of individuals with similar motion segments and combining this with the proposed Asynchronous Motion Consistency (AMC) loss, we enable high-quality unoccluded motion segments to guide the motion recovery of occluded ones, ensuring robust and plausible motion recovery even in the presence of temporal desynchronization and rhythmic inconsistencies. Additionally, in order to fill the gap of no existing well-annotated large-scene video dataset, we contribute a virtual benchmark dataset, VirtualCrowd, for evaluating dynamic crowd reconstruction from large-scene videos. Experimental results demonstrate that the proposed method achieves state-of-the-art performance in the large-scene dynamic crowd reconstruction task. The code and dataset will be available for research purposes.
Related papers
- Masked Modeling for Human Motion Recovery Under Occlusions [21.05382087890133]
MoRo is an end-to-end generative framework that formulates motion reconstruction as a video-conditioned task.<n>MoRo achieves real-time inference at 70 FPS on a single H200 GPU.
arXiv Detail & Related papers (2026-01-22T16:22:20Z) - MOSAIC-GS: Monocular Scene Reconstruction via Advanced Initialization for Complex Dynamic Environments [12.796165448365949]
MOSAIC-GS is a novel, fully explicit, and computationally efficient approach for high-fidelity dynamic scene reconstruction from monocular videos.<n>We leverage multiple geometric cues, such as depth, optical flow, dynamic object segmentation, and point tracking.<n>We demonstrate that MOSAIC-GS achieves substantially faster optimization and rendering compared to existing methods.
arXiv Detail & Related papers (2026-01-08T20:48:24Z) - 3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction [1.2603104712715607]
This paper proposes a novel 3D Gaussian point distribution method for dynamic street scene reconstruction.<n>Our approach eliminates moving objects while preserving high-fidelity static scene details.<n> Experimental results demonstrate that our method achieves high reconstruction quality, improved rendering performance, and adaptability in large-scale dynamic environments.
arXiv Detail & Related papers (2025-03-15T05:41:59Z) - EvolvingGS: High-Fidelity Streamable Volumetric Video via Evolving 3D Gaussian Representation [14.402479944396665]
We introduce EvolvingGS, a two-stage strategy that first deforms the Gaussian model to align with the target frame, and then refines it with minimal point addition/subtraction.<n> Owing to the flexibility of the incrementally evolving representation, our method outperforms existing approaches in terms of both per-frame and temporal quality metrics.<n>Our method significantly advances the state-of-the-art in dynamic scene reconstruction, particularly for extended sequences with complex human performances.
arXiv Detail & Related papers (2025-03-07T06:01:07Z) - Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction [50.873820265165975]
We introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for dynamic scene reconstruction.<n>We propose a GS-Threshold Joint Modeling strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling.<n>We contribute the first event-inclusive 4D benchmark with synthetic and real-world dynamic scenes, on which our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-11-25T08:23:38Z) - Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction [3.9363268745580426]
AT-GS is a novel method for reconstructing high-quality dynamic surfaces from multi-view videos through per-frame incremental optimization.
We reduce temporal jittering in dynamic surfaces by ensuring consistency in curvature maps across consecutive frames.
Our method achieves superior accuracy and temporal coherence in dynamic surface reconstruction, delivering high-fidelity space-time novel view synthesis.
arXiv Detail & Related papers (2024-11-10T21:30:16Z) - MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.<n>By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes.<n>We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases.
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z) - EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z) - Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in
the Wild [49.672487902268706]
We present a framework that jointly estimates camera temporal alignment and 3D point triangulation.
We reconstruct 3D motion trajectories of human bodies in events captured by multiple unsynchronized and unsynchronized video cameras.
arXiv Detail & Related papers (2020-07-24T23:50:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.