HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics
- URL: http://arxiv.org/abs/2508.09858v1
- Date: Wed, 13 Aug 2025 14:50:19 GMT
- Title: HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics
- Authors: Weiqi Li, Zehao Zhang, Liang Lin, Guangrun Wang
- Abstract summary: We present **HumanGenesis**, a framework that integrates geometric and generative modeling through four collaborative agents. HumanGenesis achieves state-of-the-art performance on tasks including text-guided synthesis, video reenactment, and novel-pose generalization.
- Score: 60.737929335600015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: **Synthetic human dynamics** aims to generate photorealistic videos of human subjects performing expressive, intention-driven motions. However, current approaches face two core challenges: (1) *geometric inconsistency* and *coarse reconstruction*, due to limited 3D modeling and detail preservation; and (2) *motion generalization limitations* and *scene inharmonization*, stemming from weak generative capabilities. To address these, we present **HumanGenesis**, a framework that integrates geometric and generative modeling through four collaborative agents: (1) **Reconstructor** builds 3D-consistent human-scene representations from monocular video using 3D Gaussian Splatting and deformation decomposition. (2) **Critique Agent** enhances reconstruction fidelity by identifying and refining poor regions via multi-round MLLM-based reflection. (3) **Pose Guider** enables motion generalization by generating expressive pose sequences using time-aware parametric encoders. (4) **Video Harmonizer** synthesizes photorealistic, coherent video via a hybrid rendering pipeline with diffusion, refining the Reconstructor through a Back-to-4D feedback loop. HumanGenesis achieves state-of-the-art performance on tasks including text-guided synthesis, video reenactment, and novel-pose generalization, significantly improving expressiveness, geometric fidelity, and scene integration.
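The four-agent loop described in the abstract can be sketched as plain control flow. This is a hypothetical illustration only: all function names, the `Scene4D` stand-in, and the numeric fidelity proxy are invented for this sketch and are not the authors' API; the real agents operate on 3D Gaussian splats, MLLM critiques, and diffusion renderers.

```python
# Hypothetical sketch of the HumanGenesis agent pipeline; every name and
# value here is illustrative, not taken from the paper's implementation.
from dataclasses import dataclass, field

@dataclass
class Scene4D:
    """Stand-in for the 3D Gaussian human-scene representation."""
    fidelity: float = 0.5                      # proxy for reconstruction quality
    refinements: list = field(default_factory=list)

def reconstructor(video_frames):
    # Agent 1: build an initial 3D-consistent representation
    # (3DGS + deformation decomposition in the paper; a placeholder here).
    return Scene4D(fidelity=0.5)

def critique_agent(scene):
    # Agent 2: MLLM-based reflection would flag and refine poor regions;
    # we mimic one refinement round by recording it and bumping fidelity.
    scene.refinements.append("refined-region")
    scene.fidelity = min(1.0, scene.fidelity + 0.2)
    return scene

def pose_guider(prompt, n_poses=3):
    # Agent 3: a time-aware parametric encoder would emit expressive
    # pose sequences; we return labeled stubs.
    return [f"pose_{t}" for t in range(n_poses)]

def video_harmonizer(scene, poses):
    # Agent 4: hybrid rendering + diffusion; returns frames plus a
    # feedback signal consumed by the Back-to-4D loop.
    frames = [(pose, scene.fidelity) for pose in poses]
    feedback = 0.1
    return frames, feedback

def human_genesis(video_frames, prompt, reflection_rounds=2):
    scene = reconstructor(video_frames)
    for _ in range(reflection_rounds):         # multi-round reflection
        scene = critique_agent(scene)
    poses = pose_guider(prompt)
    frames, feedback = video_harmonizer(scene, poses)
    scene.fidelity = min(1.0, scene.fidelity + feedback)  # Back-to-4D feedback
    return frames, scene

frames, scene = human_genesis(video_frames=[], prompt="wave hello")
print(len(frames), round(scene.fidelity, 2))   # 3 frames, fidelity capped at 1.0
```

The point of the sketch is the data flow: the Critique Agent iterates on the Reconstructor's output before any rendering happens, while the Video Harmonizer's feedback flows back into the 4D representation rather than only forward into pixels.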
Related papers
- Intrinsic Geometry-Appearance Consistency Optimization for Sparse-View Gaussian Splatting [36.3168821104293]
3D human reconstruction from a single image is a challenging problem. We present MVD-HuGaS, enabling free-view 3D human rendering from a single image via a multi-view human diffusion model.
arXiv Detail & Related papers (2026-03-03T11:44:46Z) - AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation [45.753757870577196]
We introduce AGILE, a robust framework that shifts the paradigm from reconstruction to agentic generation for interaction learning. We show that AGILE outperforms baselines in global geometric accuracy while demonstrating exceptional robustness on challenging sequences where prior art frequently collapses.
arXiv Detail & Related papers (2026-02-04T15:42:58Z) - InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting [64.42884719282323]
InpaintHuman is a novel method for generating high-fidelity, complete, and animatable avatars from occluded monocular videos. Our approach employs direct pixel-level supervision to ensure identity fidelity.
arXiv Detail & Related papers (2026-01-05T13:26:02Z) - Blur2Sharp: Human Novel Pose and View Synthesis with Generative Prior Refinement [6.91111219679588]
Blur2Sharp is a novel framework integrating 3D-aware neural rendering and diffusion models to generate sharp, geometrically consistent novel-view images. Our method employs a dual-conditioning architecture: first, a Human NeRF model generates geometrically coherent multi-view renderings for target poses, explicitly encoding 3D structural guidance. We further enhance visual quality through hierarchical feature fusion, incorporating texture, normal, and semantic priors extracted from parametric SMPL models to simultaneously improve global coherence and local detail accuracy.
arXiv Detail & Related papers (2025-12-09T03:49:12Z) - 4D Driving Scene Generation With Stereo Forcing [62.47705572424127]
Current generative models struggle to synthesize dynamic 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS) without per-scene optimization. We present PhiGenesis, a unified framework for 4D scene generation that extends video generation techniques with geometric and temporal consistency.
arXiv Detail & Related papers (2025-09-24T15:37:17Z) - Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors [22.797709893040906]
GenMOJO is a novel approach that integrates rendering-based deformable 3D Gaussian optimization with generative priors for view synthesis. It decomposes the scene into individual objects, optimizing a differentiable set of deformable Gaussians per object. The resulting model generates 4D object reconstructions over space and time, and produces accurate 2D and 3D point tracks from monocular input.
arXiv Detail & Related papers (2025-06-15T04:40:20Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers [60.86393841247567]
HumanRAM is a novel feed-forward approach for generalizable human reconstruction and animation from monocular or sparse human images. Our approach integrates human reconstruction and animation into a unified framework by introducing explicit pose conditions. Experiments show that HumanRAM significantly surpasses previous methods in terms of reconstruction accuracy, animation fidelity, and generalization performance on real-world datasets.
arXiv Detail & Related papers (2025-06-03T17:50:05Z) - HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration [29.03216532351979]
We introduce **HumanDreamer-X**, a novel framework that integrates multi-view human generation and reconstruction into a unified pipeline. In this framework, 3D Gaussian Splatting serves as an explicit 3D representation to provide initial geometry and appearance priors. We also propose an attention modulation strategy that effectively enhances geometric detail and identity consistency across multi-view generation.
arXiv Detail & Related papers (2025-04-04T15:35:14Z) - Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction [50.873820265165975]
We introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for dynamic scene reconstruction. We propose a GS-Threshold Joint Modeling strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling. We also contribute the first event-inclusive 4D benchmark with synthetic and real-world dynamic scenes, on which our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-11-25T08:23:38Z) - Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning. Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations. Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z) - Video-driven Neural Physically-based Facial Asset for Production [33.24654834163312]
We present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets.
Our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
arXiv Detail & Related papers (2022-02-11T13:22:48Z)