Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling
- URL: http://arxiv.org/abs/2602.19089v1
- Date: Sun, 22 Feb 2026 08:07:28 GMT
- Title: Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling
- Authors: Qi Sun, Can Wang, Jiaxiang Shang, Yingchun Liu, Jing Liao
- Abstract summary: We present Ani3DHuman, a framework that marries kinematics-based animation with video diffusion priors. We first introduce a layered motion representation that disentangles rigid motion from residual non-rigid motion. We propose a novel self-guided stochastic sampling method, which effectively addresses the out-of-distribution problem.
- Score: 20.88616874056278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current 3D human animation methods struggle to achieve photorealism: kinematics-based approaches lack non-rigid dynamics (e.g., clothing dynamics), while methods that leverage video diffusion priors can synthesize non-rigid motion but suffer from quality artifacts and identity loss. To overcome these limitations, we present Ani3DHuman, a framework that marries kinematics-based animation with video diffusion priors. We first introduce a layered motion representation that disentangles rigid motion from residual non-rigid motion. Rigid motion is generated by a kinematic method, which then produces a coarse rendering to guide the video diffusion model in generating video sequences that restore the residual non-rigid motion. However, this restoration task, based on diffusion sampling, is highly challenging, as the initial renderings are out-of-distribution, causing standard deterministic ODE samplers to fail. Therefore, we propose a novel self-guided stochastic sampling method, which effectively addresses the out-of-distribution problem by combining stochastic sampling (for photorealistic quality) with self-guidance (for identity fidelity). These restored videos provide high-quality supervision, enabling the optimization of the residual non-rigid motion field. Extensive experiments demonstrate that Ani3DHuman can generate photorealistic 3D human animation, outperforming existing methods. Code is available at https://github.com/qiisun/ani3dhuman.
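The abstract describes two mechanisms concretely enough to sketch. First, the layered motion representation: a rigid, kinematics-driven layer plus a learned residual non-rigid layer (e.g., clothing dynamics). A minimal sketch, assuming a linear-blend-skinning rigid layer and a callable residual offset field; this parameterization is our assumption, not the paper's stated one:

```python
import torch

def deform_points(points, lbs_weights, bone_transforms, residual_field, t):
    """Layered motion: rigid kinematic layer + residual non-rigid layer.

    points:          (N, 3) canonical points
    lbs_weights:     (N, B) skinning weights over B bones
    bone_transforms: (B, 4, 4) rigid bone transforms for the current pose
    residual_field:  hypothetical callable (points, t) -> (N, 3) offsets
    """
    # Rigid layer: linear blend skinning driven by the kinematic pose.
    homo = torch.cat([points, torch.ones_like(points[:, :1])], dim=-1)   # (N, 4)
    blended = torch.einsum("nb,bij->nij", lbs_weights, bone_transforms)  # (N, 4, 4)
    rigid = torch.einsum("nij,nj->ni", blended, homo)[:, :3]

    # Residual layer: non-rigid offsets, which the paper optimizes against
    # the diffusion-restored videos that serve as supervision.
    return rigid + residual_field(rigid, t)
```

Second, the self-guided stochastic sampler. A hedged sketch assuming a Karras-style sigma schedule and an x0-predicting denoiser; `denoiser`, `coarse_video`, `identity_ref`, and the pixel-space identity pull are illustrative stand-ins, not the paper's actual API or guidance signal:

```python
import math
import torch

def self_guided_stochastic_sample(denoiser, coarse_video, identity_ref,
                                  sigmas, churn=0.5, guidance=0.1):
    """Stochastic sampling with self-guidance, per the abstract's description."""
    x = torch.randn_like(coarse_video) * sigmas[0]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Stochastic step: re-inject fresh noise before denoising. Unlike a
        # deterministic ODE sampler, this lets the trajectory escape the
        # out-of-distribution states induced by the coarse rendering.
        sigma_hat = sigma * (1.0 + churn)
        x = x + torch.randn_like(x) * math.sqrt(sigma_hat**2 - sigma**2)

        # Denoise, conditioned on the coarse kinematic rendering.
        x0 = denoiser(x, sigma_hat, cond=coarse_video)

        # Self-guidance: nudge the denoised estimate toward the subject's
        # reference appearance to preserve identity (a naive pixel-space
        # blend standing in for the method's actual guidance term).
        x0 = x0 + guidance * (identity_ref - x0)

        # Euler step to the next noise level.
        x = x + (x - x0) / sigma_hat * (sigma_next - sigma_hat)
    return x
```

The contrast the abstract draws is captured by the noise re-injection step: a deterministic ODE sampler would follow a single trajectory from the out-of-distribution initialization, whereas the stochastic perturbation plus identity guidance trades exploration for fidelity at each step.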
Related papers
- Object-Aware 4D Human Motion Generation [20.338809521456298]
We propose an object-aware 4D human motion generation framework grounded in 3D Gaussian representations and motion diffusion priors. Our framework produces natural and physically plausible human motions that respect 3D spatial context.
arXiv Detail & Related papers (2025-10-31T20:40:17Z) - ManipDreamer3D: Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory [56.06314177428745]
We present ManipDreamer3D for generating plausible 3D-aware robotic manipulation videos from the input image and the text instruction. Our method generates robotic videos with autonomously planned 3D trajectories, significantly reducing human intervention requirements.
arXiv Detail & Related papers (2025-08-29T10:39:06Z) - Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video [56.781766315691854]
We introduce Restage4D, a geometry-preserving pipeline for video-conditioned 4D restaging. We validate Restage4D on DAVIS and PointOdyssey, demonstrating improved geometry consistency, motion quality, and 3D tracking performance.
arXiv Detail & Related papers (2025-08-08T21:31:51Z) - Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models [71.78723353724493]
Animation of humanoid characters is essential in various graphics applications. We propose an approach to synthesize 4D animated sequences of input static 3D humanoid meshes.
arXiv Detail & Related papers (2025-03-20T10:00:22Z) - Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos [64.10307207290039]
We introduce a novel framework for modeling high-fidelity, animatable 3D human avatars from motion-blurred monocular video inputs. By explicitly modeling human motion trajectories during exposure time, we jointly optimize the trajectories and 3D Gaussians to reconstruct sharp, high-quality human avatars.
arXiv Detail & Related papers (2025-01-23T02:31:57Z) - DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors. We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z) - DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos [20.895221536570627]
Human mesh recovery (HMR) provides rich human body information for various real-world applications. Video-based approaches leverage temporal information to mitigate this issue. We present DiffMesh, an innovative motion-aware Diffusion-like framework for video-based HMR.
arXiv Detail & Related papers (2023-03-23T16:15:18Z) - Neural Face Models for Example-Based Visual Speech Synthesis [2.2817442144155207]
We present a marker-less approach for facial motion capture based on multi-view video.
We learn a neural representation of facial expressions, which is used to seamlessly blend facial performances during the animation procedure.
arXiv Detail & Related papers (2020-09-22T07:35:33Z) - Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)