AMG: Avatar Motion Guided Video Generation
- URL: http://arxiv.org/abs/2409.01502v1
- Date: Mon, 2 Sep 2024 23:59:01 GMT
- Title: AMG: Avatar Motion Guided Video Generation
- Authors: Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang
- Abstract summary: We propose AMG, a method that combines 2D photorealism and 3D controllability by conditioning video diffusion models on controlled renderings of 3D avatars.
AMG is the first method that enables multi-person diffusion video generation with precise control over camera positions, human motions, and background style.
- Score: 5.82136706118236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of human video generation has gained significant attention with the advancement of deep generative models. Generating realistic videos of human movement is inherently challenging due to the intricacies of human body topology and sensitivity to visual artifacts. Extensively studied 2D media generation methods take advantage of massive human media datasets but struggle with 3D-aware control, whereas 3D avatar-based approaches, while offering more freedom of control, lack photorealism and cannot be harmonized seamlessly with the background scene. We propose AMG, a method that combines 2D photorealism and 3D controllability by conditioning video diffusion models on controlled renderings of 3D avatars. We additionally introduce a novel data processing pipeline that reconstructs and renders human avatar movements from dynamic camera videos. AMG is the first method that enables multi-person diffusion video generation with precise control over camera positions, human motions, and background style. We also demonstrate through extensive evaluation that it outperforms existing human video generation methods conditioned on pose sequences or driving videos in terms of realism and adaptability.
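The abstract describes conditioning a video diffusion model on controlled renderings of 3D avatars but does not state the conditioning mechanism. The sketch below is a minimal, hypothetical illustration (not the authors' implementation): it assumes channel-wise concatenation of rendered avatar frames with noisy video latents in front of a small 3D-convolutional denoiser; all class names, channel counts, and tensor shapes are illustrative only.

```python
import torch
import torch.nn as nn


class AvatarConditionedDenoiser(nn.Module):
    """Toy denoiser conditioned on rendered 3D-avatar frames by channel-wise
    concatenation with noisy video latents (illustrative, not the AMG model)."""

    def __init__(self, latent_channels: int = 4, cond_channels: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            # Input channels = video latents + avatar-render control signal
            nn.Conv3d(latent_channels + cond_channels, hidden, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, latent_channels, kernel_size=3, padding=1),
        )

    def forward(self, noisy_latents: torch.Tensor, avatar_render: torch.Tensor) -> torch.Tensor:
        # noisy_latents: (B, C_lat, T, H, W); avatar_render: (B, C_cond, T, H, W)
        x = torch.cat([noisy_latents, avatar_render], dim=1)
        return self.net(x)  # prediction whose meaning depends on the training objective


if __name__ == "__main__":
    # Purely illustrative single forward pass on random tensors.
    model = AvatarConditionedDenoiser()
    latents = torch.randn(1, 4, 8, 32, 32)  # noisy video latents
    render = torch.randn(1, 3, 8, 32, 32)   # rendered avatar frames (control)
    print(model(latents, render).shape)      # torch.Size([1, 4, 8, 32, 32])
```

Other conditioning schemes (e.g. cross-attention over avatar features) would fit the same interface; concatenation is used here only to keep the sketch short.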
Related papers
- WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction [51.22641018932625]
We present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis.
Our method achieves SOTA performance in producing photorealistic renderings from the given monocular video.
arXiv Detail & Related papers (2025-02-03T04:43:41Z)
- Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos [64.10307207290039]
Deblur-Avatar is a novel framework for modeling high-fidelity, animatable 3D human avatars from motion-blurred monocular video inputs.
arXiv Detail & Related papers (2025-01-23T02:31:57Z)
- Move-in-2D: 2D-Conditioned Human Motion Generation [54.067588636155115]
We propose Move-in-2D, a novel approach to generate human motion sequences conditioned on a scene image.
Our approach accepts both a scene image and text prompt as inputs, producing a motion sequence tailored to the scene.
arXiv Detail & Related papers (2024-12-17T18:58:07Z)
- HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation [64.37874983401221]
We present HumanVid, the first large-scale high-quality dataset tailored for human image animation.
For the real-world data, we compile a vast collection of real-world videos from the internet.
For the synthetic data, we collected 10K 3D avatar assets and leveraged existing assets of body shapes, skin textures and clothing.
arXiv Detail & Related papers (2024-07-24T17:15:58Z)
- Physically Plausible Animation of Human Upper Body from a Single Image [41.027391105867345]
We present a new method for generating controllable, dynamically responsive, and photorealistic human animations.
Given an image of a person, our system allows the user to generate Physically plausible Upper Body Animation (PUBA) using interaction in the image space.
arXiv Detail & Related papers (2022-12-09T09:36:59Z)
- AvatarGen: a 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is the first method that enables not only non-rigid human generation with diverse appearance but also full control over poses and viewpoints.
To model non-rigid dynamics, it introduces a deformation network to learn pose-dependent deformations in the canonical space.
Our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs.
arXiv Detail & Related papers (2022-08-01T01:27:02Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on the in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
- Action2video: Generating Videos of Human 3D Actions [31.665831044217363]
We aim to tackle the interesting yet challenging problem of generating videos of diverse and natural human motions from prescribed action categories.
The key issue lies in the ability to synthesize multiple distinct motion sequences that are realistic in their visual appearance.
Action2motion generates plausible 3D pose sequences of a prescribed action category, which are then processed and rendered by motion2video to form 2D videos.
arXiv Detail & Related papers (2021-11-12T20:20:37Z)
- 4D Human Body Capture from Egocentric Video via 3D Scene Grounding [38.3169520384642]
We introduce a novel task of reconstructing a time series of second-person 3D human body meshes from monocular egocentric videos.
The unique viewpoint and rapid embodied camera motion of egocentric videos raise additional technical barriers for human body capture.
arXiv Detail & Related papers (2020-11-26T15:17:16Z)