QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human
Motion Animation
- URL: http://arxiv.org/abs/2203.11632v1
- Date: Tue, 22 Mar 2022 11:34:40 GMT
- Title: QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human
Motion Animation
- Authors: Yuxin Hong and Xuelin Qian and Simian Luo and Xiangyang Xue and Yanwei
Fu
- Abstract summary: This paper studies the task of conditional Human Motion Animation (cHMA).
Given a source image and a driving video, the model should animate the new frame sequence.
The key novelties come from three newly introduced steps: quantize, scrabble and craft.
- Score: 66.97112599818507
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the task of conditional Human Motion Animation (cHMA).
Given a source image and a driving video, the model should animate the new
frame sequence, in which the person in the source image should perform a
similar motion as the pose sequence from the driving video. Despite the success
of Generative Adversarial Network (GAN) methods in image and video synthesis,
it is still very challenging to conduct cHMA due to the difficulty in
efficiently utilizing the conditional guided information such as images or
poses, and generating images of good visual quality. To this end, this paper
proposes a novel model of learning to Quantize, Scrabble, and Craft (QS-Craft)
for conditional human motion animation. The key novelties come from three
newly introduced steps: quantize, scrabble and craft. In particular,
QS-Craft employs a transformer in its structure to utilize attention
architectures. The guided information is represented as a pose coordinate
sequence extracted from the driving videos. Extensive experiments on human
motion datasets validate the efficacy of our model.
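The abstract names a "quantize" step but does not describe how it works. As a rough illustration only, the sketch below assumes it resembles a nearest-codebook lookup of the kind used in vector-quantized autoencoders; this is an assumption, not something stated on this page. The function names, the toy codebook and the feature vectors are all hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Sequence[float]],
             codebook: List[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices: List[int],
               codebook: List[Sequence[float]]) -> List[Sequence[float]]:
    """Look the indices back up in the codebook, e.g. as input to a decoder."""
    return [codebook[i] for i in indices]


# Hypothetical toy data: a 3-entry codebook of 2-D vectors (assumption,
# real codebooks are learned and far larger).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
features = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.3)]

tokens = quantize(features, codebook)          # discrete token indices
reconstructed = dequantize(tokens, codebook)   # nearest codebook vectors
```

Under this assumption, the resulting token indices form a discrete sequence that a transformer could attend over together with the pose coordinate sequence.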
Related papers
- HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation [64.37874983401221]
We present HumanVid, the first large-scale high-quality dataset tailored for human image animation.
For the real-world data, we compile a vast collection of real-world videos from the internet.
For the synthetic data, we collected 10K 3D avatar assets and leveraged existing assets of body shapes, skin textures and clothing.
arXiv Detail & Related papers (2024-07-24T17:15:58Z)
- AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment [64.02822911038848]
We present AnimateZoo, a zero-shot diffusion-based video generator to produce animal animations.
The key technique used in AnimateZoo is subject alignment, which includes two steps.
Our model is capable of generating videos characterized by accurate movements, consistent appearance, and high-fidelity frames.
arXiv Detail & Related papers (2024-04-07T12:57:41Z)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving the reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- Image Animation with Keypoint Mask [0.0]
Motion transfer is the task of synthesizing future video frames of a single source image according to the motion from a given driving video.
In this work, we extract the structure from a keypoint heatmap, without an explicit motion representation.
Then, the structures from the image and the video are extracted to warp the image according to the video, by a deep generator.
arXiv Detail & Related papers (2021-12-20T11:35:06Z)
- Action2video: Generating Videos of Human 3D Actions [31.665831044217363]
We aim to tackle the interesting yet challenging problem of generating videos of diverse and natural human motions from prescribed action categories.
The key issue lies in the ability to synthesize multiple distinct motion sequences that are realistic in their visual appearances.
Action2motion stochastically generates plausible 3D pose sequences of a prescribed action category, which are processed and rendered by motion2video to form 2D videos.
arXiv Detail & Related papers (2021-11-12T20:20:37Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.