MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware
Meta-learning
- URL: http://arxiv.org/abs/2201.04851v1
- Date: Thu, 13 Jan 2022 09:34:20 GMT
- Title: MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware
Meta-learning
- Authors: Yuying Ge, Yibing Song, Ruimao Zhang and Ping Luo
- Abstract summary: Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.
Previous work needs to collect a several-minute-long video of a target person, containing thousands of frames, to train a personalized model.
Recent work tackles few-shot dancing video retargeting, which learns to synthesize videos of unseen persons from only a few frames of them.
- Score: 51.78302763617991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dancing video retargeting aims to synthesize a video that transfers the dance
movements from a source video to a target person. Previous work needs to collect
a several-minute-long video of a target person with thousands of frames to train
a personalized model. However, the trained model can only generate videos of
the same person. To address the limitations, recent work tackled few-shot
dancing video retargeting, which learns to synthesize videos of unseen persons
by leveraging a few frames of them. In practice, given a few frames of a
person, these works simply regarded them as a batch of individual images without
temporal correlations, thus generating temporally incoherent dancing videos of
low visual quality. In this work, we model a few frames of a person as a series
of dancing moves, where each move contains two consecutive frames, to extract
the appearance patterns and the temporal dynamics of this person. We propose
MetaDance, which utilizes temporal-aware meta-learning to optimize the
initialization of a model through the synthesis of dancing moves, such that the
meta-trained model can be efficiently tuned towards enhanced visual quality and
strengthened temporal stability for unseen persons with a few frames. Extensive
evaluations show the clear superiority of our method.
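To make the meta-learning recipe concrete, the following is a minimal sketch of a temporal-aware meta-training loop over two-frame "moves", assuming a Reptile-style first-order outer update, an L1 reconstruction loss, and a generic frame-synthesis network; the function names, losses, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code): meta-learning over two-frame "moves"
# with a Reptile-style first-order outer update. `model` is assumed to be any
# nn.Module that synthesizes the next frame of a move from the previous one.
import copy
import torch
import torch.nn as nn

def make_moves(frames):
    # Group a person's few frames into moves of two consecutive frames.
    return [(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]

def inner_adapt(model, moves, steps=3, lr=1e-3):
    # Personalize a copy of the shared initialization on one person's moves.
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        for prev_frame, next_frame in moves:
            loss = nn.functional.l1_loss(adapted(prev_frame), next_frame)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return adapted

def meta_train(model, persons, meta_lr=0.1, epochs=10):
    # Pull the shared initialization toward each person's adapted weights, so
    # a few tuning steps suffice for an unseen person at test time.
    for _ in range(epochs):
        for frames in persons:  # each element: a few frames (tensors) of one person
            adapted = inner_adapt(model, make_moves(frames))
            with torch.no_grad():
                for p, q in zip(model.parameters(), adapted.parameters()):
                    p += meta_lr * (q - p)  # Reptile-style outer update
    return model
```

At test time, the same inner adaptation would be run on the few frames of an unseen person before synthesizing their dancing video, which is where the efficient tuning and strengthened temporal stability described above would come from.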
Related papers
- Replace Anyone in Videos [39.4019337319795]
We propose the ReplaceAnyone framework, which focuses on localizing and manipulating human motion in videos.
Specifically, we formulate this task as an image-conditioned pose-driven video inpainting paradigm.
We introduce diverse mask forms involving regular and irregular shapes to avoid shape leakage and allow granular local control.
arXiv Detail & Related papers (2024-09-30T03:27:33Z)
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation [53.16986875759286]
We present a UniAnimate framework to enable efficient and long-term human video generation.
We map the reference image along with the posture guidance and noise video into a common feature space.
We also propose a unified noise input that supports random noised input as well as first frame conditioned input.
arXiv Detail & Related papers (2024-06-03T10:51:10Z)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models [24.282240656366714]
Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion.
We propose MotionDirector, with a dual-path LoRAs architecture to decouple the learning of appearance and motion.
Our method also supports various downstream applications, such as the mixing of different videos with their appearance and motion respectively, and animating a single image with customized motions (a rough sketch of the dual-path idea follows after this list).
arXiv Detail & Related papers (2023-10-12T16:26:18Z)
- WAIT: Feature Warping for Animation to Illustration video Translation using GANs [12.681919619814419]
We introduce a new problem for video stylizing where an unordered set of images is used.
Most of the video-to-video translation methods are built on an image-to-image translation model.
We propose a new generator network with feature warping layers which overcomes the limitations of the previous methods.
arXiv Detail & Related papers (2023-10-07T19:45:24Z)
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
- Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis [124.48519390371636]
Transferring human motion from a source to a target person holds great potential in computer vision and graphics applications.
Previous work has either relied on crafted 3D human models or trained a separate model specifically for each target person.
This work studies a more general setting, in which we aim to learn a single model to parsimoniously transfer motion from a source video to any target person.
arXiv Detail & Related papers (2021-10-27T03:42:41Z)
- Do as we do: Multiple Person Video-To-Video Transfer [0.0]
We propose a marker-less approach for multiple-person video-to-video transfer using pose as an intermediate representation.
Given a source video with multiple persons dancing or working out, our method transfers the body motion of all actors to a new set of actors in a different video.
Our method is able to convincingly transfer body motion to the target video, while preserving specific features of the target video, such as feet touching the floor and relative position of the actors.
arXiv Detail & Related papers (2021-04-10T09:26:31Z)
- Layered Neural Rendering for Retiming People in Video [108.85428504808318]
We present a method for retiming people in an ordinary, natural video.
We can temporally align different motions, change the speed of certain actions, or "erase" selected people from the video altogether.
A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate.
arXiv Detail & Related papers (2020-09-16T17:48:26Z)
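The MotionDirector entry above mentions a dual-path LoRA design that decouples appearance from motion. The sketch below illustrates that general idea, assuming low-rank residual adapters attached to a frozen base projection; the class names, the rank, and the way the two paths would be supervised (appearance on single frames, motion on frame sequences) are assumptions for illustration, not the paper's actual architecture.

```python
# A minimal sketch (not the paper's code) of a dual-path LoRA layer: a frozen
# base projection plus two low-rank residual adapters, one intended for
# appearance and one for motion. Names and the rank are illustrative.
import torch
import torch.nn as nn

class LoRA(nn.Module):
    # Low-rank residual adapter: x -> up(down(x)), initialized to zero so the
    # layer starts out identical to the frozen base.
    def __init__(self, dim, rank=4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)

    def forward(self, x):
        return self.up(self.down(x))

class DualPathLinear(nn.Module):
    # A frozen pretrained projection with separate appearance and motion paths.
    def __init__(self, dim, rank=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.requires_grad_(False)       # pretrained weight stays frozen
        self.appearance = LoRA(dim, rank)     # e.g. tuned on individual frames
        self.motion = LoRA(dim, rank)         # e.g. tuned on frame sequences

    def forward(self, x, use_motion=True):
        out = self.base(x) + self.appearance(x)
        if use_motion:                        # drop this path to keep appearance only
            out = out + self.motion(x)
        return out
```

Keeping only the appearance path (use_motion=False), or swapping in a motion adapter trained on a different video, is one way the mixing of appearance and motion mentioned in the summary could be realized.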