Do as we do: Multiple Person Video-To-Video Transfer
- URL: http://arxiv.org/abs/2104.04721v1
- Date: Sat, 10 Apr 2021 09:26:31 GMT
- Title: Do as we do: Multiple Person Video-To-Video Transfer
- Authors: Mickael Cormier, Houraalsadat Mortazavi Moshkenan, Franz Lörch,
Jürgen Metzler, Jürgen Beyerer
- Abstract summary: We propose a marker-less approach for multiple-person video-to-video transfer using pose as an intermediate representation.
Given a source video with multiple persons dancing or working out, our method transfers the body motion of all actors to a new set of actors in a different video.
Our method is able to convincingly transfer body motion to the target video, while preserving specific features of the target video, such as feet touching the floor and relative position of the actors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our goal is to transfer the motion of real people from a source video to a
target video with realistic results. While recent advances significantly
improved image-to-image translation, only a few works account for body motion
and temporal consistency, and those focus only on video re-targeting for a
single actor. In this work, we propose a marker-less
approach for multiple-person video-to-video transfer using pose as an
intermediate representation. Given a source video with multiple persons dancing
or working out, our method transfers the body motion of all actors to a new set
of actors in a different video. Unlike recent "do as I do" methods,
we focus specifically on transferring multiple persons at the same time and
tackle the related identity-switch problem. Our method is able to convincingly
transfer body motion to the target video, while preserving specific features of
the target video, such as feet touching the floor and relative position of the
actors. The evaluation is performed with visual quality and appearance metrics
using publicly available videos with the permission of their owners.
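The paper releases no code, so the following is only a minimal sketch of the general idea described in the abstract: extracting per-actor poses as an intermediate representation and keeping actor identities consistent across frames before rendering each actor with a target-specific generator. Every name in it (the `pose_estimator`, the per-target `renderers`, the mean-joint-distance cost, the max compositing) is an assumed placeholder for illustration, not the authors' implementation; Hungarian matching is one plausible way to address the identity-switch problem the abstract mentions.

```python
# Illustrative sketch only: pose-based multi-person motion transfer with
# simple identity tracking. All components are hypothetical placeholders,
# not the authors' implementation. Assumes a fixed number of actors and
# the same number of detections in every frame.
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_actors(prev_poses, cur_poses):
    """Pair current detections with previous tracks to avoid identity switches.

    prev_poses, cur_poses: lists of (J, 2) arrays of 2D joint coordinates.
    Returns a dict mapping each current detection index to a previous track index.
    """
    cost = np.zeros((len(prev_poses), len(cur_poses)))
    for i, p in enumerate(prev_poses):
        for j, c in enumerate(cur_poses):
            cost[i, j] = np.linalg.norm(p - c, axis=1).mean()  # mean joint distance
    rows, cols = linear_sum_assignment(cost)                   # Hungarian matching
    return {int(c): int(r) for r, c in zip(rows, cols)}


def transfer_video(source_frames, pose_estimator, renderers):
    """Transfer the motion of every source actor to a set of target renderers.

    pose_estimator(frame) -> list of (J, 2) skeletons (e.g. an off-the-shelf
    2D pose network); renderers[k](pose) -> image of target actor k.
    """
    prev = None
    outputs = []
    for frame in source_frames:
        poses = pose_estimator(frame)
        if prev is None:
            assignment = {i: i for i in range(len(poses))}     # initial identities
        else:
            assignment = match_actors(prev, poses)
        # Render each tracked actor with its own target-person generator,
        # keeping the actors' relative image positions from the source.
        layers = [renderers[assignment[i]](pose) for i, pose in enumerate(poses)]
        outputs.append(np.max(layers, axis=0))                 # naive compositing
        prev = poses
    return outputs
```

In a real pipeline the renderers would typically be person-specific image-to-image generators trained on the target video, and compositing would respect occlusion and ground contact (e.g. feet touching the floor) rather than a per-pixel maximum.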
Related papers
- Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion [9.134743677331517]
We propose using a pre-trained image-to-video model to disentangle appearance from motion.
Our method, called motion-textual inversion, leverages our observation that image-to-video models extract appearance mainly from the (latent) image input.
By operating on an inflated motion-text embedding containing multiple text/image embedding tokens per frame, we achieve a high temporal motion granularity.
Our approach does not require spatial alignment between the motion reference video and target image, generalizes across various domains, and can be applied to various tasks.
arXiv Detail & Related papers (2024-08-01T10:55:20Z) - MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware
Meta-learning [51.78302763617991]
Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.
Previous work needs to collect a several-minute-long video of the target person, containing thousands of frames, to train a personalized model.
Recent work tackles few-shot dancing video retargeting, which learns to synthesize videos of unseen persons from only a few of their frames.
arXiv Detail & Related papers (2022-01-13T09:34:20Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion
Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - On Development and Evaluation of Retargeting Human Motion and Appearance
in Monocular Videos [2.870762512009438]
Transferring human motion and appearance between videos of human actors remains one of the key challenges in Computer Vision.
We propose a novel, high-performing approach based on a hybrid image-based rendering technique that exhibits competitive visual quality.
We also present a new video benchmark dataset composed of different videos with annotated human motions to evaluate the task of synthesizing people's videos.
arXiv Detail & Related papers (2021-03-29T13:17:41Z) - Layered Neural Rendering for Retiming People in Video [108.85428504808318]
We present a method for retiming people in an ordinary, natural video.
We can temporally align different motions, change the speed of certain actions, or "erase" selected people from the video altogether.
A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate.
arXiv Detail & Related papers (2020-09-16T17:48:26Z) - Motion Capture from Internet Videos [47.943209721329474]
Recent advances in image-based human pose estimation make it possible to capture 3D human motion from a single RGB video.
While multi-view videos are not common, the videos of a celebrity performing a specific action are usually abundant on the Internet.
We propose a novel optimization-based framework and experimentally demonstrate its ability to recover much more precise and detailed motion from multiple videos.
arXiv Detail & Related papers (2020-08-18T13:48:37Z) - ReenactNet: Real-time Full Head Reenactment [50.32988828989691]
We propose a head-to-head system capable of fully transferring the human head 3D pose, facial expressions and eye gaze from a source to a target actor.
Our system produces high-fidelity, temporally-smooth and photo-realistic synthetic videos faithfully transferring the human time-varying head attributes from the source to the target actor.
arXiv Detail & Related papers (2020-05-22T00:51:38Z) - TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting [107.39743751292028]
TransMoMo is capable of transferring motion of a person in a source video realistically to another video of a target person.
We exploit invariance properties of three factors of variation including motion, structure, and view-angle.
We demonstrate the effectiveness of our method over the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-31T17:49:53Z) - Do As I Do: Transferring Human Motion and Appearance between Monocular
Videos with Spatial and Temporal Constraints [8.784162652042959]
Marker-less human motion estimation and shape modeling from images in the wild bring this challenge to the fore.
We propose a unifying formulation for transferring appearance and human motion from monocular videos.
Our method is able to transfer both human motion and appearance outperforming state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T16:39:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.