Copy Motion From One to Another: Fake Motion Video Generation
- URL: http://arxiv.org/abs/2205.01373v1
- Date: Tue, 3 May 2022 08:45:22 GMT
- Title: Copy Motion From One to Another: Fake Motion Video Generation
- Authors: Zhenguang Liu, Sifan Wu, Chejian Xu, Xiang Wang, Lei Zhu, Shuang Wu,
Fuli Feng
- Abstract summary: A compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion.
Current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos.
We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image.
Our method is able to generate realistic target person videos, faithfully copying complex motions from a source person.
- Score: 53.676020148034034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One compelling application of artificial intelligence is to generate a video
of a target person performing arbitrary desired motion (from a source person).
While the state-of-the-art methods are able to synthesize a video demonstrating
similar broad-stroke motion details, they are generally lacking in texture
details. A pertinent manifestation appears as distorted faces, feet, and hands,
and such flaws are readily perceived by human observers. Furthermore,
current methods typically employ GANs with an L2 loss to assess the authenticity
of the generated videos, inherently requiring a large number of training
samples to learn the texture details for adequate video generation. In this
work, we tackle these challenges from three aspects: 1) We disentangle each
video frame into foreground (the person) and background, focusing on generating
the foreground to reduce the underlying dimension of the network output. 2) We
propose a theoretically motivated Gromov-Wasserstein loss that facilitates
learning the mapping from a pose to a foreground image. 3) To enhance texture
details, we encode facial features with geometric guidance and employ local
GANs to refine the face, feet, and hands. Extensive experiments show that our
method is able to generate realistic target person videos, faithfully copying
complex motions from a source person. Our code and datasets are released at
https://github.com/Sifann/FakeMotion
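
The Gromov-Wasserstein loss mentioned in the abstract compares the internal distance structure of two sets (here, pose features and foreground-image features) rather than matching them point-to-point, which is what lets it relate a pose to an image living in a different space. The sketch below is a minimal, hedged illustration of such a loss in PyTorch using entropic projected-gradient iterations; the function names, feature inputs, and hyperparameters are illustrative assumptions and not the paper's exact formulation (see https://github.com/Sifann/FakeMotion for the authors' implementation).

```python
import torch

def pairwise_dist(x):
    # x: (n, d) feature set -> (n, n) Euclidean distance matrix
    return torch.cdist(x, x, p=2)

def gromov_wasserstein_loss(pose_feats, image_feats, n_outer=20, n_sinkhorn=10, eps=0.1):
    """Hypothetical Gromov-Wasserstein-style loss between a pose feature set
    (n, d1) and a foreground-image feature set (m, d2).

    Compares intra-set distance structures rather than matching features
    point-to-point, so the two sets may live in different spaces. Uses
    entropic projected-gradient iterations (in the spirit of Peyre et al.,
    2016); NOT the paper's exact formulation.
    """
    n, m = pose_feats.shape[0], image_feats.shape[0]
    C1 = pairwise_dist(pose_feats)
    C2 = pairwise_dist(image_feats)
    C1 = C1 / (C1.max() + 1e-8)            # normalise scales for stability
    C2 = C2 / (C2.max() + 1e-8)
    p = torch.full((n,), 1.0 / n, device=C1.device)
    q = torch.full((m,), 1.0 / m, device=C2.device)
    T = p[:, None] * q[None, :]            # start from the independent coupling

    for _ in range(n_outer):
        # Gradient of the quadratic GW objective w.r.t. the coupling T
        grad = -2.0 * C1 @ T @ C2.T
        # Entropic (KL-proximal) step followed by Sinkhorn marginal projections
        K = torch.exp(-grad / eps) * T
        for _ in range(n_sinkhorn):
            K = K * (p / (K.sum(dim=1) + 1e-12)).unsqueeze(1)
            K = K * (q / (K.sum(dim=0) + 1e-12)).unsqueeze(0)
        T = K
    T = T.detach()                          # no gradient through the inner solver

    # GW discrepancy sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 T[i,j] T[k,l];
    # the dense (n, m, n, m) cost tensor is fine for small keypoint/patch sets.
    cost = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2
    return torch.einsum('ijkl,ij,kl->', cost, T, T)
```

In training, such a term would sit alongside the adversarial and reconstruction losses; the pose and image feature sets here stand in for whatever per-keypoint or per-patch embeddings the generator produces.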
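For the third aspect, a common way to realise local GANs for the face, feet, and hands is to crop keypoint-centred patches from the generated and ground-truth frames and score them with small region-specific discriminators. The sketch below only illustrates that idea; the crop size, padding choice, and discriminator architecture are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def crop_patch(frame, center, size=64):
    """Crop a square patch centred on a keypoint (x, y), zero-padding at borders.
    frame: (C, H, W); center: (x, y) in pixels. Illustrative helper only."""
    x, y = int(center[0]), int(center[1])
    half = size // 2
    padded = F.pad(frame, (half, half, half, half))   # pad W then H with zeros
    return padded[:, y:y + size, x:x + size]

class LocalPatchDiscriminator(nn.Module):
    """A small PatchGAN-style discriminator for one body region (face/hand/foot)."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=1, padding=1),  # per-patch realism scores
        )

    def forward(self, x):
        return self.net(x)
```

During training, each region discriminator would contribute an adversarial loss on its crops, with crop centres taken from the detected 2D pose keypoints, on top of the global generator objective.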
Related papers
- Do As I Do: Pose Guided Human Motion Copy [39.40271266234068]
Motion copy is an intriguing yet challenging task in artificial intelligence and computer vision.
Existing approaches typically adopt a conventional GAN with an L1 or L2 loss to produce the target fake video.
We present an episodic memory module in the pose-to-appearance generation to propel continuous learning.
Our method significantly outperforms state-of-the-art approaches and gains 7.2% and 12.4% improvements in PSNR and FID respectively.
arXiv Detail & Related papers (2024-06-24T12:41:51Z) - Splatter a Video: Video Gaussian Representation for Versatile Processing [48.9887736125712]
Video representation is crucial for various downstream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing.
We introduce a novel explicit 3D representation, the video Gaussian representation, that embeds a video into 3D Gaussians.
It has been proven effective in numerous video processing tasks, including tracking, consistent video depth and feature refinement, motion and appearance editing, and stereoscopic video generation.
arXiv Detail & Related papers (2024-06-19T22:20:03Z) - Synthesizing Moving People with 3D Control [88.68284137105654]
We present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence.
For the first part, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image.
Second, we develop a diffusion-based rendering pipeline, which is controlled by 3D human poses.
arXiv Detail & Related papers (2024-01-19T18:59:11Z) - DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head
Video Generation [18.511092587156657]
We present a novel self-supervised method for learning dense 3D facial geometry from face videos.
We also propose a strategy to learn pixel-level uncertainties to perceive more reliable rigid-motion pixels for geometry learning.
We develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism to capture facial geometries in a coarse-to-fine manner.
arXiv Detail & Related papers (2023-05-10T14:58:33Z) - Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z) - Depth-Aware Generative Adversarial Network for Talking Head Video
Generation [15.43672834991479]
Talking head video generation aims to produce a synthetic human face video that contains the identity and pose information respectively from a given source image and a driving video.
Existing works for this task heavily rely on 2D representations (e.g. appearance and motion) learned from the input images.
In this paper, we introduce a self-supervised geometry learning method to automatically recover the dense 3D geometry (i.e., depth) from the face videos.
arXiv Detail & Related papers (2022-03-13T09:32:22Z) - Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video generation and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z) - Detecting Deepfake Videos Using Euler Video Magnification [1.8506048493564673]
Deepfake videos are videos manipulated using advanced machine learning techniques.
In this paper, we examine a technique for possible identification of deepfake videos.
Our approach uses features extracted from the Euler technique to train three models to classify counterfeit and unaltered videos.
arXiv Detail & Related papers (2021-01-27T17:37:23Z) - Audio-driven Talking Face Video Generation with Learning-based
Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
The model outputs a synthesized high-quality talking face video with personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.