Parametric Reshaping of Portraits in Videos
- URL: http://arxiv.org/abs/2205.02538v1
- Date: Thu, 5 May 2022 09:55:16 GMT
- Title: Parametric Reshaping of Portraits in Videos
- Authors: Xiangjun Tang, Wenxin Sun, Yong-Liang Yang, and Xiaogang Jin
- Abstract summary: We present a robust and easy-to-use parametric method to reshape the portrait in a video to produce smooth retouched results.
Given an input portrait video, our method consists of two main stages: stabilized face reconstruction, and continuous video reshaping.
In the second stage, we first reshape the reconstructed 3D face using a parametric reshaping model reflecting the weight change of the face, and then utilize the reshaped 3D face to guide the warping of video frames.
- Score: 24.428095383264456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sharing short personalized videos to various social media networks has become
quite popular in recent years. This raises the need for digital retouching of
portraits in videos. However, applying portrait image editing directly on
portrait video frames cannot generate smooth and stable video sequences. To
this end, we present a robust and easy-to-use parametric method to reshape the
portrait in a video to produce smooth retouched results. Given an input
portrait video, our method consists of two main stages: stabilized face
reconstruction, and continuous video reshaping. In the first stage, we start by
estimating face rigid pose transformations across video frames. Then we jointly
optimize multiple frames to reconstruct an accurate face identity, followed by
recovering face expressions over the entire video. In the second stage, we
first reshape the reconstructed 3D face using a parametric reshaping model
reflecting the weight change of the face, and then utilize the reshaped 3D face
to guide the warping of video frames. We develop a novel signed distance
function based dense mapping method for the warping between face contours
before and after reshaping, resulting in stable warped video frames with
minimum distortions. In addition, we use the 3D structure of the face to
correct the dense mapping to achieve temporal consistency. We generate the
final result by minimizing the background distortion through optimizing a
content-aware warping mesh. Extensive experiments show that our method is able
to create visually pleasing results by adjusting a simple reshaping parameter,
which facilitates portrait video editing for social media and visual effects.
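The abstract walks through several concrete steps; the hedged sketches below illustrate them in order. First, the joint multi-frame identity optimization in the stabilized face reconstruction stage can be posed as one regularized linear least-squares problem over all frames. This is a minimal numpy sketch assuming a weak-perspective camera and a linear 3DMM; every array name here (mean_shape, id_basis, lm_idx, ...) is a hypothetical placeholder, not the paper's code.

```python
import numpy as np

def fit_shared_identity(landmarks_2d, poses, mean_shape, id_basis, lm_idx, lam=1e-3):
    """Fit ONE identity coefficient vector to landmarks from ALL frames.

    landmarks_2d: list of (L, 2) detected landmarks, one array per frame
    poses:        list of (R, t, s) pre-estimated rigid pose per frame
    mean_shape:   (3N,) mean face vertices of a linear 3DMM
    id_basis:     (3N, K) identity basis of the 3DMM
    lm_idx:       (L,) mesh vertex indices matching the landmarks
    """
    K = id_basis.shape[-1]
    A_rows, b_rows = [], []
    for lms, (R, t, s) in zip(landmarks_2d, poses):
        P = s * R[:2]                           # weak-perspective projection (2, 3)
        mu = mean_shape.reshape(-1, 3)[lm_idx]  # (L, 3) mean landmark positions
        B = id_basis.reshape(-1, 3, K)[lm_idx]  # (L, 3, K) basis at landmarks
        A_rows.append(np.einsum('ij,ljk->lik', P, B).reshape(-1, K))
        b_rows.append((lms - (mu @ P.T + t[:2])).reshape(-1))
    A, b = np.vstack(A_rows), np.concatenate(b_rows)
    # Tikhonov-regularized normal equations: one identity for the whole clip,
    # which is what keeps the reconstruction stable across frames.
    return np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)
```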
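Second, a parametric reshaping model driven by a weight-change parameter can be read as a per-vertex displacement field scaled by one slider, applied to the identity component only so the recovered expressions carry over unchanged. A minimal sketch under that assumption (reshape_dirs stands in for the paper's learned model):

```python
import numpy as np

def reshape_sequence(id_verts, exp_offsets, reshape_dirs, weight_delta):
    """Apply one reshaping parameter to every frame of a reconstructed clip.

    id_verts:     (N, 3) identity (neutral) vertices from stage one
    exp_offsets:  (F, N, 3) per-frame expression displacements
    reshape_dirs: (N, 3) hypothetical displacement per unit weight change
    weight_delta: scalar slider; positive fattens the face, negative slims it
    """
    reshaped_id = id_verts + weight_delta * reshape_dirs
    # Re-attach the original expressions so the edit stays temporally
    # consistent: only the identity component is changed.
    return reshaped_id[None] + exp_offsets      # (F, N, 3)
```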
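Third, one simple way to realize a signed-distance-function-based dense mapping between the face contours before and after reshaping is to move each pixel along the source SDF's gradient by the amount the level sets shifted. The paper's construction (and its 3D-structure-based temporal correction) is more involved; this is only an illustrative sketch using scipy.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    """Signed distance to the region boundary: negative inside, positive outside."""
    return distance_transform_edt(~mask) - distance_transform_edt(mask)

def contour_dense_mapping(src_mask, dst_mask):
    """Per-pixel (dx, dy) displacement carrying the source face contour
    onto the reshaped contour, by differencing the two SDFs."""
    sdf_src = signed_distance(src_mask)
    sdf_dst = signed_distance(dst_mask)
    gy, gx = np.gradient(sdf_src)                 # outward-pointing normals
    norm = np.hypot(gx, gy) + 1e-8
    shift = sdf_src - sdf_dst                     # how far each level set moved
    return shift * gx / norm, shift * gy / norm
```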
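Finally, the content-aware warping mesh can be sketched as a linear least-squares problem over a coarse grid: face-region vertices are pulled toward positions dictated by the dense mapping, while neighboring vertices try to keep their rest offsets, with higher stiffness where the background is salient. The weights and grid here are assumptions in the spirit of content-preserving warps, not the paper's exact formulation.

```python
import numpy as np

def optimize_warp_grid(grid, targets, target_w, smooth_w):
    """Least-squares warp of a coarse mesh laid over one video frame.

    grid:     (H, W, 2) rest vertex positions
    targets:  (H, W, 2) desired positions (from the dense face mapping)
    target_w: (H, W) data weight, high on the face and ~0 elsewhere
    smooth_w: (H, W) content-aware stiffness, high on textured background
    """
    H, W, _ = grid.shape
    vid = lambda i, j: i * W + j
    rows, cols, vals, rhs = [], [], [], []
    r = 0
    for i in range(H):
        for j in range(W):
            # Data term: pull constrained vertices to their mapped targets.
            rows.append(r); cols.append(vid(i, j)); vals.append(target_w[i, j])
            rhs.append(target_w[i, j] * targets[i, j]); r += 1
            # Smoothness term: neighbors keep their rest offset, weighted by
            # content so salient background resists distortion.
            for di, dj in ((0, 1), (1, 0)):
                if i + di < H and j + dj < W:
                    w = smooth_w[i, j]
                    rows += [r, r]; cols += [vid(i, j), vid(i + di, j + dj)]
                    vals += [w, -w]
                    rhs.append(w * (grid[i, j] - grid[i + di, j + dj])); r += 1
    A = np.zeros((r, H * W))
    A[rows, cols] = vals
    v, *_ = np.linalg.lstsq(A, np.stack(rhs), rcond=None)  # x and y jointly
    return v.reshape(H, W, 2)
```

Solving this per frame and interpolating pixel positions from the warped grid would then produce the final frame with minimal background distortion.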
Related papers
- GenDeF: Learning Generative Deformation Field for Video Generation [89.49567113452396]
We propose to render a video by warping one static image with a generative deformation field (GenDeF).
Such a pipeline enjoys three appealing advantages.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)
- A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization [17.938604013181426]
We propose NeuFace, a 3D face mesh pseudo-annotation method for videos.
We annotate accurate, view- and frame-consistent face meshes on large-scale face videos, yielding the NeuFace-dataset.
By exploiting the naturalness and diversity of 3D faces in our dataset, we demonstrate the usefulness of our dataset for 3D face-related tasks.
arXiv Detail & Related papers (2023-10-04T23:24:22Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z) - Video2StyleGAN: Encoding Video in Latent Space for Manipulation [63.03250800510085]
We propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Our approach can significantly outperform existing single image methods, while achieving real-time (66 fps) speed.
arXiv Detail & Related papers (2022-06-27T06:48:15Z) - Learning to Deblur and Rotate Motion-Blurred Faces [43.673660541417995]
We train a neural network to reconstruct a 3D video representation from a single image and the corresponding face gaze.
We then provide a camera viewpoint relative to the estimated gaze and the blurry image as input to an encoder-decoder network to generate a video of sharp frames with a novel camera viewpoint.
arXiv Detail & Related papers (2021-12-14T17:51:19Z) - UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video
Editing [78.26925404508994]
We propose a unified temporally consistent facial video editing framework termed UniFaceGAN.
Our framework is designed to handle face swapping and face reenactment simultaneously.
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2021-08-12T10:35:22Z) - LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from
Video using Pose and Lighting Normalization [4.43316916502814]
We present a video-based learning framework for animating personalized 3D talking faces from audio.
We introduce two training-time data normalizations that significantly improve data sample efficiency.
Our method outperforms contemporary state-of-the-art audio-driven video reenactment benchmarks in terms of realism, lip-sync and visual quality scores.
arXiv Detail & Related papers (2021-06-08T08:56:40Z) - Task-agnostic Temporally Consistent Facial Video Editing [84.62351915301795]
We propose a task-agnostic, temporally consistent facial video editing framework.
Based on a 3D reconstruction model, our framework is designed to handle several editing tasks in a more unified and disentangled manner.
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2020-07-03T02:49:20Z) - Consistent Video Depth Estimation [57.712779457632024]
We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
arXiv Detail & Related papers (2020-04-30T17:59:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.