Context-Preserving Two-Stage Video Domain Translation for Portrait
Stylization
- URL: http://arxiv.org/abs/2305.19135v1
- Date: Tue, 30 May 2023 15:46:25 GMT
- Title: Context-Preserving Two-Stage Video Domain Translation for Portrait
Stylization
- Authors: Doyeon Kim, Eunji Ko, Hyunsu Kim, Yunji Kim, Junho Kim, Dongchan Min,
Junmo Kim, Sung Ju Hwang
- Abstract summary: We propose a novel two-stage video translation framework with an objective function that enforces temporal coherence in the stylized video.
Our model runs in real time with a latency of 0.011 seconds per frame and requires only 5.6M parameters.
- Score: 68.10073215175055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Portrait stylization, which translates a real human face image into an
artistically stylized image, has attracted considerable interest, and many prior
works have shown impressive quality in recent years. However, despite their
remarkable performance on image-level translation tasks, prior methods show
unsatisfactory results when applied to the video domain. To address this issue,
we propose a novel two-stage video translation framework with an objective
function that enforces temporal coherence in the stylized video while
preserving the context of the source video. Furthermore, our model runs in real
time with a latency of 0.011 seconds per frame and requires only 5.6M
parameters, and is thus widely applicable to real-world applications.
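The abstract states only that the objective enforces temporal coherence, without giving its form. As a rough illustration, the sketch below shows a generic flow-warped temporal consistency term of the kind commonly used in video stylization; the `warp` and `temporal_consistency_loss` names, the optical-flow input, and the occlusion mask are illustrative assumptions, not the paper's actual loss.

```python
# Illustrative sketch only (not the paper's objective): a generic
# flow-warped temporal consistency loss for video stylization.
import torch
import torch.nn.functional as F


def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` (N, C, H, W) using `flow` (N, 2, H, W), where
    flow maps current-frame pixel coordinates to previous-frame coordinates."""
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device),
        torch.arange(w, device=frame.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1, 2, H, W)
    coords = base + flow                                      # sampling positions
    # Normalize pixel coordinates to [-1, 1] as expected by grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                      # (N, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)


def temporal_consistency_loss(stylized_t, stylized_prev, flow_t_to_prev, valid_mask):
    """L1 difference between the current stylized frame and the previous
    stylized frame warped into the current frame; `valid_mask` is 1 where
    the flow is reliable (non-occluded pixels)."""
    warped_prev = warp(stylized_prev, flow_t_to_prev)
    return (valid_mask * (stylized_t - warped_prev).abs()).mean()
```

Such a term is only applied during training, so it does not affect the reported inference latency; 0.011 seconds per frame corresponds to roughly 90 frames per second.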
Related papers
- Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation [31.751046895654444]
We introduce design enhancements to Hallo to produce long-duration videos.
We achieve 4K resolution portrait video generation.
We incorporate adjustable semantic textual labels for portrait expressions as conditional inputs.
arXiv Detail & Related papers (2024-10-10T08:34:41Z)
- Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control [77.08568533331206]
Follow-Your-Pose v2 can be trained on noisy open-sourced videos readily available on the internet.
Our approach outperforms state-of-the-art methods by a margin of over 35% across 2 datasets and on 7 metrics.
arXiv Detail & Related papers (2024-06-05T08:03:18Z)
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation [53.16986875759286]
We present a UniAnimate framework to enable efficient and long-term human video generation.
We map the reference image along with the posture guidance and noise video into a common feature space.
We also propose a unified noise input that supports random noised input as well as first frame conditioned input.
arXiv Detail & Related papers (2024-06-03T10:51:10Z)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- WAIT: Feature Warping for Animation to Illustration video Translation using GANs [12.681919619814419]
We introduce a new problem for video stylization where an unordered set of images is used.
Most video-to-video translation methods are built on an image-to-image translation model.
We propose a new generator network with feature warping layers which overcomes the limitations of the previous methods.
arXiv Detail & Related papers (2023-10-07T19:45:24Z)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z)
- Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
We study a novel task, language-guided face animation, that aims to animate a static face image with the help of languages.
We propose a recurrent motion generator to extract semantic and motion information from the language and feed it, along with visual information, to a pre-trained StyleGAN to generate high-quality frames.
arXiv Detail & Related papers (2022-08-11T02:57:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.