Context-Preserving Two-Stage Video Domain Translation for Portrait
  Stylization
- URL: http://arxiv.org/abs/2305.19135v1
- Date: Tue, 30 May 2023 15:46:25 GMT
- Title: Context-Preserving Two-Stage Video Domain Translation for Portrait
  Stylization
- Authors: Doyeon Kim, Eunji Ko, Hyunsu Kim, Yunji Kim, Junho Kim, Dongchan Min,
  Junmo Kim, Sung Ju Hwang
- Abstract summary: We propose a novel two-stage video translation framework with an objective function that enforces temporal coherence in the stylized video.
Our model runs in real-time with a latency of 0.011 seconds per frame and requires only 5.6M parameters.
- Score: 68.10073215175055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   Portrait stylization, which translates a real human face image into an
artistically stylized image, has attracted considerable interest, and many prior
works have demonstrated impressive quality in recent years. However, despite their
remarkable performance on image-level translation tasks, prior methods
produce unsatisfactory results when applied to the video domain. To
address this issue, we propose a novel two-stage video translation framework
with an objective function that enforces temporal coherence in the stylized
video while preserving the context of the source video.
Furthermore, our model runs in real-time with a latency of 0.011 seconds per
frame and requires only 5.6M parameters, making it readily applicable to
real-world applications.
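The abstract does not spell out the temporal objective, so below is a minimal PyTorch sketch of one common way such a temporal coherence term is implemented: the previously stylized frame is warped to the current frame with optical flow, and the masked difference is penalized. This is an illustrative assumption rather than the authors' actual loss; `warp_with_flow`, `temporal_consistency_loss`, and the occlusion mask are hypothetical names introduced here.

```python
import torch
import torch.nn.functional as F


def warp_with_flow(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a (B, C, H, W) frame with a (B, 2, H, W) backward optical flow field."""
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device),
        torch.arange(w, device=frame.device),
        indexing="ij",
    )
    # Pixel coordinates displaced by the flow.
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).float() + flow  # (B, 2, H, W)
    # Normalize to [-1, 1] as expected by grid_sample.
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1), align_corners=True)


def temporal_consistency_loss(
    stylized_t: torch.Tensor,      # stylized frame at time t, (B, C, H, W)
    stylized_prev: torch.Tensor,   # stylized frame at time t-1, (B, C, H, W)
    flow: torch.Tensor,            # optical flow from frame t to frame t-1, (B, 2, H, W)
    occlusion_mask: torch.Tensor,  # 1 where the flow is valid, 0 where occluded, (B, 1, H, W)
) -> torch.Tensor:
    """L1 distance between the current stylized frame and the flow-warped
    previous stylized frame, evaluated only on non-occluded pixels."""
    warped_prev = warp_with_flow(stylized_prev, flow)
    return (occlusion_mask * (stylized_t - warped_prev).abs()).mean()
```

In practice a term of this kind would be added to the usual stylization and context-preservation losses, with the flow and occlusion mask precomputed by an off-the-shelf optical-flow estimator.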
 
      
Related papers
        - Video Virtual Try-on with Conditional Diffusion Transformer Inpainter [27.150975905047968]
 Video virtual try-on aims to fit a garment to a target person in consecutive video frames. The few recent diffusion-based video try-on methods converge on a similar solution. We propose ViTI (Video Try-on Inpainter), which formulates and implements video virtual try-on as a conditional video inpainting task.
 arXiv  Detail & Related papers  (2025-06-26T13:56:27Z)
- DreamDance: Animating Character Art via Inpainting Stable Gaussian   Worlds [64.53681498600065]
 DreamDance is an animation framework capable of producing stable, consistent character and scene motion conditioned on precise camera trajectories. We train a pose-aware video inpainting model that injects the dynamic character into the scene video while enhancing background quality.
 arXiv  Detail & Related papers  (2025-05-30T15:54:34Z)
- Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image   Animation [31.751046895654444]
 We introduce design enhancements to Hallo to produce long-duration videos.
We achieve 4K resolution portrait video generation.
We incorporate adjustable semantic textual labels for portrait expressions as conditional inputs.
 arXiv  Detail & Related papers  (2024-10-10T08:34:41Z)
- Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation   for Stable Pose Control [77.08568533331206]
 Follow-Your-Pose v2 can be trained on noisy open-source videos readily available on the internet.
Our approach outperforms state-of-the-art methods by a margin of over 35% across 2 datasets and on 7 metrics.
 arXiv  Detail & Related papers  (2024-06-05T08:03:18Z)
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human   Image Animation [53.16986875759286]
 We present UniAnimate, a framework for efficient and long-term human video generation.
We map the reference image along with the posture guidance and noise video into a common feature space.
We also propose a unified noise input that supports both randomly noised input and first-frame-conditioned input.
 arXiv  Detail & Related papers  (2024-06-03T10:51:10Z)
- LatentMan: Generating Consistent Animated Characters using Image   Diffusion Models [44.18315132571804]
 We propose a zero-shot approach for generating consistent videos of animated characters based on Text-to-Image (T2I) diffusion models.
Our proposed approach outperforms existing zero-shot T2V approaches in generating videos of animated characters in terms of pixel-wise consistency and user preference.
 arXiv  Detail & Related papers  (2023-12-12T10:07:37Z)
- MagicAnimate: Temporally Consistent Human Image Animation using
  Diffusion Model [74.84435399451573]
 This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims to enhance temporal consistency, preserve the reference image faithfully, and improve animation fidelity.
 arXiv  Detail & Related papers  (2023-11-27T18:32:31Z)
- WAIT: Feature Warping for Animation to Illustration video Translation
  using GANs [12.681919619814419]
 We introduce a new problem for video stylization where an unordered set of images is used.
Most of the video-to-video translation methods are built on an image-to-image translation model.
We propose a new generator network with feature warping layers which overcomes the limitations of the previous methods.
 arXiv  Detail & Related papers  (2023-10-07T19:45:24Z)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
 This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
 arXiv  Detail & Related papers  (2023-06-13T17:52:23Z)
- Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
 We study a novel task, language-guided face animation, which aims to animate a static face image with the help of language.
We propose a recurrent motion generator that extracts a sequence of semantic and motion cues from the language and feeds them, along with visual information, to a pre-trained StyleGAN to generate high-quality frames.
 arXiv  Detail & Related papers  (2022-08-11T02:57:30Z)
- Neural Human Video Rendering by Learning Dynamic Textures and
  Rendering-to-Video Translation [99.64565200170897]
 We propose a novel human video synthesis method by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space.
We show several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.
 arXiv  Detail & Related papers  (2020-01-14T18:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     