Context-Preserving Two-Stage Video Domain Translation for Portrait
  Stylization
- URL: http://arxiv.org/abs/2305.19135v1
- Date: Tue, 30 May 2023 15:46:25 GMT
- Title: Context-Preserving Two-Stage Video Domain Translation for Portrait
  Stylization
- Authors: Doyeon Kim, Eunji Ko, Hyunsu Kim, Yunji Kim, Junho Kim, Dongchan Min,
  Junmo Kim, Sung Ju Hwang
- Abstract summary: We propose a novel two-stage video translation framework with an objective function that enforces temporal coherence in the stylized video.
Our model runs in real-time with a latency of 0.011 seconds per frame and requires only 5.6M parameters.
- Score: 68.10073215175055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   Portrait stylization, which translates a real human face image into an
artistically stylized image, has attracted considerable interest, and many prior
works have demonstrated impressive quality in recent years. However, despite their
remarkable performance on image-level translation tasks, prior methods
produce unsatisfactory results when applied to the video domain. To
address this issue, we propose a novel two-stage video translation framework
with an objective function that enforces temporal coherence in the stylized
video while preserving the context of the source video.
Furthermore, our model runs in real-time with a latency of 0.011 seconds per
frame and requires only 5.6M parameters, making it readily applicable to
real-world applications.
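The abstract does not spell out the temporal objective, so below is a minimal PyTorch sketch of one common way such a temporal coherence term is implemented: the previously stylized frame is warped to the current frame with optical flow, and the masked difference is penalized. This is an illustrative assumption rather than the authors' actual loss; `warp_with_flow`, `temporal_consistency_loss`, and the occlusion mask are hypothetical names introduced here.

```python
import torch
import torch.nn.functional as F


def warp_with_flow(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a (B, C, H, W) frame with a (B, 2, H, W) backward optical flow field."""
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device),
        torch.arange(w, device=frame.device),
        indexing="ij",
    )
    # Pixel coordinates displaced by the flow.
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).float() + flow  # (B, 2, H, W)
    # Normalize to [-1, 1] as expected by grid_sample.
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1), align_corners=True)


def temporal_consistency_loss(
    stylized_t: torch.Tensor,      # stylized frame at time t, (B, C, H, W)
    stylized_prev: torch.Tensor,   # stylized frame at time t-1, (B, C, H, W)
    flow: torch.Tensor,            # optical flow from frame t to frame t-1, (B, 2, H, W)
    occlusion_mask: torch.Tensor,  # 1 where the flow is valid, 0 where occluded, (B, 1, H, W)
) -> torch.Tensor:
    """L1 distance between the current stylized frame and the flow-warped
    previous stylized frame, evaluated only on non-occluded pixels."""
    warped_prev = warp_with_flow(stylized_prev, flow)
    return (occlusion_mask * (stylized_t - warped_prev).abs()).mean()
```

In practice a term of this kind would be added to the usual stylization and context-preservation losses, with the flow and occlusion mask precomputed by an off-the-shelf optical-flow estimator.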
 
      
Related papers
        - Video Virtual Try-on with Conditional Diffusion Transformer Inpainter [27.150975905047968]
 Video virtual try-on aims to fit a garment to a target person in consecutive video frames. The few recent diffusion-based video try-on methods converge on a similar solution. We propose ViTI (Video Try-on Inpainter), which formulates and implements video virtual try-on as a conditional video inpainting task.
 arXiv  Detail & Related papers  (2025-06-26T13:56:27Z)
- DreamDance: Animating Character Art via Inpainting Stable Gaussian   Worlds [64.53681498600065]
 DreamDance is an animation framework capable of producing stable, consistent character and scene motion conditioned on precise camera trajectories. We train a pose-aware video inpainting model that injects the dynamic character into the scene video while enhancing background quality.
 arXiv  Detail & Related papers  (2025-05-30T15:54:34Z)
- Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image   Animation [31.751046895654444]
 We introduce design enhancements to Hallo to produce long-duration videos.
We achieve 4K resolution portrait video generation.
We incorporate adjustable semantic textual labels for portrait expressions as conditional inputs.
 arXiv  Detail & Related papers  (2024-10-10T08:34:41Z)
- Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation   for Stable Pose Control [77.08568533331206]
 Follow-Your-Pose v2 can be trained on noisy open-source videos readily available on the internet.
Our approach outperforms state-of-the-art methods by a margin of over 35% across 2 datasets and on 7 metrics.
 arXiv  Detail & Related papers  (2024-06-05T08:03:18Z)
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human   Image Animation [53.16986875759286]
 We present UniAnimate, a framework for efficient and long-term human video generation.
We map the reference image along with the posture guidance and noise video into a common feature space.
We also propose a unified noise input that supports both randomly noised input and first-frame-conditioned input.
 arXiv  Detail & Related papers  (2024-06-03T10:51:10Z)
- LatentMan: Generating Consistent Animated Characters using Image   Diffusion Models [44.18315132571804]
 We propose a zero-shot approach for generating consistent videos of animated characters based on Text-to-Image (T2I) diffusion models.
Our proposed approach outperforms existing zero-shot T2V approaches in generating videos of animated characters in terms of pixel-wise consistency and user preference.
 arXiv  Detail & Related papers  (2023-12-12T10:07:37Z)
- MagicAnimate: Temporally Consistent Human Image Animation using
  Diffusion Model [74.84435399451573]
 This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims to enhance temporal consistency, preserve the reference image faithfully, and improve animation fidelity.
 arXiv  Detail & Related papers  (2023-11-27T18:32:31Z)
- WAIT: Feature Warping for Animation to Illustration video Translation
  using GANs [12.681919619814419]
 We introduce a new problem for video stylization where an unordered set of images is used.
Most of the video-to-video translation methods are built on an image-to-image translation model.
We propose a new generator network with feature warping layers which overcomes the limitations of the previous methods.
 arXiv  Detail & Related papers  (2023-10-07T19:45:24Z)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
 This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
 arXiv  Detail & Related papers  (2023-06-13T17:52:23Z)
- Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
 We study a novel task, language-guided face animation, which aims to animate a static face image with the help of language.
We propose a recurrent motion generator that extracts a sequence of semantic and motion cues from the language and feeds them, along with visual information, to a pre-trained StyleGAN to generate high-quality frames.
 arXiv  Detail & Related papers  (2022-08-11T02:57:30Z)
- Neural Human Video Rendering by Learning Dynamic Textures and
  Rendering-to-Video Translation [99.64565200170897]
 We propose a novel human video synthesis method by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space.
We show several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.
 arXiv  Detail & Related papers  (2020-01-14T18:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     