Unsupervised Coherent Video Cartoonization with Perceptual Motion
Consistency
- URL: http://arxiv.org/abs/2204.00795v1
- Date: Sat, 2 Apr 2022 07:59:02 GMT
- Title: Unsupervised Coherent Video Cartoonization with Perceptual Motion
Consistency
- Authors: Zhenhuan Liu, Liang Li, Huajie Jiang, Xin Jin, Dandan Tu, Shuhui Wang,
Zheng-Jun Zha
- Abstract summary: We propose a spatially-adaptive semantic alignment framework with perceptual motion consistency for coherent video cartoonization.
We devise the spatio-temporal correlative map as a style-independent, global-aware regularization on perceptual motion consistency.
Our method is able to generate highly stylistic and temporally consistent cartoon videos.
- Score: 89.75731026852338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, creative content generation tasks like style transfer and
neural photo editing have attracted more and more attention. Among these,
cartoonization of real-world scenes has promising applications in entertainment
and industry. Different from image translation, which focuses on improving the
style effect of generated images, video cartoonization has the additional
requirement of temporal consistency. In this paper, we propose a
spatially-adaptive semantic alignment framework with perceptual motion
consistency for coherent video cartoonization in an unsupervised manner. The
semantic alignment module is designed to restore the deformation of semantic
structure caused by spatial information loss in the encoder-decoder
architecture. Furthermore, we devise the spatio-temporal correlative map as a
style-independent, global-aware regularization on perceptual motion
consistency. Derived from similarity measurements of high-level features in
photo and cartoon frames, it captures global semantic information beyond the
raw pixel values used in optical flow. Moreover, the similarity measurement
disentangles temporal relationships from domain-specific style properties,
which helps regularize temporal consistency without hurting the style effects
of cartoon images. Qualitative and quantitative experiments demonstrate that
our method is able to generate highly stylistic and temporally consistent
cartoon videos.
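The abstract describes the spatio-temporal correlative map only at a high level: pairwise similarity of high-level features between consecutive photo frames serves as a style-independent target that the corresponding cartoon frames should reproduce. The sketch below is a minimal illustration of that idea, not the paper's implementation; the frozen pretrained feature extractor, the L1 penalty between correlative maps, and all function names are assumptions made here for concreteness.

```python
import torch
import torch.nn.functional as F

def correlative_map(feat_a, feat_b):
    """Pairwise cosine-similarity map between two frames' feature maps.

    feat_a, feat_b: (B, C, H, W) high-level features of frames t and t+1.
    Returns a (B, H*W, H*W) matrix whose entry (i, j) measures how similar
    location i at time t is to location j at time t+1.
    """
    b, c, h, w = feat_a.shape
    fa = F.normalize(feat_a.reshape(b, c, h * w), dim=1)  # unit-norm channel vectors
    fb = F.normalize(feat_b.reshape(b, c, h * w), dim=1)
    return torch.bmm(fa.transpose(1, 2), fb)              # (B, N, N) similarity map

def perceptual_motion_consistency_loss(photo_t, photo_t1,
                                       cartoon_t, cartoon_t1, encoder):
    """Match the cartoon frames' correlative map to the photo frames' map.

    encoder: a frozen semantic feature extractor (assumed, e.g. a pretrained
    VGG layer). The photo-domain map acts as a style-free description of how
    content moves between the two frames.
    """
    with torch.no_grad():
        target = correlative_map(encoder(photo_t), encoder(photo_t1))
    pred = correlative_map(encoder(cartoon_t), encoder(cartoon_t1))
    return F.l1_loss(pred, target)
```

Because both maps are computed from semantic features rather than raw pixels, the target does not depend on the cartoon style, which is the property the abstract contrasts with pixel-level optical-flow constraints.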
Related papers
- Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation [19.408715783816167]
We introduce a training-free framework that ensures the generated video sequence preserves the reference image's subtleties.
We decouple skeletal and motion priors from pose information, enabling precise control over animation generation.
Our method significantly enhances the quality of video generation without the need for large datasets or expensive computational resources.
arXiv Detail & Related papers (2024-08-29T13:08:12Z)
- OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance [13.050998759819933]
"OpFlowTalker" is a novel approach that utilizes predicted optical flow changes from audio inputs rather than direct image predictions.
It smooths image transitions and aligns changes with semantic content.
We also developed an optical flow synchronization module that regulates both full-face and lip movements.
arXiv Detail & Related papers (2024-05-23T15:42:34Z)
- AniClipart: Clipart Animation with Text-to-Video Priors [28.76809141136148]
We introduce AniClipart, a system that transforms static images into high-quality motion sequences guided by text-to-video priors.
Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models.
arXiv Detail & Related papers (2024-04-18T17:24:28Z)
- Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z)
- FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation [85.29772293776395]
We introduce FRESCO, which combines intra-frame correspondence with inter-frame correspondence to establish a more robust spatial-temporal constraint.
This enhancement ensures a more consistent transformation of semantically similar content across frames.
Our approach involves an explicit update of features to achieve high spatial-temporal consistency with the input video.
arXiv Detail & Related papers (2024-03-19T17:59:18Z)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors [63.43133768897087]
We propose a method to convert open-domain images into animated videos.
The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance.
Our proposed method can produce visually convincing and more logical & natural motions, as well as higher conformity to the input image.
arXiv Detail & Related papers (2023-10-18T14:42:16Z)
- Latent Image Animator: Learning to Animate Images via Latent Space Navigation [11.286071873122658]
We introduce the Latent Image Animator (LIA), a self-supervised autoencoder that avoids the need for explicit structure representations.
LIA is streamlined to animate images by linear navigation in the latent space. Specifically, motion in generated video is constructed by linear displacement of codes in the latent space.
arXiv Detail & Related papers (2022-03-17T02:45:34Z)
- Image Morphing with Perceptual Constraints and STN Alignment [70.38273150435928]
We propose a conditional GAN morphing framework operating on a pair of input images.
A special training protocol that produces sequences of frames, combined with a perceptual similarity loss, promotes smooth transformation over time.
We provide comparisons to classic as well as latent space morphing techniques, and demonstrate that, given a set of images for self-supervision, our network learns to generate visually pleasing morphing effects.
arXiv Detail & Related papers (2020-04-29T10:49:10Z)