WAIT: Feature Warping for Animation to Illustration video Translation using GANs
- URL: http://arxiv.org/abs/2310.04901v1
- Date: Sat, 7 Oct 2023 19:45:24 GMT
- Title: WAIT: Feature Warping for Animation to Illustration video Translation using GANs
- Authors: Samet Hicsonmez, Nermin Samet, Fidan Samet, Oguz Bakir, Emre Akbas, Pinar Duygulu
- Abstract summary: We introduce a new problem for video stylization where an unordered set of images is used.
Most video-to-video translation methods are built on an image-to-image translation model.
We propose a new generator network with feature warping layers which overcomes the limitations of the previous methods.
- Score: 12.681919619814419
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore a new domain for video-to-video translation.
Motivated by the availability of animation movies that are adapted from
illustrated books for children, we aim to stylize these videos with the style
of the original illustrations. Current state-of-the-art video-to-video
translation models rely on having a video sequence or a single style image to
stylize an input video. We introduce a new problem for video stylization where an
unordered set of images is used. This is a challenging task for two reasons:
i) we do not have the advantage of temporal consistency as in video sequences;
ii) it is more difficult to obtain consistent styles for video frames from a
set of unordered images compared to using a single image.
Most video-to-video translation methods are built on an image-to-image
translation model and integrate additional networks, such as optical flow or
temporal predictors, to capture temporal relations. These additional networks
complicate model training and inference and slow down the process. To
ensure temporal coherency in video-to-video style transfer, we propose a new
generator network with feature warping layers which overcomes the limitations
of the previous methods. We show the effectiveness of our method on three
datasets both qualitatively and quantitatively. Code and pretrained models are
available at https://github.com/giddyyupp/wait.
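The abstract does not spell out the internals of the feature warping layers, but the idea of warping a previous frame's intermediate generator features toward the current frame can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch module: the names FeatureWarpingLayer, offset_net, and blend are illustrative, and the flow-style grid_sample warping with a learned blending weight is an assumption rather than the authors' exact design; see the repository above for the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureWarpingLayer(nn.Module):
    """Warp the previous frame's feature map toward the current frame and
    blend the two, so that the stylization stays temporally coherent."""

    def __init__(self, channels: int):
        super().__init__()
        # Predicts a dense 2-channel offset field (dx, dy) from both feature maps.
        self.offset_net = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)
        # Predicts a per-pixel weight for blending warped history with current features.
        self.blend = nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1)

    def forward(self, feat_cur: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
        b, _, h, w = feat_cur.shape
        offsets = self.offset_net(torch.cat([feat_cur, feat_prev], dim=1))  # (B, 2, H, W)

        # Base sampling grid in normalized [-1, 1] coordinates, shifted by the offsets.
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, h, device=feat_cur.device),
            torch.linspace(-1.0, 1.0, w, device=feat_cur.device),
            indexing="ij",
        )
        base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        grid = base_grid + offsets.permute(0, 2, 3, 1)

        warped_prev = F.grid_sample(feat_prev, grid, align_corners=True)
        alpha = torch.sigmoid(self.blend(torch.cat([feat_cur, warped_prev], dim=1)))
        return alpha * warped_prev + (1.0 - alpha) * feat_cur


# Usage: warp the previous frame's features into the current frame's layout.
layer = FeatureWarpingLayer(channels=64)
feat_prev = torch.randn(1, 64, 32, 32)
feat_cur = torch.randn(1, 64, 32, 32)
out = layer(feat_cur, feat_prev)  # (1, 64, 32, 32)
```

In a sketch like this the warping is learned jointly with the generator, so no separate optical-flow or temporal-predictor network is needed at inference time, which matches the motivation stated in the abstract.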
Related papers
- AniClipart: Clipart Animation with Text-to-Video Priors [28.76809141136148]
We introduce AniClipart, a system that transforms static images into high-quality motion sequences guided by text-to-video priors.
Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models.
arXiv Detail & Related papers (2024-04-18T17:24:28Z)
- LoopAnimate: Loopable Salient Object Animation [19.761865029125524]
LoopAnimate is a novel method for generating videos with consistent start and end frames.
It achieves state-of-the-art performance in both objective metrics, such as fidelity and temporal consistency, and subjective evaluation results.
arXiv Detail & Related papers (2024-04-14T07:36:18Z)
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the modeling of video dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z)
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance [69.0740091741732]
We propose a high-fidelity image-to-video generation method by devising a frame retention branch based on a pre-trained video diffusion model, named DreamVideo.
Our model has a powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 among image-to-video models.
arXiv Detail & Related papers (2023-12-05T03:16:31Z)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction [93.26613503521664]
This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction.
We propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions.
Our model generates transition videos that ensure coherence and visual quality.
arXiv Detail & Related papers (2023-10-31T17:58:17Z)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z)
- Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer [13.098901971644656]
This paper proposes a zero-shot video stylization method named Style-A-Video.
It uses a generative pre-trained transformer with an image latent diffusion model to achieve concise, text-controlled video stylization.
Tests show that we can attain superior content preservation and stylistic performance while incurring lower computational cost than previous solutions.
arXiv Detail & Related papers (2023-05-09T14:03:27Z)
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators [70.17041424896507]
Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.
We propose a new task of zero-shot text-to-video generation using existing text-to-image synthesis methods.
Our method performs comparably or sometimes better than recent approaches, despite not being trained on additional video data.
arXiv Detail & Related papers (2023-03-23T17:01:59Z)
- Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning [36.85533835408882]
This work presents a multimodal video generation framework that benefits from text and images provided jointly or separately.
We propose a new video token trained with self-learning and an improved mask-prediction algorithm for sampling video tokens.
Our framework can incorporate various visual modalities, such as segmentation masks, drawings, and partially occluded images.
arXiv Detail & Related papers (2022-03-04T21:09:13Z)
- Learning Long-Term Style-Preserving Blind Video Temporal Consistency [6.6908747077585105]
We propose a postprocessing model, agnostic to the transformation applied to videos, in the form of a recurrent neural network.
Our model is trained using a Ping Pong procedure and its corresponding loss, recently introduced for GAN video generation.
We evaluate our model on the DAVIS and videvo.net datasets and show that our approach offers state-of-the-art results concerning flicker removal.
arXiv Detail & Related papers (2021-03-12T13:54:34Z)