AniClipart: Clipart Animation with Text-to-Video Priors
- URL: http://arxiv.org/abs/2404.12347v1
- Date: Thu, 18 Apr 2024 17:24:28 GMT
- Title: AniClipart: Clipart Animation with Text-to-Video Priors
- Authors: Ronghuan Wu, Wanchao Su, Kede Ma, Jing Liao
- Abstract summary: We introduce AniClipart, a system that transforms static images into high-quality motion sequences guided by text-to-video priors.
Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models.
- Score: 28.76809141136148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating visual content. Traditional workflows to convert static clipart images into motion sequences are laborious and time-consuming, involving numerous intricate steps like rigging, key animation and in-betweening. Recent advancements in text-to-video generation hold great potential in resolving this problem. Nevertheless, direct application of text-to-video generation models often struggles to retain the visual identity of clipart images or generate cartoon-style motions, resulting in unsatisfactory animation outcomes. In this paper, we introduce AniClipart, a system that transforms static clipart images into high-quality motion sequences guided by text-to-video priors. To generate cartoon-style and smooth motion, we first define Bézier curves over keypoints of the clipart image as a form of motion regularization. We then align the motion trajectories of the keypoints with the provided text prompt by optimizing the Video Score Distillation Sampling (VSDS) loss, which encodes adequate knowledge of natural motion within a pretrained text-to-video diffusion model. With a differentiable As-Rigid-As-Possible shape deformation algorithm, our method can be end-to-end optimized while maintaining deformation rigidity. Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models, in terms of text-video alignment, visual identity preservation, and motion consistency. Furthermore, we showcase the versatility of AniClipart by adapting it to generate a broader array of animation formats, such as layered animation, which allows topological changes.
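The loop below is a minimal, hypothetical sketch of the optimization described in the abstract, not the authors' implementation: each keypoint follows a cubic Bézier trajectory whose control points are the optimization variables, while the VSDS loss (computed in the paper with a pretrained text-to-video diffusion model) and the differentiable As-Rigid-As-Possible deformation are reduced to clearly labeled placeholders.

```python
# Minimal sketch (not the authors' code): keypoints move along cubic Bezier
# trajectories whose control points are optimized; placeholders stand in for
# the VSDS loss and the differentiable As-Rigid-As-Possible deformation.
import torch

def cubic_bezier(p, t):
    """Evaluate cubic Bezier curves.
    p: (K, 4, 2) control points for K keypoints; t: (F,) timesteps in [0, 1].
    Returns (F, K, 2) keypoint positions over F frames."""
    t = t.view(-1, 1, 1)                      # (F, 1, 1)
    p0, p1, p2, p3 = p[:, 0], p[:, 1], p[:, 2], p[:, 3]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

K, F = 8, 16                                  # keypoints, frames (illustrative sizes)
base_kps = torch.rand(K, 2)                   # clipart keypoints (placeholder values)
# Start from static trajectories: all four control points sit on the keypoint.
ctrl = torch.nn.Parameter(base_kps[:, None, :].repeat(1, 4, 1))
opt = torch.optim.Adam([ctrl], lr=1e-2)

def vsds_loss_placeholder(frames):
    # Stand-in for the Video Score Distillation Sampling loss; the real loss
    # back-propagates scores from a pretrained text-to-video diffusion model.
    return frames.pow(2).mean()

def arap_deform_placeholder(keypoints):
    # Stand-in for the differentiable As-Rigid-As-Possible deformation that
    # warps the clipart from the displaced keypoints; here it is the identity.
    return keypoints

t = torch.linspace(0, 1, F)
for step in range(100):
    kps = cubic_bezier(ctrl, t)               # (F, K, 2) keypoint trajectories
    frames = arap_deform_placeholder(kps)     # rendered/warped frames in the paper
    loss = vsds_loss_placeholder(frames)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Parameterizing motion with a few Bézier control points per keypoint keeps the optimization low-dimensional and the resulting trajectories smooth, which is how the paper motivates this form of motion regularization.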
Related papers
- Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation [19.408715783816167]
We introduce a training-free framework that ensures the generated video sequence preserves the reference image's subtleties.
We decouple skeletal and motion priors from pose information, enabling precise control over animation generation.
Our method significantly enhances the quality of video generation without the need for large datasets or expensive computational resources.
arXiv Detail & Related papers (2024-08-29T13:08:12Z) - Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z) - GenDeF: Learning Generative Deformation Field for Video Generation [89.49567113452396]
We propose to render a video by warping one static image with a generative deformation field (GenDeF).
Such a pipeline enjoys three appealing advantages.
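As a rough illustration of this pipeline (an assumption, not the GenDeF code), the sketch below warps a single static image with a per-frame deformation field via grid sampling; in GenDeF the field would come from a generator, whereas here it is a small random displacement.

```python
# Minimal sketch: warping one static image with per-frame deformation fields.
import torch
import torch.nn.functional as F

B, C, H, W, T = 1, 3, 64, 64, 8               # illustrative sizes
image = torch.rand(B, C, H, W)                # the single static source image

# Base sampling grid in normalized [-1, 1] coordinates, shape (1, H, W, 2).
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)

frames = []
for t in range(T):
    flow = 0.05 * torch.randn(B, H, W, 2)     # placeholder deformation field
    warped = F.grid_sample(image, base_grid + flow,
                           mode="bilinear", padding_mode="border",
                           align_corners=True)
    frames.append(warped)
video = torch.stack(frames, dim=1)            # (B, T, C, H, W)
```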
arXiv Detail & Related papers (2023-12-07T18:59:41Z) - AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z) - VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
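As a hedged sketch of the residual-vector idea (not the VMC implementation), the snippet below computes residuals between consecutive frame latents and aligns a model's residuals with those of a reference video using a cosine-style objective; the tensor shapes and the loss form are illustrative assumptions.

```python
# Minimal sketch: frame-to-frame residual vectors as a motion reference.
import torch
import torch.nn.functional as F

def frame_residuals(latents):
    """latents: (T, C, H, W) per-frame latents; returns (T-1, C, H, W) residuals."""
    return latents[1:] - latents[:-1]

T, C, H, W = 8, 4, 32, 32                      # illustrative latent sizes
reference = torch.rand(T, C, H, W)             # latents of the reference video
predicted = torch.rand(T, C, H, W, requires_grad=True)  # model output (placeholder)

ref_res = frame_residuals(reference)
pred_res = frame_residuals(predicted)
# Encourage the predicted frame-to-frame changes to point in the same
# direction as the reference motion (1 - cosine similarity per residual).
loss = (1 - F.cosine_similarity(
    pred_res.flatten(1), ref_res.flatten(1), dim=1)).mean()
loss.backward()
```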
arXiv Detail & Related papers (2023-12-01T06:50:11Z) - Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z) - MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z) - AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance [13.416296247896042]
We introduce an open domain image animation method that leverages the motion prior of a video diffusion model.
Our approach introduces targeted motion area guidance and motion strength guidance, enabling precise control of the movable area and its motion speed.
We validate the effectiveness of our method through rigorous experiments on an open-domain dataset.
arXiv Detail & Related papers (2023-11-21T03:47:54Z) - DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors [63.43133768897087]
We propose a method to convert open-domain images into animated videos.
The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance.
Our proposed method can produce visually convincing and more logical & natural motions, as well as higher conformity to the input image.
arXiv Detail & Related papers (2023-10-18T14:42:16Z) - Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances.
Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data.
For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)