Animated Stickers: Bringing Stickers to Life with Video Diffusion
- URL: http://arxiv.org/abs/2402.06088v1
- Date: Thu, 8 Feb 2024 22:49:32 GMT
- Title: Animated Stickers: Bringing Stickers to Life with Video Diffusion
- Authors: David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang,
Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot
Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet,
Peter Vajda, Amy Bearman, Licheng Yu
- Abstract summary: We introduce animated stickers, a video diffusion model which generates an animation conditioned on a text prompt and static image.
Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion.
- Score: 25.81904166775557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce animated stickers, a video diffusion model which generates an
animation conditioned on a text prompt and static sticker image. Our model is
built on top of the state-of-the-art Emu text-to-image model, with the addition
of temporal layers to model motion. Due to the domain gap, i.e. differences in
visual and motion style, a model which performed well on generating natural
videos can no longer generate vivid videos when applied to stickers. To bridge
this gap, we employ a two-stage finetuning pipeline: first with weakly
in-domain data, followed by a human-in-the-loop (HITL) strategy which we term
ensemble-of-teachers. It distills the best qualities of multiple teachers into
a smaller student model. We show that this strategy allows us to specifically
target improvements to motion quality while maintaining the style from the
static image. With inference optimizations, our model is able to generate an
eight-frame video with high-quality, interesting, and relevant motion in under
one second.
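The abstract names the ingredients but not the architectural details: a pretrained Emu text-to-image backbone with temporal layers added to model motion across the eight generated frames. As a rough illustration only, and not the authors' implementation, a minimal PyTorch-style sketch of the general pattern (spatial layers applied per frame, a new attention layer mixing information across frames) might look like this; all class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Self-attention over the frame axis only (hypothetical illustration)."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        # channels must be divisible by num_heads for nn.MultiheadAttention.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        # Fold spatial positions into the batch so attention mixes frames only.
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        normed = self.norm(tokens)
        tokens = tokens + self.attn(normed, normed, normed)[0]
        return tokens.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)


class SpatioTemporalBlock(nn.Module):
    """A pretrained text-to-image block followed by a newly added temporal layer."""

    def __init__(self, spatial_block: nn.Module, channels: int):
        super().__init__()
        self.spatial = spatial_block                  # from the text-to-image backbone
        self.temporal = TemporalAttention(channels)   # new layer, trained on video data

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t = x.shape[:2]
        # Spatial layers process each frame independently.
        x = self.spatial(x.flatten(0, 1)).unflatten(0, (b, t))
        # The temporal layer then exchanges information across frames.
        return self.temporal(x)
```

A common design choice with this pattern, though not confirmed by the abstract, is to initialize the temporal layers so they start as an identity mapping and to train mainly those layers on video data, preserving the image quality of the pretrained backbone.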
Related papers
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation [53.16986875759286]
We present a UniAnimate framework to enable efficient and long-term human video generation.
We map the reference image along with the posture guidance and noise video into a common feature space.
We also propose a unified noise input that supports random noised input as well as first frame conditioned input.
arXiv Detail & Related papers (2024-06-03T10:51:10Z)
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion [52.7394517692186]
We present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject.
DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model.
In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
arXiv Detail & Related papers (2023-12-07T16:57:26Z)
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference (a toy sketch of this residual idea appears after this list).
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation [44.220329202024494]
We present a few-shot-based tuning framework, LAMP, which enables a text-to-image diffusion model to Learn A specific Motion Pattern with 8~16 videos on a single GPU.
Specifically, we design a first-frame-conditioned pipeline that uses an off-the-shelf text-to-image model for content generation.
To capture the features of temporal dimension, we expand the pretrained 2D convolution layers of the T2I model to our novel temporal-spatial motion learning layers.
arXiv Detail & Related papers (2023-10-16T19:03:19Z)
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning [92.33690050667475]
AnimateDiff is a framework for animating personalized T2I models without requiring model-specific tuning.
We propose MotionLoRA, a lightweight fine-tuning technique for AnimateDiff that enables a pre-trained motion module to adapt to new motion patterns (a generic low-rank-adapter sketch appears after this list).
Results show that our approaches help these models generate temporally smooth animation clips while preserving the visual quality and motion diversity.
arXiv Detail & Related papers (2023-07-10T17:34:16Z)
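The VMC entry above builds its motion distillation objective on residual vectors between consecutive frames. As a toy sketch of that residual idea only, written independently of the VMC code and operating on generic frame (or latent) tensors, one could compare predicted and reference frame-to-frame residuals like this:

```python
import torch
import torch.nn.functional as F


def motion_residuals(frames: torch.Tensor) -> torch.Tensor:
    """Residual vectors between consecutive frames: delta_t = x_{t+1} - x_t.

    frames: (batch, num_frames, channels, height, width)
    returns: (batch, num_frames - 1, channels, height, width)
    """
    return frames[:, 1:] - frames[:, :-1]


def motion_alignment_loss(pred_frames: torch.Tensor, ref_frames: torch.Tensor) -> torch.Tensor:
    """Encourage predicted frame-to-frame residuals to point the same way as the
    reference video's residuals (toy stand-in for a motion distillation objective)."""
    pred_delta = motion_residuals(pred_frames).flatten(2)  # (batch, num_frames - 1, -1)
    ref_delta = motion_residuals(ref_frames).flatten(2)
    cos = F.cosine_similarity(pred_delta, ref_delta, dim=-1)
    return (1.0 - cos).mean()
```

Comparing residual directions rather than raw frames focuses the loss on how content moves between frames rather than on its appearance.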
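The AnimateDiff entry above mentions MotionLoRA, a lightweight fine-tuning technique that lets a pre-trained motion module pick up new motion patterns. The sketch below shows a generic low-rank adapter of the kind that name suggests; the wrapper class and hyperparameters are hypothetical and not the AnimateDiff code.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + scale * B(A(x))."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                                # keep pre-trained weights frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)  # A: project down to rank
        self.up = nn.Linear(rank, base.out_features, bias=False)   # B: project back up
        nn.init.zeros_(self.up.weight)                             # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.up(self.down(x)) * self.scale
```

Adapters of this kind would typically be attached to the attention projections inside the frozen motion module, so only a small number of parameters are trained for each new motion pattern.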