DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
- URL: http://arxiv.org/abs/2310.12190v2
- Date: Mon, 27 Nov 2023 13:36:04 GMT
- Title: DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
- Authors: Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan
Liu, Xintao Wang, Tien-Tsin Wong, Ying Shan
- Abstract summary: We propose a method to convert open-domain images into animated videos.
The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance.
Our proposed method produces visually convincing, logical, and natural motions, as well as higher conformity to the input image.
- Score: 63.43133768897087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Animating a still image offers an engaging visual experience. Traditional
image animation techniques mainly focus on animating natural scenes with
stochastic dynamics (e.g. clouds and fluid) or domain-specific motions (e.g.
human hair or body motions), thus limiting their applicability to more
general visual content. To overcome this limitation, we explore the synthesis
of dynamic content for open-domain images, converting them into animated
videos. The key idea is to utilize the motion prior of text-to-video diffusion
models by incorporating the image into the generative process as guidance.
Given an image, we first project it into a text-aligned rich context
representation space using a query transformer, which helps the video model
digest the image content in a compatible fashion. However, some visual details
are still difficult to preserve in the resultant videos. To supply more precise
image information, we additionally feed the full image to the diffusion model
by concatenating it with the initial noise. Experimental results show that our
proposed method produces visually convincing, logical, and natural motions, as
well as higher conformity to the input image.
Comparative evaluation demonstrates the notable superiority of our approach
over existing competitors.
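For intuition, the dual-stream image injection described above can be sketched as follows: a query transformer turns frozen image-encoder features into text-aligned context tokens for cross-attention, while the encoded image is also concatenated with the per-frame noise for precise detail guidance. The snippet below is a minimal PyTorch-style sketch under those assumptions; the module names (QueryTransformer, prepare_unet_inputs), shapes, and hyperparameters are illustrative and not the authors' released implementation.

```python
import torch
import torch.nn as nn


class QueryTransformer(nn.Module):
    """Learnable queries that cross-attend to frozen image-encoder features,
    producing a text-aligned context representation (a Q-Former-style design;
    the query count, width, and depth here are assumptions)."""

    def __init__(self, num_queries=16, dim=1024, depth=4, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.layers = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(depth)]
        )

    def forward(self, image_feats):                 # image_feats: (B, N_patches, dim)
        ctx = self.queries.expand(image_feats.size(0), -1, -1)
        for attn in self.layers:
            out, _ = attn(ctx, image_feats, image_feats)
            ctx = ctx + out                         # residual cross-attention update
        return ctx                                  # (B, num_queries, dim) context tokens


def prepare_unet_inputs(noise, image_latent, ctx_tokens, text_tokens):
    """Combine the two image streams for a (hypothetical) video diffusion U-Net:
    1) context tokens are appended to the text tokens used for cross-attention;
    2) the encoded image is repeated over time and concatenated with the noise
       along the channel axis to supply precise, pixel-aligned guidance."""
    b, c, t, h, w = noise.shape                     # noise: (B, C, T, H, W)
    image_latent = image_latent.unsqueeze(2).expand(-1, -1, t, -1, -1)
    unet_input = torch.cat([noise, image_latent], dim=1)       # (B, 2C, T, H, W)
    conditioning = torch.cat([text_tokens, ctx_tokens], dim=1)  # cross-attn context
    return unet_input, conditioning
```

In a full model, the video diffusion U-Net would consume the combined conditioning in its cross-attention layers, and its input convolution would need to be widened to accept the doubled channel count from the concatenation.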
Related papers
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z)
- Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation [15.569467643817447]
We introduce a technique that concurrently learns both foreground and background dynamics by segregating their movements using distinct motion representations.
We train on real-world videos enhanced with this innovative motion depiction approach.
To further extend video generation to longer sequences without accumulating errors, we adopt a clip-by-clip generation strategy.
arXiv Detail & Related papers (2024-05-26T00:53:26Z)
- AniClipart: Clipart Animation with Text-to-Video Priors [28.76809141136148]
We introduce AniClipart, a system that transforms static images into high-quality motion sequences guided by text-to-video priors.
Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models.
arXiv Detail & Related papers (2024-04-18T17:24:28Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- ActAnywhere: Subject-Aware Video Background Generation [62.57759679425924]
Generating video backgrounds that are tailored to foreground subject motion is an important problem for the movie industry and the visual effects community.
This task requires a background that aligns with the motion and appearance of the foreground subject while also complying with the artist's creative intention.
We introduce ActAnywhere, a generative model that automates this process which traditionally requires tedious manual efforts.
arXiv Detail & Related papers (2024-01-19T17:16:16Z)
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z) - MagicAnimate: Temporally Consistent Human Image Animation using
Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims to enhance temporal consistency, preserve the reference image faithfully, and improve animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z) - AnimateAnything: Fine-Grained Open Domain Image Animation with Motion
Guidance [13.416296247896042]
We introduce an open-domain image animation method that leverages the motion prior of a video diffusion model.
Our approach introduces targeted motion area guidance and motion strength guidance, enabling precise control of the movable area and its motion speed.
We validate the effectiveness of our method through rigorous experiments on an open-domain dataset.
arXiv Detail & Related papers (2023-11-21T03:47:54Z)
- Regenerating Arbitrary Video Sequences with Distillation Path-Finding [6.687073794084539]
This paper presents an interactive framework that generates new sequences according to the user's preference for the starting frame.
To achieve this effectively, we first learn the feature correlation on the frameset of the given video through a proposed network called RSFNet.
Then, we develop a novel path-finding algorithm, SDPF, which leverages the motion directions of the source video to estimate smooth and plausible sequences.
arXiv Detail & Related papers (2023-11-13T09:05:30Z)