Related papers: Controllable Longer Image Animation with Diffusion Models

Controllable Longer Image Animation with Diffusion Models

URL: http://arxiv.org/abs/2405.17306v2
Date: Tue, 28 May 2024 03:30:43 GMT
Title: Controllable Longer Image Animation with Diffusion Models
Authors: Qiang Wang, Minghua Liu, Junjun Hu, Fan Jiang, Mu Xu,
Abstract summary: We introduce an open-domain controllable image animation method using motion priors with video diffusion models. Our method achieves precise control over the direction and speed of motion in the movable region by extracting the motion field information from videos. We propose an efficient long-duration video generation method based on noise reschedule specifically tailored for image animation tasks.
Score: 12.565739255499594
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating realistic animated videos from static images is an important area of research in computer vision. Methods based on physical simulation and motion prediction have achieved notable advances, but they are often limited to specific object textures and motion trajectories, failing to exhibit highly complex environments and physical dynamics. In this paper, we introduce an open-domain controllable image animation method using motion priors with video diffusion models. Our method achieves precise control over the direction and speed of motion in the movable region by extracting the motion field information from videos and learning moving trajectories and strengths. Current pretrained video generation models are typically limited to producing very short videos, typically less than 30 frames. In contrast, we propose an efficient long-duration video generation method based on noise reschedule specifically tailored for image animation tasks, facilitating the creation of videos over 100 frames in length while maintaining consistency in content scenery and motion coordination. Specifically, we decompose the denoise process into two distinct phases: the shaping of scene contours and the refining of motion details. Then we reschedule the noise to control the generated frame sequences maintaining long-distance noise correlation. We conducted extensive experiments with 10 baselines, encompassing both commercial tools and academic methodologies, which demonstrate the superiority of our method. Our project page: https://wangqiang9.github.io/Controllable.github.io/

Related papers

DreamLoop: Controllable Cinemagraph Generation from a Single Photograph [15.908714882662823]
We present DreamLoop, a controllable video synthesis framework dedicated to generating cinemagraphs from a single photo.<n>Our key idea is to adapt a general video diffusion model by training it on two objectives: temporal bridging and motion conditioning.<n>We demonstrate that our method produces high-quality, complex cinemagraphs that align with user intent, outperforming existing approaches.
arXiv Detail & Related papers (2026-01-06T01:41:40Z)
Learning to Animate Images from A Few Videos to Portray Delicate Human Actions [80.61838364885482]
Video generative models still struggle to animate static images into videos that portray delicate human actions.<n>In this paper, we explore the task of learning to animate images to portray delicate human actions using a small number of videos.<n>We propose FLASH, which learns generalizable motion patterns by forcing the model to reconstruct a video using the motion features and cross-frame correspondences of another video.
arXiv Detail & Related papers (2025-03-01T01:09:45Z)
Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories. We translate high-level user requests into detailed, semi-dense motion prompts. We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
AnimateAnything: Consistent and Controllable Animation for Video Generation [24.576022028967195]
We present a unified controllable video generation approach AnimateAnything. It facilitates precise and consistent video manipulation across various conditions. Experiments demonstrate that our method outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2024-11-16T16:36:49Z)
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics. At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z)
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion [9.134743677331517]
We propose a pre-trained image-to-video model to disentangle appearance from motion.<n>Our method, called motion-textual inversion, leverages our observation that image-to-video models extract appearance mainly from the (latent) image input.<n>By operating on an inflated motion-text embedding containing multiple text/image embedding tokens per frame, we achieve a high temporal motion granularity.<n>Our approach does not require spatial alignment between the motion reference video and target image, generalizes across various domains, and can be applied to various tasks such as full-body and face reenactment.
arXiv Detail & Related papers (2024-08-01T10:55:20Z)
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models. Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference. We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance [13.416296247896042]
We introduce an open domain image animation method that leverages the motion prior of video diffusion model. Our approach introduces targeted motion area guidance and motion strength guidance, enabling precise control of the movable area and its motion speed. We validate the effectiveness of our method through rigorous experiments on an open-domain dataset.
arXiv Detail & Related papers (2023-11-21T03:47:54Z)
Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance. A novel motion model is trained to inference the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset. Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
Learning Fine-Grained Motion Embedding for Landscape Animation [140.57889994591494]
We propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding. To train and evaluate on diverse time-lapse videos, we build the largest high-resolution Time-lapse video dataset with Diverse scenes. Our method achieves relative improvements by 19% on LIPIS and 5.6% on FVD compared with state-of-the-art methods on our dataset.
arXiv Detail & Related papers (2021-09-06T02:47:11Z)
Animating Pictures with Eulerian Motion Fields [90.30598913855216]
We show a fully automatic method for converting a still image into a realistic animated looping video. We target scenes with continuous fluid motion, such as flowing water and billowing smoke. We propose a novel video looping technique that flows features both forward and backward in time and then blends the results.
arXiv Detail & Related papers (2020-11-30T18:59:06Z)
First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video. Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.