Follow-Your-Click: Open-domain Regional Image Animation via Short
Prompts
- URL: http://arxiv.org/abs/2403.08268v1
- Date: Wed, 13 Mar 2024 05:44:37 GMT
- Title: Follow-Your-Click: Open-domain Regional Image Animation via Short
Prompts
- Authors: Yue Ma, Yingqing He, Hongfa Wang, Andong Wang, Chenyang Qi, Chengfei
Cai, Xiu Li, Zhifeng Li, Heung-Yeung Shum, Wei Liu, and Qifeng Chen
- Abstract summary: We propose a practical framework, named Follow-Your-Click, to achieve image animation with a simple user click.
Our framework offers simpler yet more precise user control and better generation performance than previous methods.
- Score: 67.5094490054134
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite recent advances in image-to-video generation, better controllability
and local animation are less explored. Most existing image-to-video methods are
not locally aware and tend to move the entire scene. However, human artists may
need to control the movement of different objects or regions. Additionally,
current I2V methods require users not only to describe the target motion but
also to provide redundant detailed descriptions of frame contents. These two
issues hinder the practical utilization of current I2V tools. In this paper, we
propose a practical framework, named Follow-Your-Click, to achieve image
animation with a simple user click (for specifying what to move) and a short
motion prompt (for specifying how to move). Technically, we propose the
first-frame masking strategy, which significantly improves the video generation
quality, and a motion-augmented module equipped with a short motion prompt
dataset to improve the short-prompt-following ability of our model. To further
control the motion speed, we propose flow-based motion magnitude control, which
adjusts the speed of the target movement more precisely. Our framework offers
simpler yet more precise user control and better generation performance than
previous methods. Extensive experiments against 7 baselines, including both
commercial tools and research methods, on 8 metrics suggest the
superiority of our approach. Project Page: https://follow-your-click.github.io/
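The abstract names two training-time components, a first-frame masking strategy
and a flow-based motion magnitude control, without spelling out how they work.
Below is a minimal sketch (PyTorch), not the authors' code: it assumes
first-frame masking means randomly zeroing the first-frame conditioning latent
during training, and that motion magnitude is reduced to one scalar per clip
from a dense optical-flow field; every function name and default value is an
illustrative assumption.

```python
# Hedged sketch, not the released Follow-Your-Click code. Assumes
# "first-frame masking" means randomly zeroing the first-frame conditioning
# latent during training, and that motion magnitude enters as one scalar
# per clip. All names and defaults below are illustrative.
import torch

def first_frame_masking(first_frame_latent: torch.Tensor,
                        mask_prob: float = 0.7) -> torch.Tensor:
    """With probability mask_prob, zero a sample's first-frame latent."""
    b = first_frame_latent.shape[0]
    keep = (torch.rand(b, device=first_frame_latent.device) > mask_prob).float()
    shape = (b,) + (1,) * (first_frame_latent.dim() - 1)
    return first_frame_latent * keep.view(shape)

def flow_magnitude_condition(flow: torch.Tensor) -> torch.Tensor:
    """Reduce a dense flow field (B, T, 2, H, W) to a per-clip mean
    magnitude, usable as a motion-speed conditioning scalar."""
    magnitude = torch.linalg.vector_norm(flow, dim=2)  # (B, T, H, W)
    return magnitude.mean(dim=(1, 2, 3))               # (B,)

# Toy usage with random stand-ins for real latents / precomputed flow.
latent = torch.randn(4, 4, 32, 32)      # (B, C, H, W) first-frame latent
flow = torch.randn(4, 16, 2, 32, 32)    # (B, T, 2, H, W) optical flow
masked = first_frame_masking(latent)
speed = flow_magnitude_condition(flow)
print(masked.shape, speed.shape)        # [4, 4, 32, 32] and [4]
```

In a real pipeline the masked latent and the motion scalar would be fed to the
video diffusion model as extra conditions; that wiring is omitted here.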
Related papers
- Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion [9.134743677331517]
We propose using a pre-trained image-to-video model to disentangle appearance from motion.
Our method, called motion-textual inversion, leverages our observation that image-to-video models extract appearance mainly from the (latent) image input.
By operating on an inflated motion-text embedding containing multiple text/image embedding tokens per frame, we achieve a high temporal motion granularity.
Our approach does not require spatial alignment between the motion reference video and target image, generalizes across various domains, and can be applied to various tasks.
arXiv Detail & Related papers (2024-08-01T10:55:20Z) - MotionBooth: Motion-Aware Customized Text-to-Video Generation [44.41894050494623]
MotionBooth is a framework designed for animating customized subjects with precise control over both object and camera movements.
We efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately.
Our approach introduces a subject region loss and a video preservation loss to enhance the subject's learning performance.
arXiv Detail & Related papers (2024-06-25T17:42:25Z) - MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance, especially in supporting large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory usage.
arXiv Detail & Related papers (2024-05-30T17:57:30Z) - Controllable Longer Image Animation with Diffusion Models [12.565739255499594]
We introduce an open-domain controllable image animation method using motion priors with video diffusion models.
Our method achieves precise control over the direction and speed of motion in the movable region by extracting motion field information from videos (a rough sketch of this kind of flow extraction appears after this list).
We propose an efficient long-duration video generation method based on noise rescheduling, specifically tailored for image animation tasks.
arXiv Detail & Related papers (2024-05-27T16:08:00Z) - Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [34.404342332033636]
We introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as the camera's pan and zoom movements.
For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters.
Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios.
arXiv Detail & Related papers (2024-02-05T16:30:57Z) - Motion-I2V: Consistent and Controllable Image-to-Video Generation with
Explicit Motion Modeling [62.19142543520805]
Motion-I2V is a framework for consistent and controllable image-to-video generation.
It factorizes I2V into two stages with explicit motion modeling.
Motion-I2V's second stage naturally supports zero-shot video-to-video translation.
arXiv Detail & Related papers (2024-01-29T09:06:43Z) - LivePhoto: Real Image Animation with Text-guided Motion Control [51.31418077586208]
This work presents a practical system, named LivePhoto, which allows users to animate an image of their interest with text descriptions.
We first establish a strong baseline that helps a well-learned text-to-image generator (i.e., Stable Diffusion) take an image as a further input.
We then equip the improved generator with a motion module for temporal modeling and propose a carefully designed training pipeline to better link texts and motions.
arXiv Detail & Related papers (2023-12-05T17:59:52Z) - MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video
Generation [131.1446077627191]
Zero-shot Text-to-Video synthesis generates videos from prompts without using any video data.
We propose a prompt-adaptive and disentangled motion control strategy coined as MotionZero.
Our strategy can correctly control the motion of different objects and supports versatile applications, including zero-shot video editing.
arXiv Detail & Related papers (2023-11-28T09:38:45Z) - AnimateAnything: Fine-Grained Open Domain Image Animation with Motion
Guidance [13.416296247896042]
We introduce an open-domain image animation method that leverages the motion prior of a video diffusion model.
Our approach introduces targeted motion area guidance and motion strength guidance, enabling precise control of the movable area and its motion speed.
We validate the effectiveness of our method through rigorous experiments on an open-domain dataset.
arXiv Detail & Related papers (2023-11-21T03:47:54Z) - InstMove: Instance Motion for Object-centric Video Segmentation [70.16915119724757]
In this work, we study instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video Segmentation.
In comparison to pixel-wise motion, InstMove mainly relies on instance-level motion information that is free from image feature embeddings.
With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks.
arXiv Detail & Related papers (2023-03-14T17:58:44Z)
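Both the flow-based motion magnitude control in the Follow-Your-Click abstract
and the motion-field extraction described in the Controllable Longer Image
Animation entry presuppose measuring dense optical flow from reference videos.
The sketch below shows one plausible way to turn a video into a per-frame
motion-magnitude signal using OpenCV's Farneback flow; the estimator choice,
the function name, and the per-frame averaging are assumptions for
illustration, not what either paper actually uses.

```python
# Hedged sketch: per-frame motion magnitude from dense optical flow.
# Uses OpenCV's Farneback estimator as a stand-in; the papers may rely on
# a different flow network. Function name and averaging are illustrative.
import cv2
import numpy as np

def video_motion_magnitudes(video_path: str) -> np.ndarray:
    """Return the mean optical-flow magnitude between consecutive frames."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        cap.release()
        return np.array([])
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    magnitudes = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # flow has shape (H, W, 2); average its Euclidean norm per frame.
        magnitudes.append(float(np.linalg.norm(flow, axis=2).mean()))
        prev_gray = gray
    cap.release()
    return np.asarray(magnitudes)
```

A learned estimator such as RAFT could be swapped in where accuracy matters;
the resulting magnitudes can then be normalized or binned into a speed-control
condition.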
This list is automatically generated from the titles and abstracts of the papers on this site.