DragAnything: Motion Control for Anything using Entity Representation
- URL: http://arxiv.org/abs/2403.07420v3
- Date: Fri, 15 Mar 2024 05:53:11 GMT
- Title: DragAnything: Motion Control for Anything using Entity Representation
- Authors: Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang
- Abstract summary: DragAnything achieves motion control for any object in controllable video generation.
Our method surpasses the previous methods (e.g., DragNUWA) by 26% in human voting.
- Score: 32.2017791506088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation. Compared to existing motion control methods, DragAnything offers several advantages. First, trajectory-based interaction is more user-friendly, since acquiring other guidance signals (e.g., masks, depth maps) is labor-intensive; users only need to draw a line (trajectory) during interaction. Second, our entity representation serves as an open-domain embedding capable of representing any object, enabling motion control for diverse entities, including the background. Lastly, our entity representation allows simultaneous and distinct motion control for multiple objects. Extensive experiments demonstrate that DragAnything achieves state-of-the-art performance on FVD, FID, and a user study, particularly for object motion control, where our method surpasses previous methods (e.g., DragNUWA) by 26% in human voting.
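To make the abstract's mechanism concrete, here is a minimal, hedged sketch (not the authors' implementation) of how an open-domain entity embedding might be combined with a user-drawn trajectory to form per-frame conditioning for a video diffusion model; the function names, the Gaussian footprint, and the mask-pooling choice are all assumptions made for illustration.

```python
import torch

def gaussian_heatmap(h, w, center, sigma=8.0, device="cpu"):
    """2D Gaussian bump centered at an (x, y) trajectory point."""
    ys = torch.arange(h, device=device).view(-1, 1)
    xs = torch.arange(w, device=device).view(1, -1)
    cx, cy = center
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def make_condition_maps(latent_features, entity_mask, trajectory, sigma=8.0):
    """Paint an entity's embedding along a trajectory as conditioning maps.

    latent_features: (C, H, W) first-frame feature map (e.g., diffusion latents).
    entity_mask:     (H, W) binary mask of the entity in the first frame.
    trajectory:      list of (x, y) points, one per generated frame.
    Returns (T, C, H, W): the pooled entity embedding spread along the path.
    """
    C, H, W = latent_features.shape
    mask = entity_mask.to(latent_features.dtype)
    # Open-domain entity embedding: average the features inside the entity mask.
    entity_embedding = (latent_features * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1.0)
    maps = []
    for (x, y) in trajectory:
        heat = gaussian_heatmap(H, W, (x, y), sigma, latent_features.device)  # (H, W)
        maps.append(entity_embedding.view(C, 1, 1) * heat)                    # (C, H, W)
    return torch.stack(maps)  # extra conditioning channels for the video model
```

Under this reading, several entities (or the background) would simply contribute additive conditioning maps, which matches the abstract's claim of simultaneous, distinct control of multiple objects.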
Related papers
- Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation [21.87745390965703]
We introduce 3D-aware motion representation and propose an image animation framework, called Perception-as-Control, to achieve fine-grained collaborative motion control.
Specifically, we construct 3D-aware motion representation from a reference image, manipulate it based on interpreted user intentions, and perceive it from different viewpoints.
In this way, camera and object motions are transformed into intuitive, consistent visual changes.
arXiv Detail & Related papers (2025-01-09T07:23:48Z)
- Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions [78.65431951506152]
We introduce a Synthetic dataset for Free-Form Motion Control (SynFMC).
The proposed SynFMC dataset includes diverse objects and environments and covers various motion patterns according to specific rules.
We further propose a method, Free-Form Motion Control (FMC), which enables independent or simultaneous control of object and camera movements.
arXiv Detail & Related papers (2025-01-02T18:59:45Z)
- Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories (see the trajectory-rasterization sketch after this entry).
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
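As a generic, hedged illustration of the trajectory conditioning described in the Motion Prompting entry above (not that paper's actual encoder), the snippet below rasterizes sparse point tracks into a dense per-frame conditioning volume; the channel layout and function name are assumptions.

```python
import torch

def rasterize_tracks(tracks, num_frames, height, width):
    """Turn sparse point tracks into a dense conditioning tensor.

    tracks: (N, T, 2) pixel coordinates (x, y) of N tracked points over T frames.
    Returns (T, 3, H, W): channel 0 marks visited pixels, channels 1-2 hold the
    displacement toward the next frame (a simple "motion prompt").
    """
    cond = torch.zeros(num_frames, 3, height, width)
    for t in range(num_frames):
        for n in range(tracks.shape[0]):
            x, y = tracks[n, t].round().long().tolist()
            if 0 <= x < width and 0 <= y < height:
                cond[t, 0, y, x] = 1.0                      # occupancy
                if t + 1 < num_frames:
                    dx, dy = (tracks[n, t + 1] - tracks[n, t]).tolist()
                    cond[t, 1, y, x] = dx                   # motion toward next frame
                    cond[t, 2, y, x] = dy
    return cond  # concatenated with the video model's input as extra channels

# Example: 4 random tracks over 16 frames on a 64x64 grid.
cond = rasterize_tracks(torch.rand(4, 16, 2) * 63, 16, 64, 64)
```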
- DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships [16.501613834154746]
DragEntity is a video generation model that utilizes an entity representation for controlling the motion of multiple objects.
Our experiments validate the effectiveness of DragEntity, demonstrating its excellent performance in fine-grained control in video generation.
arXiv Detail & Related papers (2024-10-14T17:24:35Z)
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals.
We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline.
After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z)
- DragAPart: Learning a Part-Level Motion Prior for Articulated Objects [67.97235923372035]
We introduce DragAPart, a method that, given an image and a set of drags, generates a new image of the same object that responds to the action of those drags.
Compared to prior motion-controlled generators, we demonstrate much better part-level motion understanding.
arXiv Detail & Related papers (2024-03-22T17:58:59Z)
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [34.404342332033636]
We introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as the camera's pan and zoom movements.
For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters (see the sketch after this entry).
Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios.
arXiv Detail & Related papers (2024-02-05T16:30:57Z)
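The Direct-a-Video summary above mentions temporal cross-attention over quantitative camera parameters. The sketch below only illustrates that general idea (not the paper's architecture): per-frame video tokens attend to embeddings of pan/zoom values, and the module name, dimensions, and the (pan_x, pan_y, zoom) parameterization are assumptions.

```python
import torch
import torch.nn as nn

class CameraCrossAttention(nn.Module):
    """Toy cross-attention: video tokens (queries) attend to per-frame camera parameters."""

    def __init__(self, dim=320, cam_params=3, heads=8):
        super().__init__()
        # cam_params: e.g., (pan_x, pan_y, zoom) per frame -- an assumed parameterization.
        self.cam_embed = nn.Sequential(nn.Linear(cam_params, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video_tokens, camera_params):
        # video_tokens:  (B, L, dim) flattened spatio-temporal features
        # camera_params: (B, T, cam_params) quantitative movement values per frame
        cam_tokens = self.cam_embed(camera_params)                 # (B, T, dim)
        out, _ = self.attn(video_tokens, cam_tokens, cam_tokens)   # queries = video
        return video_tokens + out                                  # residual injection

# Usage: CameraCrossAttention()(torch.randn(2, 1024, 320), torch.zeros(2, 16, 3))
```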
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation [77.09621778348733]
Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement.
This paper presents MotionCtrl, a unified motion controller for video generation designed to effectively and independently control camera and object motion.
arXiv Detail & Related papers (2023-12-06T17:49:57Z)
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold [79.94300820221996]
DragGAN is a new way of controlling generative adversarial networks (GANs).
DragGAN allows anyone to deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc.
Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking.
arXiv Detail & Related papers (2023-05-18T13:41:25Z)
- Treating Motion as Option to Reduce Motion Dependency in Unsupervised Video Object Segmentation [5.231219025536678]
Unsupervised video object segmentation (VOS) aims to detect the most salient object in a video sequence at the pixel level.
Most state-of-the-art methods leverage motion cues obtained from optical flow maps in addition to appearance cues, exploiting the property that salient objects usually have distinctive movements compared to the background (see the cue-fusion sketch after this entry).
arXiv Detail & Related papers (2022-09-04T18:05:52Z)
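As a hedged, generic sketch of the flow-plus-appearance fusion described in the entry above (not the cited paper's model), the snippet below blends an appearance saliency map with a motion cue derived from optical-flow magnitude; the weighting and threshold are illustrative assumptions.

```python
import torch

def fuse_saliency(appearance_sal, flow, alpha=0.5):
    """Blend appearance-based saliency with a motion cue from optical flow.

    appearance_sal: (H, W) saliency in [0, 1] from an appearance network.
    flow:           (2, H, W) optical flow (dx, dy) between consecutive frames.
    alpha:          weight on the appearance cue; (1 - alpha) goes to motion.
    """
    flow_mag = torch.linalg.norm(flow, dim=0)                  # (H, W) motion strength
    motion_sal = flow_mag / flow_mag.max().clamp(min=1e-6)     # normalize to [0, 1]
    fused = alpha * appearance_sal + (1 - alpha) * motion_sal
    return (fused > 0.5).float()                               # binary salient-object mask

# "Treating motion as option" could then mean pushing alpha toward 1
# whenever the flow is unreliable (e.g., a nearly static object).
mask = fuse_saliency(torch.rand(64, 64), torch.randn(2, 64, 64))
```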