DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships
- URL: http://arxiv.org/abs/2410.10751v1
- Date: Mon, 14 Oct 2024 17:24:35 GMT
- Title: DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships
- Authors: Zhang Wan, Sheng Tang, Jiawei Wei, Ruize Zhang, Juan Cao
- Abstract summary: DragEntity is a video generation model that utilizes entity representation for controlling the motion of multiple objects.
Our experiments validate the effectiveness of DragEntity, demonstrating its excellent performance in fine-grained control in video generation.
- Score: 16.501613834154746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, diffusion models have achieved tremendous success in the field of video generation, with controllable video generation receiving significant attention. However, existing control methods still face two limitations. First, control conditions (such as depth maps or 3D meshes) are difficult for ordinary users to obtain directly. Second, it is challenging to drive multiple objects through complex motions along multiple trajectories simultaneously. In this paper, we introduce DragEntity, a video generation model that uses entity representation to control the motion of multiple objects. Compared to previous methods, DragEntity offers two main advantages: 1) our method is more user-friendly because it allows users to drag entities within the image rather than individual pixels; 2) we use entity representation to represent any object in the image, and multiple objects can maintain their relative spatial relationships, so multiple trajectories of varying complexity can simultaneously control multiple objects in the image. Our experiments validate the effectiveness of DragEntity, demonstrating its excellent performance in fine-grained control for video generation.
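As a rough intuition for how entity-level drags might be turned into a conditioning signal for a video diffusion backbone, the following is a minimal sketch under assumed data structures; the names EntityDrag and rasterize_conditions are hypothetical and do not reflect DragEntity's actual interface.

```python
# Illustrative sketch only: hypothetical structures, not DragEntity's real code.
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np


@dataclass
class EntityDrag:
    """One user interaction: an entity (segmentation mask) plus the path it should follow."""
    mask: np.ndarray                    # (H, W) boolean mask selecting the entity
    trajectory: List[Tuple[int, int]]   # one (x, y) anchor position per output frame


def rasterize_conditions(drags: List[EntityDrag], h: int, w: int, num_frames: int) -> np.ndarray:
    """Translate each entity mask along its trajectory, one channel per entity per frame.

    Returns an array of shape (num_frames, len(drags), h, w); because every entity keeps
    its own mask shape while only its position changes, the relative spatial layout of
    multiple entities is preserved across frames.
    """
    cond = np.zeros((num_frames, len(drags), h, w), dtype=np.float32)
    for e, drag in enumerate(drags):
        ys, xs = np.nonzero(drag.mask)
        cx, cy = xs.mean(), ys.mean()          # entity centroid in the first frame
        for t, (tx, ty) in enumerate(drag.trajectory[:num_frames]):
            dx, dy = int(round(tx - cx)), int(round(ty - cy))
            new_xs = np.clip(xs + dx, 0, w - 1)
            new_ys = np.clip(ys + dy, 0, h - 1)
            cond[t, e, new_ys, new_xs] = 1.0   # translated entity mask for frame t
    return cond
```

In such a setup, the per-frame maps would be fed to the generator alongside the first frame as an additional conditioning input; the exact encoding used by DragEntity may differ.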
Related papers
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals.
We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline.
After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z)
- DragAPart: Learning a Part-Level Motion Prior for Articulated Objects [67.97235923372035]
We introduce DragAPart, a method that, given an image and a set of drags, generates a new image of the same object responding to the action of those drags.
Compared to prior motion-controlled generators, we demonstrate much better part-level motion understanding.
arXiv Detail & Related papers (2024-03-22T17:58:59Z)
- DragAnything: Motion Control for Anything using Entity Representation [32.2017791506088]
DragAnything achieves motion control for any object in controllable video generation.
Our method surpasses the previous methods (e.g., DragNUWA) by 26% in human voting.
arXiv Detail & Related papers (2024-03-12T08:57:29Z)
- CAGE: Controllable Articulation GEneration [14.002289666443529]
We leverage the interplay between part shape, connectivity, and motion using a denoising diffusion-based method.
Our method takes an object category label and a part connectivity graph as input and generates an object's geometry and motion parameters.
Our experiments show that our method outperforms the state-of-the-art in articulated object generation.
arXiv Detail & Related papers (2023-12-15T07:04:27Z)
- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory [126.4597063554213]
DragNUWA is an open-domain diffusion-based video generation model.
It provides fine-grained control over video content from semantic, spatial, and temporal perspectives.
Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control in video generation.
arXiv Detail & Related papers (2023-08-16T01:43:41Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold [79.94300820221996]
DragGAN is a new way of controlling generative adversarial networks (GANs).
DragGAN allows anyone to deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc.
Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking.
arXiv Detail & Related papers (2023-05-18T13:41:25Z)
- Xp-GAN: Unsupervised Multi-object Controllable Video Generation [8.807587076209566]
Video generation is a relatively new yet popular subject in machine learning.
Current video generation methods give the user little or no control over exactly how the objects in the generated video are to be moved.
We propose a novel method that lets the user move any number of objects in a single initial frame simply by drawing bounding boxes over those objects and then dragging the boxes along the desired paths (a generic encoding of such box paths is sketched after this list).
arXiv Detail & Related papers (2021-11-19T14:10:50Z)
- Video Exploration via Video-Specific Autoencoders [60.256055890647595]
We present video-specific autoencoders that enable human-controllable video exploration.
We observe that a simple autoencoder trained on multiple frames of a specific video enables one to perform a large variety of video processing and editing tasks.
arXiv Detail & Related papers (2021-03-31T17:56:13Z)
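As a hedged illustration of the box-path control style mentioned in the Xp-GAN summary above, the sketch below shows one generic way first-frame bounding boxes and their user-drawn paths could be rasterized into per-frame layout masks; the function and argument names are hypothetical and are not taken from any of the papers listed here.

```python
# Hedged illustration only: a generic rasterization of box paths into layout masks,
# with hypothetical names; this is not code from Xp-GAN or any other paper above.
import numpy as np


def boxes_to_layout_masks(boxes, paths, h, w):
    """boxes: list of (x, y, box_w, box_h) tuples drawn on the first frame.
    paths:  one list of per-frame (x, y) top-left positions for each box.

    Returns a (num_frames, num_boxes, h, w) array of binary masks that a generator
    could consume as a coarse per-object motion layout.
    """
    num_frames = min(len(p) for p in paths)
    masks = np.zeros((num_frames, len(boxes), h, w), dtype=np.float32)
    for i, ((_, _, box_w, box_h), path) in enumerate(zip(boxes, paths)):
        for t, (x, y) in enumerate(path[:num_frames]):
            x0, y0 = max(0, int(x)), max(0, int(y))
            x1, y1 = min(w, int(x + box_w)), min(h, int(y + box_h))
            masks[t, i, y0:y1, x0:x1] = 1.0    # filled box at its frame-t position
    return masks
```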
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.