DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships
- URL: http://arxiv.org/abs/2410.10751v1
- Date: Mon, 14 Oct 2024 17:24:35 GMT
- Title: DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships
- Authors: Zhang Wan, Sheng Tang, Jiawei Wei, Ruize Zhang, Juan Cao
- Abstract summary: DragEntity is a video generation model that utilizes entity representation for controlling the motion of multiple objects.
Our experiments validate the effectiveness of DragEntity, demonstrating its excellent performance in fine-grained control in video generation.
- Score: 16.501613834154746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, diffusion models have achieved tremendous success in the field of video generation, with controllable video generation receiving significant attention. However, existing control methods still face two limitations: First, control conditions (such as depth maps or 3D meshes) are difficult for ordinary users to obtain directly. Second, it is challenging to drive multiple objects through complex motions with multiple trajectories simultaneously. In this paper, we introduce DragEntity, a video generation model that utilizes entity representation for controlling the motion of multiple objects. Compared to previous methods, DragEntity offers two main advantages: 1) Our method is more user-friendly for interaction because it allows users to drag entities within the image rather than individual pixels. 2) We use entity representation to represent any object in the image, and multiple objects can maintain their relative spatial relationships. Therefore, we allow multiple trajectories to control multiple objects in the image with different levels of complexity simultaneously. Our experiments validate the effectiveness of DragEntity, demonstrating its excellent performance in fine-grained control in video generation.
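To make the interaction paradigm concrete, below is a minimal sketch of how a multi-entity drag control signal could be represented and rasterized into per-frame conditioning maps. This is an illustrative assumption only: the class and function names (EntityTrajectory, interpolate_path, build_condition_maps) and the tensor layout are hypothetical and do not correspond to a released DragEntity interface.

```python
# Hypothetical sketch: representing multi-entity drag trajectories as a
# control signal for an image-to-video model. Names and shapes are
# illustrative assumptions, not DragEntity's actual interface.
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np


@dataclass
class EntityTrajectory:
    """One draggable entity: a binary mask plus the path the user drew."""
    mask: np.ndarray                 # (H, W) bool, selects the entity's pixels
    path: List[Tuple[float, float]]  # user-drawn (x, y) anchor points


def interpolate_path(path: List[Tuple[float, float]], num_frames: int) -> np.ndarray:
    """Resample a sparse drag path into one (x, y) position per frame."""
    pts = np.asarray(path, dtype=np.float32)
    t_src = np.linspace(0.0, 1.0, len(pts))
    t_dst = np.linspace(0.0, 1.0, num_frames)
    x = np.interp(t_dst, t_src, pts[:, 0])
    y = np.interp(t_dst, t_src, pts[:, 1])
    return np.stack([x, y], axis=1)  # (num_frames, 2)


def build_condition_maps(entities: List[EntityTrajectory],
                         num_frames: int, h: int, w: int) -> np.ndarray:
    """Rasterize every entity's per-frame target position into sparse maps
    that a conditioning branch could consume (one channel per entity)."""
    cond = np.zeros((num_frames, len(entities), h, w), dtype=np.float32)
    for e_idx, ent in enumerate(entities):
        per_frame = interpolate_path(ent.path, num_frames)
        for f_idx, (x, y) in enumerate(per_frame):
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < h and 0 <= xi < w:
                cond[f_idx, e_idx, yi, xi] = 1.0
    return cond
```

Keeping one channel per entity mirrors the abstract's claim that several trajectories can drive several objects at once, with each entity's control signal kept separate so relative spatial relationships can be preserved.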
Related papers
- MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance [46.92591065065018]
We introduce MagicMotion, an image-to-video framework for trajectory-controllable video generation.
MagicMotion animates objects along defined trajectories while maintaining object consistency and visual quality.
We present MagicData, a large-scale trajectory-controlled video dataset, along with an automated pipeline for annotation and filtering.
arXiv Detail & Related papers (2025-03-20T17:59:42Z) - C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation [81.4106601222722]
Trajectory-based motion control has emerged as an intuitive and efficient approach for controllable video generation.
We propose a Chain-of-Thought-based motion controller for controllable video generation, named C-Drag.
Our method includes an object perception module and a Chain-of-Thought-based motion reasoning module.
arXiv Detail & Related papers (2025-02-27T08:21:03Z) - VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation [62.64811405314847]
VidCRAFT3 is a novel framework for precise image-to-video generation.
It enables control over camera motion, object motion, and lighting direction simultaneously.
It produces high-quality video content, outperforming state-of-the-art methods in control granularity and visual coherence.
arXiv Detail & Related papers (2025-02-11T13:11:59Z) - LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis [80.2461057573121]
In this work, we augment the interaction with a new dimension, i.e., the depth dimension, such that users are allowed to assign a relative depth for each point on the trajectory.
We propose a pioneering method for 3D trajectory control in image-to-video by abstracting object masks into a few cluster points.
Experiments validate the effectiveness of our approach, dubbed LeviTor, in precisely manipulating the object movements when producing photo-realistic videos from static images.
arXiv Detail & Related papers (2024-12-19T18:59:56Z) - 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation [83.98251722144195]
Previous methods for controllable video generation primarily leverage 2D control signals to manipulate object motions.
We introduce 3DTrajMaster, a robust controller that regulates multi-entity dynamics in 3D space.
We show that 3DTrajMaster sets a new state-of-the-art in both accuracy and generalization for controlling multi-entity 3D motions.
arXiv Detail & Related papers (2024-12-10T18:55:13Z) - MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals.
We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline.
After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z) - DragAPart: Learning a Part-Level Motion Prior for Articulated Objects [67.97235923372035]
We introduce DragAPart, a method that, given an image and a set of drag interactions, generates a new image of the same object responding to those drags.
Compared to prior motion-controlled generators, we demonstrate much better part-level motion understanding.
arXiv Detail & Related papers (2024-03-22T17:58:59Z) - DragAnything: Motion Control for Anything using Entity Representation [32.2017791506088]
DragAnything achieves motion control for any object in controllable video generation.
Our method surpasses the previous methods (e.g., DragNUWA) by 26% in human voting.
arXiv Detail & Related papers (2024-03-12T08:57:29Z) - CAGE: Controllable Articulation GEneration [14.002289666443529]
We leverage the interplay between part shape, connectivity, and motion using a denoising diffusion-based method.
Our method takes an object category label and a part connectivity graph as input and generates an object's geometry and motion parameters.
Our experiments show that our method outperforms the state-of-the-art in articulated object generation.
arXiv Detail & Related papers (2023-12-15T07:04:27Z) - DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory [126.4597063554213]
DragNUWA is an open-domain diffusion-based video generation model.
It provides fine-grained control over video content from semantic, spatial, and temporal perspectives.
Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control in video generation.
arXiv Detail & Related papers (2023-08-16T01:43:41Z) - DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z) - Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold [79.94300820221996]
DragGAN is a new way of controlling generative adversarial networks (GANs).
DragGAN allows anyone to deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc.
Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking.
arXiv Detail & Related papers (2023-05-18T13:41:25Z) - Xp-GAN: Unsupervised Multi-object Controllable Video Generation [8.807587076209566]
Video Generation is a relatively new and yet popular subject in machine learning.
Current methods in video generation provide the user with little or no control over how the objects in the generated video are to be moved.
We propose a novel method that allows the user to move any number of objects in a single initial frame simply by drawing bounding boxes over those objects and then moving those boxes along the desired paths.
arXiv Detail & Related papers (2021-11-19T14:10:50Z) - Video Exploration via Video-Specific Autoencoders [60.256055890647595]
We present video-specific autoencoders that enable human-controllable video exploration.
We observe that a simple autoencoder trained on multiple frames of a specific video enables one to perform a large variety of video processing and editing tasks.
arXiv Detail & Related papers (2021-03-31T17:56:13Z)