C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
- URL: http://arxiv.org/abs/2502.19868v1
- Date: Thu, 27 Feb 2025 08:21:03 GMT
- Title: C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation
- Authors: Yuhao Li, Mirana Claire Angel, Salman Khan, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan
- Abstract summary: Trajectory-based motion control has emerged as an intuitive and efficient approach for controllable video generation. We propose a Chain-of-Thought-based motion controller for controllable video generation, named C-Drag. Our method includes an object perception module and a Chain-of-Thought-based motion reasoning module.
- Score: 81.4106601222722
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Trajectory-based motion control has emerged as an intuitive and efficient approach for controllable video generation. However, existing trajectory-based approaches are usually limited to generating the motion trajectory of the controlled object alone, ignoring the dynamic interactions between the controlled object and its surroundings. To address this limitation, we propose a Chain-of-Thought-based motion controller for controllable video generation, named C-Drag. Instead of directly generating the motion of some objects, our C-Drag first performs object perception and then reasons about the dynamic interactions between different objects according to the given motion control of the objects. Specifically, our method includes an object perception module and a Chain-of-Thought-based motion reasoning module. The object perception module employs visual language models to capture the position and category information of various objects within the image. The Chain-of-Thought-based motion reasoning module takes this information as input and conducts a stage-wise reasoning process to generate motion trajectories for each of the affected objects, which are subsequently fed to the diffusion model for video synthesis. Furthermore, we introduce a new video object interaction (VOI) dataset to evaluate the generation quality of motion-controlled video generation methods. Our VOI dataset contains three typical types of interactions and provides the motion trajectories of objects that can be used for accurate performance evaluation. Experimental results show that C-Drag achieves promising performance across multiple metrics, excelling in object motion control. Our benchmark, codes, and models will be available at https://github.com/WesLee88524/C-Drag-Official-Repo.
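The abstract outlines a three-stage pipeline: a VLM-based object perception module, a Chain-of-Thought motion reasoning module that expands the user-specified drag into trajectories for all affected objects, and a trajectory-conditioned diffusion model for synthesis. The Python sketch below illustrates one way such a pipeline could be wired together; all names (perceive_objects, reason_motion, synthesize_video, c_drag) and data structures are hypothetical placeholders, not the authors' released code.

```python
# Hypothetical sketch of the C-Drag pipeline described above.
# Every component here is a placeholder intended to show the data flow only.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates

@dataclass
class DetectedObject:
    category: str                               # e.g. "billiard ball"
    bbox: Tuple[float, float, float, float]     # (x1, y1, x2, y2)

def perceive_objects(image) -> List[DetectedObject]:
    """Object perception module: a visual language model would return the
    category and position of each object in the input frame."""
    raise NotImplementedError("Query a VLM / open-vocabulary detector here.")

def reason_motion(objects: List[DetectedObject],
                  controlled_index: int,
                  drag: List[Point]) -> Dict[int, List[Point]]:
    """Chain-of-Thought motion reasoning module: stage-wise prompting of an
    LLM to infer how the drag on the controlled object propagates to the
    surrounding objects, returning one trajectory per affected object."""
    raise NotImplementedError("Prompt an LLM stage by stage and parse trajectories.")

def synthesize_video(image, trajectories: Dict[int, List[Point]]):
    """Trajectory-conditioned video diffusion model that renders the clip."""
    raise NotImplementedError("Feed trajectories to a motion-controllable diffusion model.")

def c_drag(image, controlled_index: int, drag: List[Point]):
    objects = perceive_objects(image)                              # 1. perceive objects
    trajectories = reason_motion(objects, controlled_index, drag)  # 2. reason about interactions
    trajectories.setdefault(controlled_index, drag)                # controlled object follows the drag
    return synthesize_video(image, trajectories)                   # 3. synthesize the video
```

The point of the sketch is the indirection: rather than passing the user's drag straight to the diffusion model, the reasoning stage first expands it into trajectories for every object affected by the interaction.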
Related papers
- Segment Any Motion in Videos [80.72424676419755]
We propose a novel approach for moving object segmentation that combines long-range trajectory motion cues with DINO-based semantic features.
Our model employs Spatio-Temporal Trajectory Attention and Motion-Semantic Decoupled Embedding to prioritize motion while integrating semantic support.
arXiv Detail & Related papers (2025-03-28T09:34:11Z)
- Articulate That Object Part (ATOP): 3D Part Articulation via Text and Motion Personalization [9.231848716070257]
ATOP (Articulate That Object Part) is a novel few-shot method based on motion personalization to articulate a static 3D object.
We show that our method is capable of generating realistic motion videos and predicting 3D motion parameters in a more accurate and generalizable way.
arXiv Detail & Related papers (2025-02-11T05:47:16Z)
- Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions [78.65431951506152]
We introduce a Synthetic dataset for Free-Form Motion Control (SynFMC). The proposed SynFMC dataset includes diverse objects and environments and covers various motion patterns according to specific rules. We further propose a method, Free-Form Motion Control (FMC), which enables independent or simultaneous control of object and camera movements.
arXiv Detail & Related papers (2025-01-02T18:59:45Z)
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [34.404342332033636]
We introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as camera's pan and zoom movements.
For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters.
Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios.
arXiv Detail & Related papers (2024-02-05T16:30:57Z)
- Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cue of objects across different time frames is critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate our MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z)
- Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation [26.292052071093945]
We propose an unsupervised method to generate videos from a single frame and a sparse motion input.
Our trained model can generate unseen realistic object-to-object interactions.
We show that YODA is on par with or better than prior state-of-the-art video generation work in terms of both controllability and video quality.
arXiv Detail & Related papers (2023-06-06T19:50:02Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)