Trajectory Attention for Fine-grained Video Motion Control
- URL: http://arxiv.org/abs/2411.19324v1
- Date: Thu, 28 Nov 2024 18:59:51 GMT
- Title: Trajectory Attention for Fine-grained Video Motion Control
- Authors: Zeqi Xiao, Wenqi Ouyang, Yifan Zhou, Shuai Yang, Lei Yang, Jianlou Si, Xingang Pan
- Abstract summary: This paper introduces trajectory attention, a novel approach that performs attention along available pixel trajectories for fine-grained camera motion control.
We show that our approach can be extended to other video motion control tasks, such as first-frame-guided video editing.
- Score: 20.998809534747767
- License:
- Abstract: Recent advancements in video generation have been greatly driven by video diffusion models, with camera motion control emerging as a crucial challenge in creating view-customized visual content. This paper introduces trajectory attention, a novel approach that performs attention along available pixel trajectories for fine-grained camera motion control. Unlike existing methods that often yield imprecise outputs or neglect temporal correlations, our approach possesses a stronger inductive bias that seamlessly injects trajectory information into the video generation process. Importantly, our approach models trajectory attention as an auxiliary branch alongside traditional temporal attention. This design enables the original temporal attention and the trajectory attention to work in synergy, ensuring both precise motion control and new content generation capability, which is critical when the trajectory is only partially available. Experiments on camera motion control for images and videos demonstrate significant improvements in precision and long-range consistency while maintaining high-quality generation. Furthermore, we show that our approach can be extended to other video motion control tasks, such as first-frame-guided video editing, where it excels in maintaining content consistency over large spatial and temporal ranges.
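The abstract describes two ideas: attention computed along pixel trajectories, and running that attention as an auxiliary branch next to the usual temporal attention. The following is a minimal NumPy sketch of that structure, not the authors' implementation. It assumes integer pixel trajectories, identity query/key/value projections (no learned weights), and a plain sum to combine the two branches; the function names `temporal_attention` and `trajectory_attention` and the array layout `(T, H, W, C)` are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the second-to-last axis.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def temporal_attention(feats):
    # feats: (T, H, W, C). Each pixel location attends across frames
    # at the SAME spatial position.
    T, H, W, C = feats.shape
    x = feats.transpose(1, 2, 0, 3).reshape(H * W, T, C)
    out = attention(x, x, x)
    return out.reshape(H, W, T, C).transpose(2, 0, 1, 3)

def trajectory_attention(feats, trajs):
    # trajs: (N, T, 2) integer (y, x) position of each trajectory in each frame.
    # Tokens are gathered ALONG each trajectory, attended over time,
    # then scattered back to their pixel positions.
    T = feats.shape[0]
    t_idx = np.arange(T)[None, :]           # (1, T), broadcasts against (N, T)
    ys, xs = trajs[..., 0], trajs[..., 1]   # (N, T) each
    tokens = feats[t_idx, ys, xs]           # (N, T, C)
    attended = attention(tokens, tokens, tokens)
    out = np.zeros_like(feats)              # pixels without a trajectory stay zero
    out[t_idx, ys, xs] = attended           # on collisions, the last write wins
    return out

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 5, 5, 8))
# One hypothetical trajectory: a point on row 2 moving one pixel right per frame.
trajs = np.array([[[2, t] for t in range(4)]])

# Assumed combination rule: sum of the two branches.
out = temporal_attention(feats) + trajectory_attention(feats, trajs)
print(out.shape)  # (4, 5, 5, 8)
```

In this sketch, pixels that lie on no trajectory receive a zero contribution from the trajectory branch and are handled by temporal attention alone, which mirrors the abstract's point about trajectories that are only partially available.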
Related papers
- Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss [35.69606926024434]
We propose a simple yet effective solution that combines an initial-noise-based approach with a novel motion consistency loss.
We then design a motion consistency loss to maintain similar feature correlation patterns in the generated video.
This approach improves temporal consistency across various motion control tasks while preserving the benefits of a training-free setup.
arXiv Detail & Related papers (2025-01-13T18:53:08Z)
- VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control [66.66226299852559]
VideoAnydoor is a zero-shot video object insertion framework with high-fidelity detail preservation and precise motion control.
To preserve the detailed appearance and meanwhile support fine-grained motion control, we design a pixel warper.
arXiv Detail & Related papers (2025-01-02T18:59:54Z)
- SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation [22.693060144042196]
Methods for image-to-video generation have achieved impressive, photo-realistic quality.
Adjusting specific elements in generated videos, such as object motion or camera movement, is often a tedious process of trial and error.
We introduce a framework for controllable image-to-video generation that is self-guided.
Our zero-shot method outperforms unsupervised baselines while narrowing down the performance gap with supervised models.
arXiv Detail & Related papers (2024-11-07T18:56:11Z)
- Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models [64.2445487645478]
Large Language Models have shown remarkable efficacy in generating streaming data such as text and audio.
We present Live2Diff, the first attempt at designing a video diffusion model with uni-directional temporal attention, specifically targeting live streaming video translation.
arXiv Detail & Related papers (2024-07-11T17:34:51Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and exclusively supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
- Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation [10.5019872575418]
We propose a novel zero-shot moving object trajectory control framework, Motion-Zero, to enable a bounding-box-trajectories-controlled text-to-video diffusion model.
Our method can be flexibly applied to various state-of-the-art video diffusion models without any training process.
arXiv Detail & Related papers (2024-01-18T17:22:37Z)
- TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Video sequences generated by TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z)
- Traffic Video Object Detection using Motion Prior [16.63738085066699]
We propose two innovative methods to exploit the motion prior and boost the performance of traffic video object detection.
Firstly, we introduce a new self-attention module that leverages the motion prior to guide temporal information integration.
Secondly, we utilise a pseudo-labelling mechanism to eliminate noisy pseudo labels for the semi-supervised setting.
arXiv Detail & Related papers (2023-11-16T18:59:46Z)
- Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.