Related papers: Boosting Camera Motion Control for Video Diffusion Transformers

Boosting Camera Motion Control for Video Diffusion Transformers

URL: http://arxiv.org/abs/2410.10802v1
Date: Mon, 14 Oct 2024 17:58:07 GMT
Title: Boosting Camera Motion Control for Video Diffusion Transformers
Authors: Soon Yau Cheong, Duygu Ceylan, Armin Mustafa, Andrew Gilbert, Chun-Hao Paul Huang,
Abstract summary: We show that transformer-based diffusion models (DiT) suffer from severe degradation in camera motion accuracy. To address the persistent motion degradation in DiT, we introduce Camera Motion Guidance (CMG), which boosts camera control by over 400%. Our method universally applies to both U-Net and DiT models, offering improved camera control for video generation tasks.
Score: 21.151900688555624
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in diffusion models have significantly enhanced the quality of video generation. However, fine-grained control over camera pose remains a challenge. While U-Net-based models have shown promising results for camera control, transformer-based diffusion models (DiT)-the preferred architecture for large-scale video generation - suffer from severe degradation in camera motion accuracy. In this paper, we investigate the underlying causes of this issue and propose solutions tailored to DiT architectures. Our study reveals that camera control performance depends heavily on the choice of conditioning methods rather than camera pose representations that is commonly believed. To address the persistent motion degradation in DiT, we introduce Camera Motion Guidance (CMG), based on classifier-free guidance, which boosts camera control by over 400%. Additionally, we present a sparse camera control pipeline, significantly simplifying the process of specifying camera poses for long videos. Our method universally applies to both U-Net and DiT models, offering improved camera control for video generation tasks.

Related papers

Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation [85.10745006495364]
We present Uni3C, a unified framework for precise control of both camera and human motion in video generation. First, we propose a plug-and-play control module trained with a frozen video generative backbone, PCDController. Second, we propose a jointly aligned 3D world guidance for the inference phase that seamlessly integrates both scenic point clouds and SMPL-X characters.
arXiv Detail & Related papers (2025-04-21T07:10:41Z)
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers [66.29824750770389]
We analyze camera motion from a first principles perspective, uncovering insights that enable precise 3D camera manipulation. We compound these findings to design the Advanced 3D Camera Control (AC3D) architecture.
arXiv Detail & Related papers (2024-11-27T18:49:13Z)
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength [11.778832811404259]
I2VControl-Camera is a novel camera control method that significantly enhances controllability while providing over the strength of subject motion. To accurately control and adjust the strength of subject motion, we explicitly model the higher-order components of the video trajectory expansion.
arXiv Detail & Related papers (2024-11-10T16:59:39Z)
CamI2V: Camera-Controlled Image-to-Video Diffusion Model [11.762824216082508]
In this paper, we emphasize the necessity of integrating explicit physical constraints into model design. Epipolar attention is proposed for modeling all cross-frame relationships from a novel perspective of noised condition. We achieve a 25.5% improvement in camera controllability on RealEstate10K while maintaining strong generalization to out-of-domain images.
arXiv Detail & Related papers (2024-10-21T12:36:27Z)
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame transformers video for 3D camera control using a ControlNet-like conditioning mechanism based on Plucker coordinates. Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements. Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
Training-free Camera Control for Video Generation [19.526135830699882]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.
arXiv Detail & Related papers (2024-06-14T15:33:00Z)
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation. To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block. Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
arXiv Detail & Related papers (2024-06-04T17:27:19Z)
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation. CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z)
MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos. Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
arXiv Detail & Related papers (2024-04-24T10:28:54Z)
CameraCtrl: Enabling Camera Control for Text-to-Video Generation [86.36135895375425]
Controllability plays a crucial role in video generation since it allows users to create desired content. Existing models largely overlooked the precise control of camera pose that serves as a cinematic language. We introduce CameraCtrl, enabling accurate camera pose control for text-to-video(T2V) models.
arXiv Detail & Related papers (2024-04-02T16:52:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.