CameraCtrl: Enabling Camera Control for Text-to-Video Generation
- URL: http://arxiv.org/abs/2404.02101v1
- Date: Tue, 2 Apr 2024 16:52:41 GMT
- Title: CameraCtrl: Enabling Camera Control for Text-to-Video Generation
- Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang
- Abstract summary: Controllability plays a crucial role in video generation since it allows users to create desired content.
Existing models have largely overlooked precise control of camera pose, which serves as a cinematic language.
We introduce CameraCtrl, enabling accurate camera pose control for text-to-video (T2V) models.
- Score: 86.36135895375425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Controllability plays a crucial role in video generation since it allows users to create desired content. However, existing models have largely overlooked precise control of camera pose, which serves as a cinematic language to express deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for text-to-video (T2V) models. After precisely parameterizing the camera trajectory, a plug-and-play camera module is trained on a T2V model, leaving the other components untouched. Additionally, a comprehensive study on the effect of various datasets is conducted, suggesting that videos with diverse camera distributions and similar appearances indeed enhance controllability and generalization. Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise and domain-adaptive camera control, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs. Our project website is at: https://hehao13.github.io/projects-CameraCtrl/.
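A common way to "precisely parameterize the camera trajectory" in this line of work is the per-pixel Plücker embedding of camera rays (the same representation the VD3D entry below conditions on). The sketch below is a minimal, hedged illustration of that idea, not the authors' exact implementation: it assumes a pinhole camera with intrinsics K and world-to-camera extrinsics [R|t], and produces an (H, W, 6) map that a conditioning module could consume per frame.

```python
import numpy as np

def plucker_embedding(K, R, t, H, W):
    """Per-pixel Plücker coordinates (moment, direction) for a pinhole camera.

    K: 3x3 intrinsics; R: 3x3 world-to-camera rotation; t: (3,) translation.
    Returns an (H, W, 6) array: [o x d, d] per pixel, with d unit-normalized.
    This is an illustrative sketch, not CameraCtrl's actual code.
    """
    # Camera center in world coordinates: o = -R^T t
    o = -R.T @ t
    # Pixel grid sampled at pixel centers
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)          # (H, W, 3)
    # Back-project to world-space ray directions: d = R^T K^{-1} p
    dirs = pix @ np.linalg.inv(K).T @ R                        # (H, W, 3)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Moment vector o x d encodes the ray's position, not just its direction
    moment = np.cross(np.broadcast_to(o, dirs.shape), dirs)
    return np.concatenate([moment, dirs], axis=-1)             # (H, W, 6)
```

Stacking one such map per frame yields a trajectory tensor of shape (T, H, W, 6) that can be fed to a plug-and-play camera module alongside the video latents.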
Related papers
- I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength [11.778832811404259]
I2VControl-Camera is a novel camera control method that significantly enhances controllability while providing adjustable control over the strength of subject motion.
To accurately control and adjust the strength of subject motion, we explicitly model the higher-order components of the video trajectory expansion.
arXiv Detail & Related papers (2024-11-10T16:59:39Z) - Boosting Camera Motion Control for Video Diffusion Transformers [21.151900688555624]
We show that transformer-based diffusion models (DiT) suffer from severe degradation in camera motion accuracy.
To address the persistent motion degradation in DiT, we introduce Camera Motion Guidance (CMG), which boosts camera control by over 400%.
Our method universally applies to both U-Net and DiT models, offering improved camera control for video generation tasks.
arXiv Detail & Related papers (2024-10-14T17:58:07Z) - VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism based on Plücker coordinates.
Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z) - Training-free Camera Control for Video Generation [19.526135830699882]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models.
Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.
arXiv Detail & Related papers (2024-06-14T15:33:00Z) - CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation.
To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block.
Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
arXiv Detail & Related papers (2024-06-04T17:27:19Z) - Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z) - MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos.
Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
arXiv Detail & Related papers (2024-04-24T10:28:54Z) - MotionCtrl: A Unified and Flexible Motion Controller for Video Generation [77.09621778348733]
Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement.
This paper presents MotionCtrl, a unified motion controller for video generation designed to effectively and independently control camera and object motion.
arXiv Detail & Related papers (2023-12-06T17:49:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.