CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
- URL: http://arxiv.org/abs/2503.10592v1
- Date: Thu, 13 Mar 2025 17:42:01 GMT
- Title: CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
- Authors: Hao He, Ceyuan Yang, Shanchuan Lin, Yinghao Xu, Meng Wei, Liangke Gui, Qi Zhao, Gordon Wetzstein, Lu Jiang, Hongsheng Li,
- Abstract summary: CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model.<n>We take an approach that progressively expands the generation of dynamic scenes.
- Score: 89.63787060844409
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces CameraCtrl II, a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. Previous camera-conditioned video generative models suffer from diminished video dynamics and limited range of viewpoints when generating videos with large camera movement. We take an approach that progressively expands the generation of dynamic scenes -- first enhancing dynamic content within individual video clip, then extending this capability to create seamless explorations across broad viewpoint ranges. Specifically, we construct a dataset featuring a large degree of dynamics with camera parameter annotations for training while designing a lightweight camera injection module and training scheme to preserve dynamics of the pretrained models. Building on these improved single-clip techniques, we enable extended scene exploration by allowing users to iteratively specify camera trajectories for generating coherent video sequences. Experiments across diverse scenarios demonstrate that CameraCtrl Ii enables camera-controlled dynamic scene synthesis with substantially wider spatial exploration than previous approaches.
Related papers
- Dynamic Camera Poses and Where to Find Them [36.249380390918816]
We introduce DynPose-100K, a large-scale dataset of dynamic Internet videos annotated with camera poses.
For pose estimation, we combine the latest techniques of point tracking, dynamic masking, and structure-from-motion.
Our analysis and experiments demonstrate that DynPose-100K is both large-scale and diverse across several key attributes.
arXiv Detail & Related papers (2025-04-24T17:59:56Z) - ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework.
It reproduces the dynamic scene of an input video at novel camera trajectories.
Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z) - SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints [43.14498014617223]
We propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation.<n>We introduce a multi-view synchronization module to maintain appearance and geometry consistency across different viewpoints.<n>Our method enables intriguing extensions, such as re-rendering a video from novel viewpoints.
arXiv Detail & Related papers (2024-12-10T18:55:17Z) - Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind that allows the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z) - VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame transformers video for 3D camera control using a ControlNet-like conditioning mechanism based on Plucker coordinates.
Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z) - Training-free Camera Control for Video Generation [15.79168688275606]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models.<n>Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.<n>It can be plug-and-play with most pretrained video diffusion models and generate camera-controllable videos with a single image or text prompt as input.
arXiv Detail & Related papers (2024-06-14T15:33:00Z) - Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z) - MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos.
Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
arXiv Detail & Related papers (2024-04-24T10:28:54Z) - CameraCtrl: Enabling Camera Control for Text-to-Video Generation [86.36135895375425]
Controllability plays a crucial role in video generation, as it allows users to create and edit content more precisely.
Existing models, however, lack control of camera pose that serves as a cinematic language to express deeper narrative nuances.
We introduce CameraCtrl, enabling accurate camera pose control for video diffusion models.
arXiv Detail & Related papers (2024-04-02T16:52:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.