DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation
- URL: http://arxiv.org/abs/2511.23127v2
- Date: Mon, 01 Dec 2025 18:39:32 GMT
- Title: DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation
- Authors: Hongfei Zhang, Kanghao Chen, Zixin Zhang, Harold Haodong Chen, Yuanhuiyi Lyu, Yuqi Zhang, Shuai Yang, Kun Zhou, Yingcong Chen
- Abstract summary: We present DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation. We propose a dual-branch framework that mutually generates camera-consistent RGB and depth sequences. Experiments demonstrate that DualCamCtrl achieves more consistent camera-controlled video generation.
- Score: 51.66285725139235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation. Recent works have advanced this field by representing camera poses as ray-based conditions, yet they often lack sufficient scene understanding and geometric awareness. DualCamCtrl specifically targets this limitation by introducing a dual-branch framework that mutually generates camera-consistent RGB and depth sequences. To harmonize these two modalities, we further propose the Semantic Guided Mutual Alignment (SIGMA) mechanism, which performs RGB-depth fusion in a semantics-guided and mutually reinforced manner. These designs collectively enable DualCamCtrl to better disentangle appearance and geometry modeling, generating videos that more faithfully adhere to the specified camera trajectories. Additionally, we analyze and reveal the distinct influence of depth and camera poses across denoising stages and further demonstrate that early and late stages play complementary roles in forming global structure and refining local details. Extensive experiments demonstrate that DualCamCtrl achieves more consistent camera-controlled video generation, with over 40% reduction in camera motion errors compared with prior methods. Our project page: https://soyouthinkyoucantell.github.io/dualcamctrl-page/
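The "ray-based conditions" mentioned in the abstract typically refer to per-pixel Plücker ray embeddings, which encode each camera ray as a 6-D vector (unit direction plus moment) derived from the camera intrinsics and pose. As an illustration only (the exact conditioning used by DualCamCtrl is not specified here), a minimal sketch of computing such an embedding for one pixel, assuming a standard pinhole model with focal lengths `fx, fy`, principal point `(cx, cy)`, camera-to-world rotation `R`, and camera origin `t`:

```python
import math

def plucker_ray(u, v, fx, fy, cx, cy, R, t):
    """Plücker embedding (d, m) of the camera ray through pixel (u, v).

    R: 3x3 camera-to-world rotation (list of rows); t: camera origin in world.
    Returns a 6-tuple: unit ray direction d followed by moment m = t x d.
    """
    # Back-project the pixel to a direction in camera coordinates.
    dc = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the direction into world coordinates.
    dw = tuple(sum(R[i][k] * dc[k] for k in range(3)) for i in range(3))
    n = math.sqrt(sum(x * x for x in dw))
    d = tuple(x / n for x in dw)
    # The moment m = t x d encodes the ray's position independently of
    # which point on the ray is chosen as the origin.
    m = (t[1] * d[2] - t[2] * d[1],
         t[2] * d[0] - t[0] * d[2],
         t[0] * d[1] - t[1] * d[0])
    return d + m
```

Stacking these 6-D vectors over all pixels and frames yields a ray map with the same spatial layout as the video latents, which is why this representation is convenient as a dense conditioning signal for diffusion backbones.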
Related papers
- CETCAM: Camera-Controllable Video Generation via Consistent and Extensible Tokenization [32.42754288735215]
CETCAM is a camera-controllable video generation framework. It eliminates the need for camera annotations through a consistent and extensible tokenization scheme. It learns robust camera controllability from diverse raw video data and refines fine-grained visual quality using high-fidelity datasets.
arXiv Detail & Related papers (2025-12-22T04:21:39Z) - Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation [49.12018869332346]
InfCam is a camera-controlled video-to-video generation framework with high pose fidelity. The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model.
arXiv Detail & Related papers (2025-12-18T20:03:05Z) - PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention [13.912161562631722]
PostCam is a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes. Experiments on both real-world and synthetic datasets demonstrate that PostCam outperforms state-of-the-art methods by over 20% in camera control precision and view consistency.
arXiv Detail & Related papers (2025-11-21T12:05:46Z) - IDCNet: Guided Video Diffusion for Metric-Consistent RGBD Scene Generation with Precise Camera Control [11.830304371371968]
IDC-Net is a novel framework designed to generate RGB-D video sequences under explicit camera trajectory control. We show that IDC-Net achieves improvements over state-of-the-art approaches in both visual quality and geometric consistency of generated scene sequences.
arXiv Detail & Related papers (2025-08-06T07:19:16Z) - Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation [73.73984727616198]
We present Uni3C, a unified framework for precise control of both camera and human motion in video generation. First, we propose PCDController, a plug-and-play control module trained with a frozen video generative backbone. Second, we propose a jointly aligned 3D world guidance for the inference phase that seamlessly integrates both scenic point clouds and SMPL-X characters.
arXiv Detail & Related papers (2025-04-21T07:10:41Z) - CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z) - RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control [12.80337762745075]
RealCam-I2V is a novel video generation framework that integrates monocular metric depth estimation. During training, the reconstructed 3D scene enables scaling camera parameters from relative to metric scales. RealCam-I2V achieves significant improvements in controllability and video quality on the RealEstate10K dataset and out-of-domain images.
arXiv Detail & Related papers (2025-02-14T10:21:49Z) - Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z) - CameraCtrl: Enabling Camera Control for Text-to-Video Generation [86.36135895375425]
Controllability plays a crucial role in video generation, as it allows users to create and edit content more precisely. Existing models, however, lack control of camera pose, which serves as a cinematic language to express deeper narrative nuances. We introduce CameraCtrl, enabling accurate camera pose control for video diffusion models.
arXiv Detail & Related papers (2024-04-02T16:52:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.