PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention
- URL: http://arxiv.org/abs/2511.17185v1
- Date: Fri, 21 Nov 2025 12:05:46 GMT
- Title: PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention
- Authors: Yipeng Chen, Zhichao Ye, Zhenzhou Fang, Xinyu Chen, Xiaoyu Zhang, Jialing Liu, Nan Wang, Haomin Liu, Guofeng Zhang,
- Abstract summary: PostCam is a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes.<n> Experiments on both real-world and synthetic datasets demonstrate that PostCam outperforms state-of-the-art methods by over 20% in camera control precision and view consistency.
- Score: 13.912161562631722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose PostCam, a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes. We find that existing video recapture methods suffer from suboptimal camera motion injection strategies; such suboptimal designs not only limit camera control precision but also result in generated videos that fail to preserve fine visual details from the source video. To achieve more accurate and flexible motion manipulation, PostCam introduces a query-shared cross-attention module. It integrates two distinct forms of control signals: the 6-DoF camera poses and the 2D rendered video frames. By fusing them into a unified representation within a shared feature space, our model can extract underlying motion cues, which enhances both control precision and generation quality. Furthermore, we adopt a two-stage training strategy: the model first learns coarse camera control from pose inputs, and then incorporates visual information to refine motion accuracy and enhance visual fidelity. Experiments on both real-world and synthetic datasets demonstrate that PostCam outperforms state-of-the-art methods by over 20% in camera control precision and view consistency, while achieving the highest video generation quality. Our project webpage is publicly available at: https://cccqaq.github.io/PostCam.github.io/
Related papers
- FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning [45.013802909442184]
We introduce FaceCam, a system that generates video under customizable camera trajectories for monocular human portrait video input.<n> Experiments on Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, identity and motion preservation.
arXiv Detail & Related papers (2026-03-05T18:59:58Z) - Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation [49.12018869332346]
InfCam is a camera-controlled video-to-video generation framework with high pose fidelity.<n>The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model.
arXiv Detail & Related papers (2025-12-18T20:03:05Z) - DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation [51.66285725139235]
We present DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation.<n>We propose a dual-branch framework that mutually generates camera-consistent RGB and depth sequences.<n>Experiments demonstrate that DualCamCtrl achieves more consistent camera-controlled video generation.
arXiv Detail & Related papers (2025-11-28T12:19:57Z) - ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework.<n>It reproduces the dynamic scene of an input video at novel camera trajectories.<n>Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z) - AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers [66.29824750770389]
We analyze camera motion from a first principles perspective, uncovering insights that enable precise 3D camera manipulation.<n>We compound these findings to design the Advanced 3D Camera Control (AC3D) architecture.
arXiv Detail & Related papers (2024-11-27T18:49:13Z) - Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind that allows the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z) - Training-free Camera Control for Video Generation [15.79168688275606]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models.<n>Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.<n>It can be plug-and-play with most pretrained video diffusion models and generate camera-controllable videos with a single image or text prompt as input.
arXiv Detail & Related papers (2024-06-14T15:33:00Z) - CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation.
To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block.
Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
arXiv Detail & Related papers (2024-06-04T17:27:19Z) - CameraCtrl: Enabling Camera Control for Text-to-Video Generation [86.36135895375425]
Controllability plays a crucial role in video generation, as it allows users to create and edit content more precisely.<n>Existing models, however, lack control of camera pose that serves as a cinematic language to express deeper narrative nuances.<n>We introduce CameraCtrl, enabling accurate camera pose control for video diffusion models.
arXiv Detail & Related papers (2024-04-02T16:52:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.