Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation
- URL: http://arxiv.org/abs/2512.17040v1
- Date: Thu, 18 Dec 2025 20:03:05 GMT
- Title: Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation
- Authors: Min-Jung Kim, Jeongho Kim, Hoiyeong Jin, Junha Hyung, Jaegul Choo
- Abstract summary: InfCam is a camera-controlled video-to-video generation framework with high pose fidelity. The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model.
- Score: 49.12018869332346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in video diffusion models has spurred growing interest in camera-controlled novel-view video generation for dynamic scenes, aiming to provide creators with cinematic camera control capabilities in post-production. A key challenge in camera-controlled video generation is ensuring fidelity to the specified camera pose, while maintaining view consistency and reasoning about occluded geometry from limited observations. To address this, existing methods either train a trajectory-conditioned video generation model on trajectory-video pair datasets, or estimate depth from the input video to reproject it along a target trajectory and generate the unprojected regions. Nevertheless, existing methods struggle to generate camera-pose-faithful, high-quality videos for two main reasons: (1) reprojection-based approaches are highly susceptible to errors caused by inaccurate depth estimation; and (2) the limited diversity of camera trajectories in existing datasets restricts the generalization of learned models. To address these limitations, we present InfCam, a depth-free, camera-controlled video-to-video generation framework with high pose fidelity. The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model. Conditioning on this noise-free rotational information, the residual parallax term is predicted through end-to-end training to achieve high camera-pose fidelity; and (2) a data augmentation pipeline that transforms existing synthetic multiview datasets into sequences with diverse trajectories and focal lengths. Experimental results demonstrate that InfCam outperforms baseline methods in camera-pose accuracy and visual fidelity, generalizing well from synthetic to real-world data. Link to our project page: https://emjay73.github.io/InfCam/
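The abstract's key idea, infinite homography warping, builds on a standard result from multi-view geometry: for a pure camera rotation R (or, equivalently, for scene points at infinite depth), pixels map between views via H_inf = K_tgt · R · K_src^{-1}, with no depth required; the depth-dependent parallax is the residual left for the model to predict. The sketch below is not the paper's implementation, just a minimal NumPy illustration of that warp with hypothetical intrinsics (focal length 500, principal point (320, 240)) and a 10-degree yaw rotation.

```python
import numpy as np

# Hypothetical pinhole intrinsics, illustrative values only.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def infinite_homography(K_src, K_tgt, R):
    """H_inf = K_tgt @ R @ inv(K_src): maps source pixels to target
    pixels for a pure rotation R (exact for points at infinite depth)."""
    return K_tgt @ R @ np.linalg.inv(K_src)

def warp_point(H, uv):
    """Apply a 3x3 homography to a pixel (u, v) via homogeneous coords."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]

# A 10-degree yaw (rotation about the camera's y-axis).
theta = np.deg2rad(10.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])

H = infinite_homography(K, K, R)
u, v = warp_point(H, (320.0, 240.0))  # warp the principal point
# Under pure yaw the principal point shifts horizontally by f * tan(theta)
# and keeps its vertical coordinate.
```

With finite depth Z and translation t, the full reprojection adds a parallax term K_tgt · t / Z on top of this rotational warp, which is exactly the depth-dependent residual the abstract describes the model learning end-to-end.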
Related papers
- FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning [45.013802909442184]
We introduce FaceCam, a system that generates video under customizable camera trajectories for monocular human portrait video input. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior performance in camera controllability, visual quality, identity and motion preservation.
arXiv Detail & Related papers (2026-03-05T18:59:58Z) - Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation [21.084121261693365]
We propose DepthDirector, a video re-rendering framework with precise camera controllability. By leveraging the depth video from explicit 3D representation as camera-control guidance, our method can faithfully reproduce the dynamic scene of an input video under novel camera trajectories.
arXiv Detail & Related papers (2026-01-15T09:26:45Z) - CETCAM: Camera-Controllable Video Generation via Consistent and Extensible Tokenization [32.42754288735215]
CETCAM is a camera-controllable video generation framework. It eliminates the need for camera annotations through a consistent and extensible tokenization scheme. It learns robust camera controllability from diverse raw video data and refines fine-grained visual quality using high-fidelity datasets.
arXiv Detail & Related papers (2025-12-22T04:21:39Z) - DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation [51.66285725139235]
We present DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation. We propose a dual-branch framework that mutually generates camera-consistent RGB and depth sequences. Experiments demonstrate that DualCamCtrl achieves more consistent camera-controlled video generation.
arXiv Detail & Related papers (2025-11-28T12:19:57Z) - DreamJourney: Perpetual View Generation with Video Diffusion Models [91.88716097573206]
Perpetual view generation aims to synthesize a long-term video corresponding to an arbitrary camera trajectory solely from a single input image. Recent methods commonly utilize a pre-trained text-to-image diffusion model to synthesize new content of previously unseen regions along camera movement. We present DreamJourney, a two-stage framework that leverages the world simulation capacity of video diffusion models to trigger a new perpetual scene view generation task.
arXiv Detail & Related papers (2025-06-21T12:51:34Z) - ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework. It reproduces the dynamic scene of an input video at novel camera trajectories. Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z) - CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z) - RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control [12.80337762745075]
RealCam-I2V is a novel video generation framework that integrates monocular metric depth estimation. During training, the reconstructed 3D scene enables scaling camera parameters from relative to metric scales. RealCam-I2V achieves significant improvements in controllability and video quality on the RealEstate10K dataset and out-of-domain images.
arXiv Detail & Related papers (2025-02-14T10:21:49Z) - Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.