Related papers: CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

URL: http://arxiv.org/abs/2406.02509v1
Date: Tue, 4 Jun 2024 17:27:19 GMT
Title: CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
Authors: Dejia Xu, Weili Nie, Chao Liu, Sifei Liu, Jan Kautz, Zhangyang Wang, Arash Vahdat
Abstract summary: We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation. To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block. Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
Score: 117.16677556874278
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently video diffusion models have emerged as expressive generative tools for high-quality video content creation readily available to general users. However, these models often do not offer precise control over camera poses for video generation, limiting the expression of cinematic language and user control. To address this issue, we introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation. We equip a pre-trained image-to-video generator with accurately parameterized camera pose input using Plücker coordinates. To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block that enforces epipolar constraints to the feature maps. Additionally, we fine-tune CamCo on real-world videos with camera poses estimated through structure-from-motion algorithms to better synthesize object motion. Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models while effectively generating plausible object motion. Project page: https://ir1d.github.io/CamCo/
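
Illustrative note: the abstract says camera poses are parameterized with Plücker coordinates but gives no formula. The sketch below shows one common way to build a per-pixel Plücker ray embedding. It is not the authors' code: it assumes a pinhole camera, a camera-to-world rotation, and the (o x d, d) ordering of moment and direction, and the function and parameter names (plucker_ray, R_c2w, origin) are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plucker_ray(u, v, fx, fy, cx, cy, R_c2w, origin):
    """Return 6-D Pluecker coordinates (o x d, d) of the ray through pixel (u, v).

    Assumptions (not taken from the paper): pinhole intrinsics fx, fy, cx, cy;
    R_c2w is a 3x3 camera-to-world rotation (nested lists or tuples);
    origin is the camera centre o in world coordinates.
    """
    # Back-project the pixel to a direction in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the direction into the world frame.
    d_world = tuple(sum(R_c2w[i][j] * d_cam[j] for j in range(3))
                    for i in range(3))
    # Normalise so the embedding does not depend on the ray's scale.
    norm = math.sqrt(sum(c * c for c in d_world))
    d = tuple(c / norm for c in d_world)
    # The moment o x d encodes where the ray sits in space.
    m = cross(origin, d)
    return m + d


# Example: identity rotation, camera centre at (1, 0, 0), pixel at the
# principal point, so the ray points along +z.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
ray = plucker_ray(320.0, 240.0, 500.0, 500.0, 320.0, 240.0,
                  identity, (1.0, 0.0, 0.0))
```

Evaluating this function at every pixel of every frame yields a six-channel map per frame, which is the kind of dense pose input a video generator can be conditioned on. By construction, the moment is always orthogonal to the direction.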

Related papers

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework. It reproduces the dynamic scene of an input video at novel camera trajectories. Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z)
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control [88.90505842498823]
We present GEN3C, a generative video model with precise Camera Control and temporal 3D Consistency. Our results demonstrate more precise camera control than prior work, as well as state-of-the-art results in sparse-view novel view synthesis.
arXiv Detail & Related papers (2025-03-05T18:59:50Z)
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers [66.29824750770389]
We analyze camera motion from a first principles perspective, uncovering insights that enable precise 3D camera manipulation. We compound these findings to design the Advanced 3D Camera Control (AC3D) architecture.
arXiv Detail & Related papers (2024-11-27T18:49:13Z)
Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters. Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
CamI2V: Camera-Controlled Image-to-Video Diffusion Model [11.762824216082508]
In this paper, we emphasize the necessity of integrating explicit physical constraints into model design. Epipolar attention is proposed for modeling all cross-frame relationships from a novel perspective of noised condition. We achieve a 25.5% improvement in camera controllability on RealEstate10K while maintaining strong generalization to out-of-domain images.
arXiv Detail & Related papers (2024-10-21T12:36:27Z)
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation. Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency. Cavia is the first of its kind that allows the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism based on Plücker coordinates. Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
Training-free Camera Control for Video Generation [19.526135830699882]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.
arXiv Detail & Related papers (2024-06-14T15:33:00Z)
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation. CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z)
CameraCtrl: Enabling Camera Control for Text-to-Video Generation [86.36135895375425]
Controllability plays a crucial role in video generation since it allows users to create desired content. Existing models largely overlooked the precise control of camera pose that serves as a cinematic language. We introduce CameraCtrl, enabling accurate camera pose control for text-to-video (T2V) models.
arXiv Detail & Related papers (2024-04-02T16:52:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.