GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
- URL: http://arxiv.org/abs/2503.03751v1
- Date: Wed, 05 Mar 2025 18:59:50 GMT
- Title: GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
- Authors: Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, Jun Gao
- Abstract summary: We present GEN3C, a generative video model with precise Camera Control and temporal 3D Consistency. Our results demonstrate more precise camera control than prior work, as well as state-of-the-art results in sparse-view novel view synthesis.
- Score: 88.90505842498823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present GEN3C, a generative video model with precise Camera Control and temporal 3D Consistency. Prior video models already generate realistic videos, but they tend to leverage little 3D information, leading to inconsistencies, such as objects popping in and out of existence. Camera control, if implemented at all, is imprecise, because camera parameters are mere inputs to the neural network which must then infer how the video depends on the camera. In contrast, GEN3C is guided by a 3D cache: point clouds obtained by predicting the pixel-wise depth of seed images or previously generated frames. When generating the next frames, GEN3C is conditioned on the 2D renderings of the 3D cache with the new camera trajectory provided by the user. Crucially, this means that GEN3C neither has to remember what it previously generated nor does it have to infer the image structure from the camera pose. The model, instead, can focus all its generative power on previously unobserved regions, as well as advancing the scene state to the next frame. Our results demonstrate more precise camera control than prior work, as well as state-of-the-art results in sparse-view novel view synthesis, even in challenging settings such as driving scenes and monocular dynamic video. Results are best viewed in videos. Check out our webpage! https://research.nvidia.com/labs/toronto-ai/GEN3C/
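To make the 3D-cache idea concrete, here is a minimal numpy sketch of the conditioning pipeline the abstract describes: predicted per-pixel depth is unprojected into a point cloud (the 3D cache), and that cache is reprojected under the user's new camera to produce the 2D rendering the model is conditioned on. The pinhole model and function names are illustrative assumptions; the paper's actual depth predictor and renderer are not specified here.

```python
import numpy as np

def unproject_to_cache(depth, K):
    """Lift a per-pixel depth map (H, W) into a point cloud: the '3D cache'."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # back-project pixels through intrinsics
    return rays * depth.reshape(-1, 1)       # scale each camera ray by its depth

def render_cache(points, K, R, t, H, W):
    """Z-buffer the cache into a new view with extrinsics (R, t); the resulting
    rendering (here just a depth buffer) conditions the next generated frames."""
    cam = points @ R.T + t                   # world -> new camera coordinates
    cam = cam[cam[:, 2] > 1e-6]              # keep points in front of the camera
    proj = cam @ K.T
    uv = np.round(proj[:, :2] / proj[:, 2:3]).astype(int)
    zbuf = np.full((H, W), np.inf)
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    for (x, y), z in zip(uv[ok], cam[ok, 2]):
        zbuf[y, x] = min(zbuf[y, x], z)      # nearest point wins per pixel
    return zbuf
```

With such a rendering as input, the generator only has to inpaint regions the cache does not cover and advance the scene to the next time step, rather than infer image structure from raw camera parameters.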
Related papers
- CamCtrl3D: Single-Image Scene Exploration with Precise 3D Camera Control [39.20528937415251]
We propose a method for generating fly-through videos of a scene from a single image and a given camera trajectory. We condition its UNet denoiser on the camera trajectory using four techniques. We calibrate camera positions in our datasets for scale consistency across scenes, and we train our scene exploration model, CamCtrl3D, demonstrating state-of-the-art results.
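The abstract does not spell out the four conditioning techniques; as a hedged illustration, one widely used way to expose a camera trajectory to a UNet denoiser is a per-pixel Plücker ray embedding concatenated to the denoiser's input channels. The sketch below uses assumed shapes and conventions, not CamCtrl3D's actual design.

```python
import numpy as np

def plucker_raymap(K, R_c2w, t_c2w, H, W):
    """Per-pixel Plücker ray embedding (direction, moment), shape (H, W, 6),
    for a camera-to-world pose (R_c2w, t_c2w). Concatenating one raymap per
    frame to the denoiser input is a common camera-conditioning technique."""
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)       # (H, W, 3) pixels
    d = pix @ np.linalg.inv(K).T @ R_c2w.T                 # world-space ray directions
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    m = np.cross(t_c2w, d)                                 # moment: origin x direction
    return np.concatenate([d, m], axis=-1)                 # pose, encoded per pixel
```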
arXiv Detail & Related papers (2025-01-10T14:37:32Z)
- JOG3R: Towards 3D-Consistent Video Generators [12.967198315639106]
We investigate whether video generators similarly exhibit 3D-awareness.
Using structure-from-motion as a 3D-aware task, we test if intermediate features of a video generator can support camera pose estimation.
We propose the first unified video generator that is 3D-consistent, generates realistic video frames, and can potentially be repurposed for other 3D-aware tasks.
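As a sketch of what such a probing experiment can look like: freeze the generator, pool intermediate features from two frames, and train a small head to regress their relative camera pose against structure-from-motion labels. The architecture, feature dimension, and pose parameterization below are assumptions, not details from the abstract.

```python
import torch
import torch.nn as nn

class PoseProbe(nn.Module):
    """Tiny probe trained on frozen video-generator features to regress the
    relative camera pose between two frames; if the probe succeeds, the
    features carry the kind of 3D awareness tested here."""
    def __init__(self, feat_dim=1024):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.GELU(),
            nn.Linear(512, 9),  # 3D translation + 6D rotation representation
        )

    def forward(self, feat_a, feat_b):
        # feat_*: (B, feat_dim) pooled intermediate activations of two frames
        return self.head(torch.cat([feat_a, feat_b], dim=-1))

# A training loop (not shown) would freeze the generator, extract features at
# a chosen layer, and minimize a pose loss against SfM-derived labels.
```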
arXiv Detail & Related papers (2025-01-02T18:55:04Z)
- LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors [107.83398512719981]
Single-image 3D reconstruction remains a fundamental challenge in computer vision. Recent advances in Latent Video Diffusion Models (LVDMs) offer promising 3D priors learned from large-scale video data. We propose LiftImage3D, a framework that effectively releases LVDMs' generative priors while ensuring 3D consistency.
arXiv Detail & Related papers (2024-12-12T18:58:42Z)
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation.
To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block.
Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
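A hedged sketch of the geometric core of an epipolar attention module: given the relative pose between two frames, each query pixel is restricted to keys that lie near its epipolar line in the other frame. The pixel threshold and this mask-based formulation are assumptions on my part; CamCo's exact module may differ.

```python
import numpy as np

def epipolar_mask(K, R, t, H, W, thresh=2.0):
    """Boolean mask (H*W, H*W): query pixel i in view A may attend to key
    pixel j in view B only if j lies within `thresh` pixels of i's epipolar
    line. (R, t) is the relative pose from A to B. For real resolutions this
    would be computed blockwise rather than densely."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])                     # cross-product matrix [t]_x
    F = np.linalg.inv(K).T @ tx @ R @ np.linalg.inv(K)      # fundamental matrix
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=-1)  # (N, 3)
    lines = pix @ F.T                                       # epipolar line per query
    lines /= np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    dist = np.abs(lines @ pix.T)                            # point-line distances (N, N)
    return dist < thresh                                    # usable as an attention mask
```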
arXiv Detail & Related papers (2024-06-04T17:27:19Z)
- Persistent Nature: A Generative Model of Unbounded 3D Worlds [74.51149070418002]
We present an extendable, planar scene layout grid that can be rendered from arbitrary camera poses via a 3D decoder and volume rendering.
Based on this representation, we learn a generative world model solely from single-view internet photos.
Our approach enables scene extrapolation beyond the fixed bounds of current 3D generative models, while also supporting a persistent, camera-independent world representation.
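To illustrate the representation, a single ray can be rendered by sampling the planar layout grid at each 3D point's ground-plane location, decoding density and color, and alpha-compositing. The shapes, coordinate ranges, and decoder interface below are assumptions the abstract does not specify.

```python
import torch
import torch.nn.functional as F

def render_ray(layout, decoder, origin, direction, n_samples=64, near=0.5, far=50.0):
    """Volume-render one ray against a planar layout grid `layout` (1, C, H, W)
    whose features cover ground-plane coordinates (x, z) in [-1, 1]."""
    ts = torch.linspace(near, far, n_samples)
    pts = origin + ts[:, None] * direction                   # (n_samples, 3) world points
    grid = pts[:, [0, 2]].reshape(1, 1, n_samples, 2)        # ground-plane lookup coords
    feats = F.grid_sample(layout, grid, align_corners=True)  # (1, C, 1, n_samples)
    feats = feats[0, :, 0].T                                 # (n_samples, C)
    sigma, rgb = decoder(feats, pts)                         # density (n,), color (n, 3)
    alpha = 1.0 - torch.exp(-sigma * (ts[1] - ts[0]))        # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    return ((trans * alpha)[:, None] * rgb).sum(dim=0)       # composited pixel color
```

Because the layout grid is planar and extendable, new territory can be added at its edges while already-generated regions persist independently of the camera.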
arXiv Detail & Related papers (2023-03-23T17:59:40Z)
- CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields [67.76151996543588]
We learn a 3D- and camera-aware generative model which faithfully recovers not only the image but also the camera data distribution.
At test time, our model generates images with explicit control over the camera as well as the shape and appearance of the scene.
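One way to read "recovering the camera distribution" is a small camera generator that learns a residual on a simple pose prior and is trained jointly with the image generator. The ranges and parameterization below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class CameraGenerator(nn.Module):
    """Maps a latent to camera parameters as a learned residual on a simple
    prior (elevation, azimuth, radius), so the camera distribution itself is
    learned rather than fixed by hand."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, 3))
        self.register_buffer("prior", torch.tensor([0.2, 0.0, 2.7]))

    def forward(self, z):
        return self.prior + 0.1 * torch.tanh(self.net(z))  # learned residual pose

# A radiance-field generator then renders the scene from the sampled pose, and
# both modules are trained adversarially against unposed real images.
```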
arXiv Detail & Related papers (2021-03-31T17:59:24Z)