GimbalDiffusion: Gravity-Aware Camera Control for Video Generation
- URL: http://arxiv.org/abs/2512.09112v1
- Date: Tue, 09 Dec 2025 20:54:35 GMT
- Title: GimbalDiffusion: Gravity-Aware Camera Control for Video Generation
- Authors: Frédéric Fortier-Chouinard, Yannick Hold-Geoffroy, Valentin Deschaintre, Matheus Gadelha, Jean-François Lalonde
- Abstract summary: We introduce a framework that enables camera control grounded in physical-world coordinates, using gravity as a global reference. We leverage panoramic 360-degree videos to construct a wide variety of camera trajectories, well beyond the predominantly straight, forward-facing trajectories seen in conventional video data. We establish a benchmark for camera-aware video generation by rebalancing SpatialVID-HQ for comprehensive evaluation under wide camera pitch variation.
- Score: 30.697985626973665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in text-to-video generation has achieved remarkable realism, yet fine-grained control over camera motion and orientation remains elusive. Existing approaches typically encode camera trajectories through relative or ambiguous representations, limiting explicit geometric control. We introduce GimbalDiffusion, a framework that enables camera control grounded in physical-world coordinates, using gravity as a global reference. Instead of describing motion relative to previous frames, our method defines camera trajectories in an absolute coordinate system, allowing precise and interpretable control over camera parameters without requiring an initial reference frame. We leverage panoramic 360-degree videos to construct a wide variety of camera trajectories, well beyond the predominantly straight, forward-facing trajectories seen in conventional video data. To further enhance camera guidance, we introduce null-pitch conditioning, an annotation strategy that reduces the model's reliance on text content when conflicting with camera specifications (e.g., generating grass while the camera points towards the sky). Finally, we establish a benchmark for camera-aware video generation by rebalancing SpatialVID-HQ for comprehensive evaluation under wide camera pitch variation. Together, these contributions advance the controllability and robustness of text-to-video models, enabling precise, gravity-aligned camera manipulation within generative frameworks.
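The abstract describes camera trajectories defined in an absolute, gravity-referenced coordinate system rather than relative to previous frames. As a rough illustration (a hypothetical parametrization, not the paper's actual one), a gravity-aligned camera rotation can be built from absolute pitch, yaw, and roll angles, where pitch is measured against the horizon defined by gravity:

```python
import numpy as np

def gravity_aligned_rotation(pitch, yaw, roll):
    """Camera rotation from absolute angles (radians), with gravity along -Z.

    Because pitch is measured against the gravity-defined horizon, the pose
    is absolute: no reference to a previous frame is needed. Illustration
    only; the paper's parametrization is not specified here.
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])  # yaw about the gravity axis
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])  # pitch against the horizon
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])  # roll about the view axis
    return Rz @ Ry @ Rx

# At pitch = 0 the forward axis (first column) stays horizontal -- its
# gravity (z) component is zero for any yaw; at pitch = pi/2 it points down.
level = gravity_aligned_rotation(0.0, 0.7, 0.0)[:, 0]
down = gravity_aligned_rotation(np.pi / 2, 0.0, 0.0)[:, 0]
```

The key property is that a pitch of zero always means "level with the horizon", no matter what the previous frame showed, which is what makes the control interpretable.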
Related papers
- Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation [21.084121261693365]
We propose DepthDirector, a video re-rendering framework with precise camera controllability. By leveraging the depth video from an explicit 3D representation as camera-control guidance, our method can faithfully reproduce the dynamic scene of an input video under novel camera trajectories.
arXiv Detail & Related papers (2026-01-15T09:26:45Z)
- CETCAM: Camera-Controllable Video Generation via Consistent and Extensible Tokenization [32.42754288735215]
CETCAM is a camera-controllable video generation framework. It eliminates the need for camera annotations through a consistent and extensible tokenization scheme. It learns robust camera controllability from diverse raw video data and refines fine-grained visual quality using high-fidelity datasets.
arXiv Detail & Related papers (2025-12-22T04:21:39Z)
- Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation [49.12018869332346]
InfCam is a camera-controlled video-to-video generation framework with high pose fidelity. The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model.
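The infinite homography mentioned here is a classical construct: for a pure camera rotation R under intrinsics K, pixels map as x' ~ K R K⁻¹ x. A minimal sketch (illustrative only; InfCam applies the warp in a diffusion model's latent space, which is not shown here, and the intrinsics below are invented):

```python
import numpy as np

def infinite_homography(K, R):
    """Homography K @ R @ K^{-1} induced by a pure camera rotation R."""
    return K @ R @ np.linalg.inv(K)

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# A small yaw rotation shifts the principal point horizontally by f*tan(theta).
theta = 0.1
R_yaw = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
x = infinite_homography(K, R_yaw) @ np.array([320.0, 240.0, 1.0])
x = x / x[2]  # back to inhomogeneous pixel coordinates
```

Because the warp depends only on K and R (no depth), it is exact for points at infinity and a useful rotation-conditioning signal regardless of scene geometry.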
arXiv Detail & Related papers (2025-12-18T20:03:05Z)
- Unified Camera Positional Encoding for Controlled Video Generation [48.5789182990001]
Transformers have emerged as a universal backbone across 3D perception, video generation, and world models for autonomous driving and embodied AI. We introduce Relative Ray, a geometry-consistent representation that unifies complete camera information, including 6-DoF poses, intrinsics, and lens distortions. To facilitate systematic training and evaluation, we construct a large video dataset covering a wide range of camera motions and lens types.
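Ray-based camera representations like the "Relative Ray" described here typically attach a geometric embedding to every pixel. A common variant (a hypothetical sketch, not the paper's exact formulation) uses Plücker coordinates: each pixel's world-space ray direction plus its moment about the origin:

```python
import numpy as np

def pixel_rays_plucker(K, R, t, width, height):
    """Per-pixel ray embedding as Pluecker coordinates (direction, moment).

    R, t form a world-to-camera pose; each pixel yields a 6-vector. One
    common scheme for feeding full camera information to a transformer --
    a sketch, not the paper's exact 'Relative Ray' construction.
    """
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # (H, W, 3) homogeneous pixels
    dirs = pix @ np.linalg.inv(K).T @ R                # back-project, rotate to world
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    origin = -R.T @ t                                  # camera centre in world frame
    moments = np.cross(np.broadcast_to(origin, dirs.shape), dirs)
    return np.concatenate([dirs, moments], axis=-1)    # (H, W, 6)

# Identity pose at the world origin: all moments vanish.
K = np.array([[50.0, 0.0, 32.0], [0.0, 50.0, 24.0], [0.0, 0.0, 1.0]])
rays = pixel_rays_plucker(K, np.eye(3), np.zeros(3), 64, 48)
```

Since the embedding is computed per pixel from K, R, and t, it folds intrinsics and pose into one dense signal the transformer can attend over.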
arXiv Detail & Related papers (2025-12-08T07:34:01Z)
- Generative Photographic Control for Scene-Consistent Video Cinematic Editing [75.45726688666083]
We propose CineCtrl, the first video cinematic editing framework that provides fine control over professional camera parameters. We introduce a decoupled cross-attention mechanism to disentangle camera motion from photographic inputs. Our model generates high-fidelity videos with precisely controlled, user-specified photographic camera effects.
arXiv Detail & Related papers (2025-11-17T03:17:23Z)
- CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion [31.032317079295762]
CamPVG is the first diffusion-based framework for panoramic video generation guided by precise camera poses. We achieve camera position encoding for panoramic images and cross-view feature aggregation based on spherical projection. Our method generates high-quality panoramic videos consistent with camera trajectories, far surpassing existing methods in panoramic video generation.
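Spherical projection for panoramic (equirectangular) frames, as mentioned above, maps each pixel to a direction on the unit sphere. A minimal sketch under one common convention (the paper's exact mapping may differ):

```python
import numpy as np

def equirect_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction on the sphere.

    Longitude spans [-pi, pi) across the width, latitude [pi/2, -pi/2] down
    the height; +z is the image centre's viewing direction. One common
    convention, assumed here for illustration.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    return np.array([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])

# The centre pixel looks along +z; the top row looks straight up (+y).
centre = equirect_to_ray(512, 256, 1024, 512)
up = equirect_to_ray(512, 0, 1024, 512)
```

This per-pixel direction is what makes position encoding on the sphere (rather than the image plane) natural for panoramic video.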
arXiv Detail & Related papers (2025-09-24T10:34:24Z)
- AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers [66.29824750770389]
We analyze camera motion from a first-principles perspective, uncovering insights that enable precise 3D camera manipulation. We compound these findings to design the Advanced 3D Camera Control (AC3D) architecture.
arXiv Detail & Related papers (2024-11-27T18:49:13Z)
- I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength [11.778832811404259]
I2VControl-Camera is a novel camera control method that significantly enhances controllability while providing control over the strength of subject motion. To accurately control and adjust the strength of subject motion, we explicitly model the higher-order components of the video trajectory expansion.
arXiv Detail & Related papers (2024-11-10T16:59:39Z)
- CamI2V: Camera-Controlled Image-to-Video Diffusion Model [11.762824216082508]
Integrated camera pose is a user-friendly and physics-informed condition in video diffusion models, enabling precise camera control. We identify one of the key challenges as effectively modeling noisy cross-frame interactions to enhance geometry consistency and camera controllability. We innovatively associate the quality of a condition with its ability to reduce uncertainty and interpret noisy cross-frame features as a form of noisy condition.
arXiv Detail & Related papers (2024-10-21T12:36:27Z)
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We show how to tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism. Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
- ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.