GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
- URL: http://arxiv.org/abs/2504.07083v2
- Date: Thu, 10 Apr 2025 16:10:15 GMT
- Title: GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
- Authors: Mengchen Zhang, Tong Wu, Jing Tan, Ziwei Liu, Gordon Wetzstein, Dahua Lin
- Abstract summary: We introduce an auto-regressive model inspired by the expertise of Directors of Photography to generate artistic and expressive camera trajectories. Thanks to the comprehensive and diverse database, we train an auto-regressive, decoder-only Transformer for high-quality, context-aware camera movement generation. Experiments demonstrate that compared to existing methods, GenDoP offers better controllability, finer-grained trajectory adjustments, and higher motion stability.
- Score: 98.28272367169465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera trajectory design plays a crucial role in video production, serving as a fundamental tool for conveying directorial intent and enhancing visual storytelling. In cinematography, Directors of Photography meticulously craft camera movements to achieve expressive and intentional framing. However, existing methods for camera trajectory generation remain limited: traditional approaches rely on geometric optimization or handcrafted procedural systems, while recent learning-based methods often inherit structural biases or lack textual alignment, constraining creative synthesis. In this work, we introduce an auto-regressive model inspired by the expertise of Directors of Photography to generate artistic and expressive camera trajectories. We first introduce DataDoP, a large-scale multi-modal dataset containing 29K real-world shots with free-moving camera trajectories, depth maps, and detailed captions describing specific movements, interaction with the scene, and directorial intent. Thanks to the comprehensive and diverse database, we further train an auto-regressive, decoder-only Transformer, named GenDoP, for high-quality, context-aware camera movement generation based on text guidance and RGBD inputs. Extensive experiments demonstrate that compared to existing methods, GenDoP offers better controllability, finer-grained trajectory adjustments, and higher motion stability. We believe our approach establishes a new standard for learning-based cinematography, paving the way for future advancements in camera control and filmmaking. Our project website: https://kszpxxzmc.github.io/GenDoP/.
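The abstract frames camera trajectory generation as auto-regressive next-token prediction over discretized camera poses. The following is a minimal, hypothetical sketch of that idea (not the authors' code: the bin count, value range, and stand-in "model" are all assumptions); it quantizes continuous pose values into tokens and samples a trajectory one token at a time, conditioned on the tokens generated so far.

```python
import numpy as np

# Hypothetical sketch of auto-regressive trajectory generation.
# N_BINS and the [-1, 1] pose range are illustrative assumptions.
N_BINS = 64

def quantize_pose(pose, lo=-1.0, hi=1.0, n_bins=N_BINS):
    """Map continuous pose values in [lo, hi] to discrete token ids."""
    clipped = np.clip(pose, lo, hi)
    return ((clipped - lo) / (hi - lo) * (n_bins - 1)).round().astype(int)

def dequantize_tokens(tokens, lo=-1.0, hi=1.0, n_bins=N_BINS):
    """Inverse mapping: token ids back to continuous pose values."""
    return lo + tokens / (n_bins - 1) * (hi - lo)

def generate_trajectory(next_token_logits, n_steps, rng):
    """Auto-regressive sampling loop: each step is conditioned on the
    full history of tokens generated so far (a decoder-only Transformer
    would play the role of `next_token_logits` here)."""
    tokens = []
    for _ in range(n_steps):
        logits = next_token_logits(tokens)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(rng.choice(len(probs), p=probs))
    return np.array(tokens)

def toy_logits(history, n_bins=N_BINS):
    """Toy stand-in model: prefers tokens near the previous one,
    yielding smooth camera motion."""
    prev = history[-1] if history else n_bins // 2
    return -0.1 * (np.arange(n_bins) - prev) ** 2

rng = np.random.default_rng(0)
traj_tokens = generate_trajectory(toy_logits, n_steps=30, rng=rng)
traj = dequantize_tokens(traj_tokens)  # back to continuous pose values
```

In the paper's setting, the stand-in logits function would be a text- and RGBD-conditioned decoder-only Transformer, and each "pose" would cover the full camera state per frame rather than a single scalar.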
Related papers
- Towards Understanding Camera Motions in Any Video [80.223048294482]
We introduce CameraBench, a large-scale dataset and benchmark designed to assess and improve camera motion understanding.
CameraBench consists of 3,000 diverse internet videos annotated by experts through a rigorous quality control process.
One of our contributions is a taxonomy of camera motion primitives, designed in collaboration with cinematographers.
arXiv Detail & Related papers (2025-04-21T18:34:57Z)
- ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework.
It reproduces the dynamic scene of an input video at novel camera trajectories.
Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z)
- CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model.
We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z)
- ChatCam: Empowering Camera Control through Conversational AI [67.31920821192323]
ChatCam is a system that navigates camera movements through conversations with users.
To achieve this, we propose CineGPT, a GPT-based autoregressive model for text-conditioned camera trajectory generation.
We also develop an Anchor Determinator to ensure precise camera trajectory placement.
arXiv Detail & Related papers (2024-09-25T20:13:41Z)
- CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion [29.320516135326546]
CinePreGen is a visual previsualization system enhanced with engine-powered diffusion.
It features a novel camera and storyboard interface that offers dynamic control, from global to local camera adjustments.
arXiv Detail & Related papers (2024-08-30T17:16:18Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z)
- Automatic Camera Trajectory Control with Enhanced Immersion for Virtual Cinematography [23.070207691087827]
Real-world cinematographic rules show that directors can create immersion by comprehensively synchronizing the camera with the actor.
Inspired by this strategy, we propose a deep camera control framework that enables actor-camera synchronization in three aspects.
Our proposed method yields immersive cinematic videos of high quality, both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-03-29T22:02:15Z)
- Batteries, camera, action! Learning a semantic control space for expressive robot cinematography [15.895161373307378]
We develop a data-driven framework that enables editing of complex camera positioning parameters in a semantic space.
First, we generate a database of video clips with a diverse range of shots in a photo-realistic simulator.
We use hundreds of participants in a crowd-sourcing framework to obtain scores for a set of semantic descriptors for each clip.
arXiv Detail & Related papers (2020-11-19T21:56:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.