ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions
- URL: http://arxiv.org/abs/2512.10286v1
- Date: Thu, 11 Dec 2025 05:05:07 GMT
- Title: ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions
- Authors: Xiaoxue Wu, Xinyuan Chen, Yaohui Wang, Yu Qiao
- Abstract summary: ShotDirector is an efficient framework that integrates parameter-level camera control and hierarchical editing-pattern-aware prompting. Our framework effectively combines parameter-level conditions with high-level semantic guidance, achieving film-like controllable shot transitions.
- Score: 46.3918771233715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Shot transitions play a pivotal role in multi-shot video generation, as they determine the overall narrative expression and the directorial design of visual storytelling. However, recent progress has primarily focused on low-level visual consistency across shots, neglecting how transitions are designed and how cinematographic language contributes to coherent narrative expression. This often leads to mere sequential shot changes without intentional film-editing patterns. To address this limitation, we propose ShotDirector, an efficient framework that integrates parameter-level camera control and hierarchical editing-pattern-aware prompting. Specifically, we adopt a camera control module that incorporates 6-DoF poses and intrinsic settings to enable precise camera information injection. In addition, a shot-aware mask mechanism is employed to introduce hierarchical prompts aware of professional editing patterns, allowing fine-grained control over shot content. Through this design, our framework effectively combines parameter-level conditions with high-level semantic guidance, achieving film-like controllable shot transitions. To facilitate training and evaluation, we construct ShotWeaver40K, a dataset that captures the priors of film-like editing patterns, and develop a set of evaluation metrics for controllable multi-shot video generation. Extensive experiments demonstrate the effectiveness of our framework.
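The abstract names two conditioning paths (camera parameters from 6-DoF poses plus intrinsics, and a shot-aware mask for hierarchical prompts) but does not say how either is encoded. The pure-Python sketch below illustrates one plausible reading of each, under loudly stated assumptions. First, it assumes a Plücker-ray embedding: each pixel gets the 6-vector (d, c × d) built from its viewing direction d and the camera center c. This is a common camera-conditioning encoding in video diffusion work, not something the abstract confirms for ShotDirector. Second, it assumes the text tokens are laid out as one global prompt followed by one local prompt per shot, so that every frame sees the global prompt and only its own shot's local prompt. The names `plucker_ray` and `shot_aware_mask` and this token layout are hypothetical. If poses arrive as world-to-camera extrinsics [R_w2c | t], the inputs below would be R = R_w2cᵀ and c = -R_w2cᵀ t.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _cross(a: Vec3, b: Vec3) -> List[float]:
    """Cross product a x b of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def plucker_ray(R: Sequence[Vec3], center: Vec3,
                fx: float, fy: float, cx: float, cy: float,
                u: float, v: float) -> List[float]:
    """Hypothetical helper: 6-D Plücker embedding (d, c x d) of pixel (u, v).

    Assumption: R is a camera-to-world rotation given as rows of a 3x3 matrix,
    `center` is the camera position in world space (together: the 6-DoF pose),
    and fx, fy, cx, cy are pinhole intrinsics in pixels.
    """
    # Back-project the pixel centre into a camera-space viewing direction.
    d_cam = [(u + 0.5 - cx) / fx, (v + 0.5 - cy) / fy, 1.0]
    # Rotate into world space and normalise to unit length.
    d = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(x * x for x in d))
    d = [x / norm for x in d]
    # The moment c x d encodes where the ray passes, i.e. the camera position.
    return d + _cross(center, d)


def shot_aware_mask(frame_shot_ids: Sequence[int], num_global_tokens: int,
                    shot_prompt_lens: Sequence[int]) -> List[List[bool]]:
    """Hypothetical helper: mask[f][k] is True if frame f may attend to text token k.

    Assumed text layout: [global prompt | shot 0 prompt | shot 1 prompt | ...].
    Each frame attends to the whole global prompt and only to its own shot's prompt.
    """
    # Compute the [start, end) token range of each shot-level prompt.
    spans = []
    start = num_global_tokens
    for length in shot_prompt_lens:
        spans.append((start, start + length))
        start += length
    total_tokens = start

    mask = []
    for shot in frame_shot_ids:
        lo, hi = spans[shot]
        mask.append([k < num_global_tokens or lo <= k < hi
                     for k in range(total_tokens)])
    return mask


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # The principal-point ray of a camera at the origin looks straight down +z.
    print(plucker_ray(identity, [0.0, 0.0, 0.0], 100.0, 100.0, 63.5, 63.5, 63.0, 63.0))
    # -> [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    # Three frames: two in shot 0, one in shot 1; 2 global tokens, shot prompts of 1 and 2 tokens.
    for row in shot_aware_mask([0, 0, 1], 2, [1, 2]):
        print(row)
```

In a full model, the per-pixel 6-vectors would presumably be stacked into a 6-channel map per frame and injected into the video backbone, and the boolean mask would restrict cross-attention from video tokens to text tokens. Both of those integration steps are outside this sketch.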
Related papers
- MultiShotMaster: A Controllable Multi-Shot Video Generation Framework [67.38203939500157] (2025-12-02T18:59:48Z)
Current generation techniques excel at single-shot clips but struggle to produce narrative multi-shot videos.
We propose MultiShotMaster, a framework for highly controllable multi-shot video generation.
- In-Context Sync-LoRA for Portrait Video Editing [66.21215915461069] (2025-12-02T18:40:35Z)
Sync-LoRA is a method for editing portrait videos that achieves high-quality visual modifications.
We train an in-context LoRA using paired videos that depict identical motion trajectories but differ in appearance.
This training setup teaches the model to combine motion cues from the source video with the visual changes introduced in the edited first frame.
- Generative Photographic Control for Scene-Consistent Video Cinematic Editing [75.45726688666083] (2025-11-17T03:17:23Z)
We propose CineCtrl, the first video cinematic editing framework that provides fine control over professional camera parameters.
We introduce a decoupled cross-attention mechanism to disentangle camera motion from photographic inputs.
Our model generates high-fidelity videos with precisely controlled, user-specified photographic camera effects.
- ESA: Energy-Based Shot Assembly Optimization for Automatic Video Editing [12.967240894970098] (2025-11-04T11:48:22Z)
Shot assembly is a crucial step in film production and video editing.
Traditionally, this process has been manually executed by experienced editors.
We propose an energy-based optimization method for video shot assembly.
- CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models [28.224969852134606] (2025-08-15T13:58:22Z)
We introduce CineTrans, a framework for generating coherent multi-shot videos with cinematic, film-style transitions.
CineTrans produces cinematic multi-shot sequences while adhering to the film editing style, avoiding unstable transitions or naive concatenations.
- GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography [98.28272367169465] (2025-04-09T17:56:01Z)
We introduce an auto-regressive model inspired by the expertise of Directors of Photography to generate artistic and expressive camera trajectories.
Thanks to the comprehensive and diverse database, we train an auto-regressive, decoder-only Transformer for high-quality, context-aware camera movement generation.
Experiments demonstrate that compared to existing methods, GenDoP offers better controllability, finer-grained trajectory adjustments, and higher motion stability.
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393] (2024-06-21T17:55:05Z)
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
- Automatic Camera Trajectory Control with Enhanced Immersion for Virtual Cinematography [23.070207691087827] (2023-03-29T22:02:15Z)
Real-world cinematographic rules show that directors can create immersion by comprehensively synchronizing the camera with the actor.
Inspired by this strategy, we propose a deep camera control framework that enables actor-camera synchronization in three aspects.
Our proposed method yields immersive cinematic videos of high quality, both quantitatively and qualitatively.