ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding
- URL: http://arxiv.org/abs/2511.19827v1
- Date: Tue, 25 Nov 2025 01:38:56 GMT
- Title: ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding
- Authors: Byeongjun Park, Byung-Hoon Kim, Hyungjin Chung, Jong Chul Ye
- Abstract summary: ReDirector is a camera-controlled video retake generation method for dynamically captured variable-length videos. We introduce Rotary Camera Encoding (RoCE), a camera-conditioned RoPE phase shift that captures and integrates multi-view relationships within and across the input and target videos. Our method generalizes to out-of-distribution camera trajectories and video lengths, yielding improved dynamic object localization and static background preservation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ReDirector, a novel camera-controlled video retake generation method for dynamically captured variable-length videos. In particular, we rectify a common misuse of RoPE in previous works by aligning the spatiotemporal positions of the input video and the target retake. Moreover, we introduce Rotary Camera Encoding (RoCE), a camera-conditioned RoPE phase shift that captures and integrates multi-view relationships within and across the input and target videos. By integrating camera conditions into RoPE, our method generalizes to out-of-distribution camera trajectories and video lengths, yielding improved dynamic object localization and static background preservation. Extensive experiments further demonstrate significant improvements in camera controllability, geometric consistency, and video quality across various trajectories and lengths.
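The abstract describes RoCE as a camera-conditioned phase shift applied to RoPE. As an illustration only, the sketch below adds a camera-derived phase offset to standard RoPE rotation angles before rotating the feature channels; the `camera_phase_shift` mapping is entirely hypothetical (the paper's actual learned camera-to-phase function is not specified here), and only the standard RoPE mechanics are taken as given.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE: per-dimension rotation frequency times token position.
    freqs = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, freqs)              # shape (seq, dim/2)

def apply_rotary(x, angles):
    # Rotate consecutive (even, odd) channel pairs by the given angles.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def camera_phase_shift(camera_params, dim, scale=0.1):
    # Hypothetical stand-in for a learned camera-to-phase mapping:
    # projects per-frame camera parameters to a per-dimension offset.
    proj = np.sin(np.outer(camera_params.sum(axis=-1),
                           np.arange(dim // 2) + 1))
    return scale * proj                            # shape (seq, dim/2)

seq, dim = 8, 16
q = np.random.default_rng(0).standard_normal((seq, dim))
# Flattened 3x4 camera extrinsics per frame (placeholder values).
cams = np.random.default_rng(1).standard_normal((seq, 12))

# Camera-conditioned phase shift: offset the RoPE angles, then rotate.
angles = rope_angles(np.arange(seq), dim) + camera_phase_shift(cams, dim)
q_rot = apply_rotary(q, angles)
```

Because the shift only offsets rotation angles, the operation remains a per-pair rotation, so feature norms are preserved just as in plain RoPE.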
Related papers
- UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models [54.564740558030245]
We present UCM, a novel framework that unifies long-term memory and precise camera control via a time-aware positional encoding warping mechanism. We also introduce a scalable data curation strategy utilizing point-cloud-based rendering to simulate scene revisiting.
arXiv Detail & Related papers (2026-02-26T12:54:46Z)
- ReRoPE: Repurposing RoPE for Relative Camera Control [36.225344172088235]
We introduce ReRoPE, a plug-and-play framework that incorporates relative camera information into pre-trained video diffusion models. We evaluate our method on both image-to-video (I2V) and video-to-video (V2V) tasks in terms of camera control accuracy and visual fidelity.
arXiv Detail & Related papers (2026-02-08T17:49:10Z)
- Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation [49.12018869332346]
InfCam is a camera-controlled video-to-video generation framework with high pose fidelity. The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model.
arXiv Detail & Related papers (2025-12-18T20:03:05Z)
- Unified Camera Positional Encoding for Controlled Video Generation [48.5789182990001]
Transformers have emerged as a universal backbone across 3D perception, video generation, and world models for autonomous driving and embodied AI. We introduce Relative Ray, a geometry-consistent representation that unifies complete camera information, including 6-DoF poses, intrinsics, and lens distortions. To facilitate systematic training and evaluation, we construct a large video dataset covering a wide range of camera motions and lens types.
arXiv Detail & Related papers (2025-12-08T07:34:01Z)
- GenCompositor: Generative Video Compositing with Diffusion Transformer [68.00271033575736]
Traditional pipelines require intensive labor and expert collaboration, resulting in lengthy production cycles and high manpower costs. This new task strives to adaptively inject identity and motion information from a foreground video into the target video in an interactive manner. Experiments demonstrate that our method effectively realizes generative video compositing, outperforming existing possible solutions in fidelity and consistency.
arXiv Detail & Related papers (2025-09-02T16:10:13Z)
- ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework. It reproduces the dynamic scene of an input video at novel camera trajectories. Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z)
- CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z)
- VRoPE: Rotary Position Embedding for Video Large Language Models [20.76019756946152]
Rotary Position Embedding (RoPE) has shown strong performance in text-based Large Language Models (LLMs), but extending it to video remains a challenge due to the intricate structure of video frames. We propose VRoPE, a novel positional encoding method tailored for Video-LLMs.
arXiv Detail & Related papers (2025-02-17T10:53:57Z)
- CPA: Camera-pose-awareness Diffusion Transformer for Video Generation [15.512186399114999]
CPA is a text-to-video generation approach that integrates textual, visual, and spatial conditions. Our method outperforms LDM-based methods for long video generation while achieving optimal performance in trajectory consistency and object consistency.
arXiv Detail & Related papers (2024-12-02T12:10:00Z)
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind that allows the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.