AKiRa: Augmentation Kit on Rays for optical video generation
- URL: http://arxiv.org/abs/2412.14158v2
- Date: Sun, 29 Dec 2024 17:22:30 GMT
- Title: AKiRa: Augmentation Kit on Rays for optical video generation
- Authors: Xi Wang, Robin Courant, Marc Christie, Vicky Kalogeiton
- Abstract summary: AKiRa is a novel augmentation framework that builds and trains a camera adapter with a complex camera model over an existing video generation backbone. It enables fine-tuned control over camera motion as well as complex optical parameters to achieve cinematic effects such as zoom, fisheye effect, and bokeh. This work sets a new landmark in controlled and optically enhanced video generation, paving the way for future optical video generation methods.
- Score: 9.255424148510572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in text-conditioned video diffusion have greatly improved video quality. However, these methods offer limited or sometimes no control to users on camera aspects, including dynamic camera motion, zoom, distorted lens and focus shifts. These motion and optical aspects are crucial for adding controllability and cinematic elements to generation frameworks, ultimately resulting in visual content that draws focus, enhances mood, and guides emotions according to filmmakers' controls. In this paper, we aim to close the gap between controllable video generation and camera optics. To achieve this, we propose AKiRa (Augmentation Kit on Rays), a novel augmentation framework that builds and trains a camera adapter with a complex camera model over an existing video generation backbone. It enables fine-tuned control over camera motion as well as complex optical parameters (focal length, distortion, aperture) to achieve cinematic effects such as zoom, fisheye effect, and bokeh. Extensive experiments demonstrate AKiRa's effectiveness in combining and composing camera optics while outperforming all state-of-the-art methods. This work sets a new landmark in controlled and optically enhanced video generation, paving the way for future optical video generation methods.
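The optical parameters named in the abstract (focal length, distortion, aperture) can be made concrete with a toy ray-based camera model. The sketch below is illustrative only and is not AKiRa's actual camera adapter: the function names, the one-term radial distortion, and the thin-lens blur formula are assumptions chosen for clarity.

```python
import numpy as np

def pixel_rays(width, height, focal_px, k1=0.0):
    """Back-project pixel centers into unit camera-space ray directions.

    focal_px : focal length in pixels (larger value = stronger zoom).
    k1       : one-term radial distortion coefficient; a nonzero k1
               bends rays for a barrel / fisheye-like look.
    """
    xs = (np.arange(width) - width / 2 + 0.5) / focal_px
    ys = (np.arange(height) - height / 2 + 0.5) / focal_px
    x, y = np.meshgrid(xs, ys)
    r2 = x**2 + y**2
    # Simple polynomial radial distortion applied on the image plane.
    x_d = x * (1 + k1 * r2)
    y_d = y * (1 + k1 * r2)
    dirs = np.stack([x_d, y_d, np.ones_like(x_d)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

def circle_of_confusion(depth, focus_dist, focal_mm, f_number):
    """Thin-lens blur-circle diameter (mm) at a given scene depth.

    A wider aperture (smaller f_number) yields a larger blur circle,
    i.e. stronger bokeh; points at the focus distance stay sharp.
    """
    aperture = focal_mm / f_number
    return aperture * focal_mm * abs(depth - focus_dist) / (
        depth * (focus_dist - focal_mm))
```

Varying `focal_px` sweeps a zoom, `k1` a fisheye warp, and `f_number` the bokeh strength, which mirrors (in a heavily simplified form) the three optical axes of control the paper describes.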
Related papers
- Light-X: Generative 4D Video Rendering with Camera and Illumination Control [52.87059646145144]
Light-X is a video generation framework that enables controllable rendering from monocular videos with both viewpoint and illumination control. To address the lack of paired multi-view and multi-illumination videos, we introduce Light-Syn, a degradation-based pipeline with inverse-mapping.
arXiv Detail & Related papers (2025-12-04T18:59:57Z)
- Generative Photographic Control for Scene-Consistent Video Cinematic Editing [75.45726688666083]
We propose CineCtrl, the first video cinematic editing framework that provides fine control over professional camera parameters. We introduce a decoupled cross-attention mechanism to disentangle camera motion from photographic inputs. Our model generates high-fidelity videos with precisely controlled, user-specified photographic camera effects.
arXiv Detail & Related papers (2025-11-17T03:17:23Z)
- VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos [58.09854638265381]
VividCam is a training paradigm that enables diffusion models to learn complex camera motions from synthetic videos. We demonstrate that our design synthesizes a wide range of precisely controlled and complex camera motions using surprisingly simple synthetic data.
arXiv Detail & Related papers (2025-10-28T19:12:22Z)
- ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework.
It reproduces the dynamic scene of an input video at novel camera trajectories.
Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z)
- CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model.
We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z)
- FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis [47.281377781348596]
FloVD is a video diffusion model for camera-controllable video generation. It represents motion with optical flow, which can be estimated directly from videos. The method enables detailed camera control by leveraging background motion.
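The background-motion idea in the FloVD summary can be sketched with a toy decomposition: if a mask of static background pixels is available, their flow reflects only the camera, and subtracting it isolates object motion. This is a simplified illustration under assumed inputs (a dense flow field and a background mask), not FloVD's actual pipeline.

```python
import numpy as np

def split_camera_object_flow(flow, bg_mask):
    """Split a dense optical-flow field into a global camera
    component and residual per-pixel object motion.

    flow    : (H, W, 2) per-pixel displacement field.
    bg_mask : (H, W) boolean mask of static background pixels.
    """
    # Background pixels move (to first order) only with the camera,
    # so their median flow is a robust camera-motion estimate.
    cam_flow = np.median(flow[bg_mask], axis=0)
    object_flow = flow - cam_flow
    return cam_flow, object_flow
```

The median makes the camera estimate robust to a few mislabeled foreground pixels; a real system would fit a richer motion model (e.g. a homography) instead of a single translation.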
arXiv Detail & Related papers (2025-02-12T09:38:41Z)
- AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers [66.29824750770389]
We analyze camera motion from a first-principles perspective, uncovering insights that enable precise 3D camera manipulation. We compound these findings to design the Advanced 3D Camera Control (AC3D) architecture.
arXiv Detail & Related papers (2024-11-27T18:49:13Z)
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind to allow the user to specify distinct camera motions while obtaining consistent object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z)
- MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos.
Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
arXiv Detail & Related papers (2024-04-24T10:28:54Z)
- Video Reconstruction from a Single Motion Blurred Image using Learned Dynamic Phase Coding [34.76550131783525]
We propose a hybrid optical-digital method for video reconstruction using a single motion-blurred image.
We use learned dynamic phase coding in the lens aperture during image acquisition to encode the motion trajectories.
The proposed computational camera generates a sharp frame burst of the scene at various frame rates from a single coded motion-blurred image.
arXiv Detail & Related papers (2021-12-28T02:06:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.