Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training
- URL: http://arxiv.org/abs/2412.06029v1
- Date: Sun, 08 Dec 2024 18:59:54 GMT
- Title: Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training
- Authors: Zhenghong Zhou, Jie An, Jiebo Luo
- Abstract summary: We introduce Latent-Reframe, which enables camera control in a pre-trained video diffusion model without fine-tuning.
Latent-Reframe operates during the sampling stage, maintaining efficiency while preserving the original model distribution.
Our approach reframes the latent code of video frames to align with the input camera trajectory through time-aware point clouds.
- Score: 51.851390459940646
- Abstract: Precise camera pose control is crucial for video generation with diffusion models. Existing methods require fine-tuning with additional datasets containing paired videos and camera pose annotations, which are both data-intensive and computationally costly, and can disrupt the pre-trained model distribution. We introduce Latent-Reframe, which enables camera control in a pre-trained video diffusion model without fine-tuning. Unlike existing methods, Latent-Reframe operates during the sampling stage, maintaining efficiency while preserving the original model distribution. Our approach reframes the latent code of video frames to align with the input camera trajectory through time-aware point clouds. Latent code inpainting and harmonization then refine the model latent space, ensuring high-quality video generation. Experimental results demonstrate that Latent-Reframe achieves comparable or superior camera control precision and video quality to training-based methods, without the need for fine-tuning on additional datasets.
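The abstract outlines a sampling-stage pipeline: frames are lifted into time-aware point clouds, reprojected along the input camera trajectory, and the reframed latents are then inpainted and harmonized with the frozen model. The sketch below only illustrates that flow under stated assumptions and is not the authors' implementation; every helper (estimate_depth, reproject_with_pointcloud, encode, decode, and the denoiser argument) is a hypothetical stub standing in for a real depth estimator, point-cloud renderer, video VAE, and pre-trained denoiser.

```python
import torch

def estimate_depth(frame):
    # Hypothetical stub; a real pipeline would call an off-the-shelf
    # monocular depth estimator here.
    return torch.ones(frame.shape[0], 1, *frame.shape[2:])

def reproject_with_pointcloud(frame, depth, pose):
    # Lift pixels into a point cloud using `depth` and render from `pose`.
    # Stub: identity warp plus an all-visible mask; a real system would splat
    # the 3D points into the target view and mark newly exposed regions.
    visible = torch.ones_like(frame, dtype=torch.bool)
    return frame, visible

def encode(frames):   # stand-in for the video VAE encoder
    return frames

def decode(latents):  # stand-in for the video VAE decoder
    return latents

def latent_reframe(latents, camera_traj, denoiser, t):
    """Reframe video latents to follow `camera_traj`, then inpaint/harmonize."""
    frames = decode(latents)                        # (T, C, H, W)
    reframed, masks = [], []
    for frame, pose in zip(frames, camera_traj):
        frame = frame.unsqueeze(0)
        depth = estimate_depth(frame)
        warped, visible = reproject_with_pointcloud(frame, depth, pose)
        reframed.append(warped)
        masks.append(~visible)                      # holes exposed by the new pose
    reframed_latents = encode(torch.cat(reframed))
    holes = torch.cat(masks)
    # Inpaint the holes with the frozen denoiser and blend ("harmonize") the
    # result back so the latents stay close to the model's distribution.
    filled = denoiser(reframed_latents, t)
    return torch.where(holes, filled, reframed_latents)
```

In a full pipeline this step would be applied to the partially denoised latents during sampling, with the pre-trained denoiser and VAE kept frozen, which is what makes the approach training-free.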
Related papers
- Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation [20.689304579898728]
Event-based Video Frame Interpolation (EVFI) uses sparse, high-temporal-resolution event measurements as motion guidance.
We adapt pre-trained video diffusion models trained on internet-scale datasets to EVFI.
Our method outperforms existing approaches and generalizes far better across cameras.
arXiv Detail & Related papers (2024-12-10T18:55:30Z) - SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation [22.693060144042196]
Methods for image-to-video generation have achieved impressive, photo-realistic quality.
Adjusting specific elements in generated videos, such as object motion or camera movement, is often a tedious process of trial and error.
We introduce a framework for controllable image-to-video generation that is self-guided.
Our zero-shot method outperforms unsupervised baselines while narrowing the performance gap with supervised models.
arXiv Detail & Related papers (2024-11-07T18:56:11Z) - Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation [60.27691946892796]
We present a method for generating video sequences with coherent motion between a pair of input key frames.
Our experiments show that our method outperforms both existing diffusion-based methods and traditional frame interpolation techniques.
arXiv Detail & Related papers (2024-08-27T17:57:14Z) - VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism based on Plucker coordinates.
Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z) - ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free video interpolation method for generative video diffusion models in a plug-and-play manner.
We transform a video diffusion model into a self-cascaded video diffusion model with designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z) - Camera clustering for scalable stream-based active distillation [12.730493079013456]
We present a scalable framework designed to craft efficient lightweight models for video object detection.
We examine strategies for selecting training images from video streams and the effectiveness of sharing models across numerous cameras.
arXiv Detail & Related papers (2024-04-16T09:28:54Z) - Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moire patterns, appearing as color distortions, severely degrade image and video quality when filming a screen with digital cameras.
We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z) - Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution and design new knowledge distillation methods to narrow the performance gap between compact and large models.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)