CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers
- URL: http://arxiv.org/abs/2405.13195v1
- Date: Tue, 21 May 2024 20:54:27 GMT
- Title: CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers
- Authors: Andrew Marmon, Grant Schindler, José Lezama, Dan Kondratyuk, Bryan Seybold, Irfan Essa,
- Abstract summary: We propose to add virtual 3D camera controls to generative video methods by conditioning generated video on an encoding of three-dimensional camera movement.
Results demonstrate that we are (1) able to successfully control the camera during video generation, starting from a single frame and a camera signal, and (2) we demonstrate the accuracy of the generated 3D camera paths using traditional computer vision methods.
- Score: 18.67069364925506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We extend multimodal transformers to include 3D camera motion as a conditioning signal for the task of video generation. Generative video models are becoming increasingly powerful, thus focusing research efforts on methods of controlling the output of such models. We propose to add virtual 3D camera controls to generative video methods by conditioning generated video on an encoding of three-dimensional camera movement over the course of the generated video. Results demonstrate that we are (1) able to successfully control the camera during video generation, starting from a single frame and a camera signal, and (2) we demonstrate the accuracy of the generated 3D camera paths using traditional computer vision methods.
Related papers
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation.
Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency.
Cavia is the first of its kind that allows the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z) - VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame transformers video for 3D camera control using a ControlNet-like conditioning mechanism based on Plucker coordinates.
Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z) - Training-free Camera Control for Video Generation [19.526135830699882]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models.
Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.
arXiv Detail & Related papers (2024-06-14T15:33:00Z) - CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation.
To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block.
Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
arXiv Detail & Related papers (2024-06-04T17:27:19Z) - Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z) - MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos.
Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
arXiv Detail & Related papers (2024-04-24T10:28:54Z) - PV3D: A 3D Generative Model for Portrait Video Generation [94.96025739097922]
We propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos.
PV3D is able to support many downstream applications such as animating static portraits and view-consistent video motion editing.
arXiv Detail & Related papers (2022-12-13T05:42:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.