Related papers: MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control

URL: http://arxiv.org/abs/2409.06189v2
Date: Wed, 11 Sep 2024 11:50:27 GMT
Title: MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control
Authors: Yining Yao, Xi Guo, Chenjing Ding, Wei Wu,
Abstract summary: MyGo is an end-to-end framework for driving video generation. It introduces the motion of onboard cameras as a condition to improve camera controllability and multi-view consistency. Results show that MyGo achieves state-of-the-art results in both general camera-controlled video generation and multi-view driving video generation tasks.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High-quality driving video generation is crucial for providing training data for autonomous driving models. However, current generative models rarely focus on enhancing camera motion control in multi-view tasks, which is essential for driving video generation. We therefore propose MyGo, an end-to-end framework for video generation that introduces the motion of onboard cameras as a condition to improve camera controllability and multi-view consistency. MyGo employs additional plug-in modules to inject camera parameters into a pre-trained video diffusion model, preserving as much of the pre-trained model's extensive knowledge as possible. Furthermore, we use epipolar constraints and neighboring-view information during the generation of each view to enhance spatial-temporal consistency. Experimental results show that MyGo achieves state-of-the-art results in both general camera-controlled video generation and multi-view driving video generation tasks, laying the foundation for more accurate environment simulation in autonomous driving. Project page: https://metadrivescape.github.io/papers_project/MyGo/page.html
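The abstract states only that plug-in modules inject camera parameters into a pre-trained video diffusion model while retaining its knowledge; it does not give the module design. A minimal sketch of one common way to do this, assuming a ControlNet-style zero-initialised residual projection (not MyGo's published architecture): the camera extrinsics are flattened into a vector, projected by a linear layer whose weights start at zero, and added to a frozen feature. At initialisation the output therefore equals the pre-trained feature exactly. `ZeroInitCameraAdapter`, `flatten_extrinsics` and the example numbers are hypothetical.

```python
import random


class ZeroInitCameraAdapter:
    """Hypothetical plug-in module: projects a flattened camera vector to a
    per-channel residual that is added to a frozen backbone feature.
    Weights and bias start at zero, so the frozen model's output is
    unchanged until the adapter is trained."""

    def __init__(self, cam_dim, feat_dim):
        self.weight = [[0.0] * cam_dim for _ in range(feat_dim)]  # zero init
        self.bias = [0.0] * feat_dim

    def __call__(self, feature, cam):
        # Linear projection of the camera vector, one output per feature channel.
        residual = [
            sum(w * c for w, c in zip(row, cam)) + b
            for row, b in zip(self.weight, self.bias)
        ]
        # Residual injection: the pre-trained feature passes through untouched
        # and only the learned camera term is added on top.
        return [f + r for f, r in zip(feature, residual)]


def flatten_extrinsics(rotation, translation):
    """Flatten a 3x3 rotation and a 3-vector translation into 12 numbers."""
    return [x for row in rotation for x in row] + list(translation)


random.seed(0)
rotation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translation = [0.5, 0.0, 1.2]
cam = flatten_extrinsics(rotation, translation)

# Stand-in for one activation vector of a frozen pre-trained layer.
feature = [random.gauss(0.0, 1.0) for _ in range(8)]
adapter = ZeroInitCameraAdapter(cam_dim=len(cam), feat_dim=len(feature))

out_init = adapter(feature, cam)  # identical to `feature`

# Simulate one learned weight: channel 0 now responds to the x-translation.
adapter.weight[0][9] = 0.1
out_trained = adapter(feature, cam)
```

Because every residual is zero at initialisation, training moves only the adapter's weights and the frozen backbone starts from its original behaviour, which is one plausible reading of "retains the extensive knowledge of the pre-trained model".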

Related papers

Plenoptic Video Generation [80.3116444692858]
We introduce PlenopticDreamer, a framework that synchronizes generative hallucinations to maintain spatio-temporal memory. The core idea is to train a multi-in-out video-conditioned model in an autoregressive manner. Our training incorporates context scaling to improve convergence, self-conditioning to reduce hallucinations caused by error accumulation, and a long-video conditioning mechanism to support extended video generation.
arXiv Detail & Related papers (2026-01-08T18:58:32Z)
PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention [13.912161562631722]
PostCam is a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes. Experiments on both real-world and synthetic datasets demonstrate that PostCam outperforms state-of-the-art methods by over 20% in camera control precision and view consistency.
arXiv Detail & Related papers (2025-11-21T12:05:46Z)
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework. It reproduces the dynamic scene of an input video at novel camera trajectories. Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z)
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling [88.33638585518226]
We introduce an open-source auto-regressive video model (VaViM) and its companion video-action model (VaVAM) to investigate how video pre-training transfers to real-world driving. We evaluate our models in open- and closed-loop driving scenarios, revealing that video-based pre-training holds promise for autonomous driving.
arXiv Detail & Related papers (2025-02-21T18:56:02Z)
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation. Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency. Cavia is the first framework of its kind to let the user specify distinct camera motion while still obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes [15.506076058742744]
We propose DreamForge, an advanced diffusion-based autoregressive video generation model tailored for 3D-controllable long-term generation. To improve lane and foreground generation, we introduce perspective guidance and integrate object-wise position encoding. We also propose motion-aware temporal attention to capture motion cues and appearance changes in videos.
arXiv Detail & Related papers (2024-09-06T03:09:58Z)
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame large video transformers for 3D camera control using a ControlNet-like conditioning mechanism based on Plücker coordinates. Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
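The VD3D summary names Plücker coordinates as its camera conditioning signal but gives no formula. A minimal sketch, assuming a pinhole camera with known intrinsics and a camera-to-world pose: each pixel's viewing ray, with origin o (the camera centre) and unit direction d, is encoded as the 6-vector (o × d, d), giving a dense per-pixel map that a ControlNet-like branch could consume. This is a common convention rather than VD3D's code; the helper names and the example camera are hypothetical, and sign or normalisation conventions vary between papers.

```python
import math


def matvec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def plucker_ray(u, v, K_inv, R_c2w, cam_center):
    """6-D Plücker embedding (o x d, d) of the ray through pixel (u, v).

    Assumes K_inv is the inverse pinhole intrinsics, R_c2w the
    camera-to-world rotation and cam_center the camera origin o in world
    coordinates.
    """
    d = normalize(matvec(R_c2w, matvec(K_inv, [u, v, 1.0])))
    moment = cross(cam_center, d)
    return moment + d  # list concatenation -> 6 numbers


def plucker_map(width, height, K_inv, R_c2w, cam_center):
    """Dense H x W x 6 conditioning map, sampled at pixel centres."""
    return [
        [plucker_ray(x + 0.5, y + 0.5, K_inv, R_c2w, cam_center) for x in range(width)]
        for y in range(height)
    ]


# Example camera: focal length 2, principal point (2, 2), no rotation,
# centred at world point (1, 0, 0).
f, cx, cy = 2.0, 2.0, 2.0
K_inv = [[1 / f, 0.0, -cx / f], [0.0, 1 / f, -cy / f], [0.0, 0.0, 1.0]]
R_c2w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
center = [1.0, 0.0, 0.0]

principal_ray = plucker_ray(cx, cy, K_inv, R_c2w, center)
print(principal_ray)  # [0.0, -1.0, 0.0, 0.0, 0.0, 1.0]
emb = plucker_map(4, 4, K_inv, R_c2w, center)
```

The moment o × d is the same for every point o on the ray, so the embedding identifies the ray itself rather than a particular point on it, and it varies smoothly as the camera moves.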
Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements. Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
Training-free Camera Control for Video Generation [19.526135830699882]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.
arXiv Detail & Related papers (2024-06-14T15:33:00Z)
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation. To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block. Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
arXiv Detail & Related papers (2024-06-04T17:27:19Z)
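CamCo's summary mentions an epipolar attention module, and MyGo's abstract likewise relies on epipolar constraints. The sketch below shows only the standard two-view geometry behind such a module, assuming calibrated (normalised) image coordinates and a known relative pose X2 = R X1 + t: build the essential matrix E = [t]_x R, map a query pixel in one view to its epipolar line in the other, and keep only key pixels near that line. The resulting boolean mask is the kind of restriction an epipolar attention layer could apply; the function names and the threshold are hypothetical, and this is not CamCo's implementation.

```python
import math


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) times v equals t x v."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def essential_matrix(R, t):
    """E = [t]_x R for the relative pose X2 = R X1 + t between two views."""
    return matmul(skew(t), R)


def epipolar_line(E, x1):
    """Epipolar line l = E x1 in view 2, as (a, b, c) with a*u + b*v + c = 0."""
    return [sum(E[i][j] * x1[j] for j in range(3)) for i in range(3)]


def epipolar_mask(E, query, keys, threshold):
    """For one query pixel in view 1, mark which key pixels in view 2 lie
    within `threshold` of its epipolar line (normalised image coordinates)."""
    a, b, c = epipolar_line(E, [query[0], query[1], 1.0])
    norm = math.hypot(a, b)
    return [abs(a * u + b * v + c) / norm <= threshold for u, v in keys]


# Pure sideways translation: epipolar lines are horizontal rows.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
E = essential_matrix(R, t)

keys = [(u, v) for v in (-0.5, 0.0, 0.5) for u in (-0.5, 0.0, 0.5)]
mask = epipolar_mask(E, query=(0.2, 0.0), keys=keys, threshold=0.1)
print(mask)  # [False, False, False, True, True, True, False, False, False]
```

With raw pixel coordinates the same steps apply using the fundamental matrix F = K2^{-T} E K1^{-1} in place of E.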
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals. We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline. After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z)
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation. CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.