Related papers: Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

URL: http://arxiv.org/abs/2405.17414v1
Date: Mon, 27 May 2024 17:58:01 GMT
Title: Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
Authors: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein,
Abstract summary: Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation. CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
Score: 70.17137528953953
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene from multiple different camera trajectories. Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications. We introduce collaborative video diffusion (CVD) as an important step towards this vision. The CVD framework includes a novel cross-video synchronization module that promotes consistency between corresponding frames of the same video rendered from different camera poses using an epipolar attention mechanism. Trained on top of a state-of-the-art camera-control module for video generation, CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines, as shown in extensive experiments. Project page: https://collaborativevideodiffusion.github.io/.

Related papers

Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation [21.084121261693365]
We propose DepthDirector, a video re-rendering framework with precise camera controllability.<n>By leveraging the depth video from explicit 3D representation as camera-control guidance, our method can faithfully reproduce the dynamic scene of an input video under novel camera trajectories.
arXiv Detail & Related papers (2026-01-15T09:26:45Z)
Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation [49.12018869332346]
InfCam is a camera-controlled video-to-video generation framework with high pose fidelity.<n>The framework integrates two key components: (1) infinite homography warping, which encodes 3D camera rotations directly within the 2D latent space of a video diffusion model.
arXiv Detail & Related papers (2025-12-18T20:03:05Z)
View-Consistent Diffusion Representations for 3D-Consistent Video Generation [60.68052293389281]
Current generated videos still contain visual artifacts arising from 3D inconsistencies.<n>We propose ViCoDR, a new approach for improving the 3D consistency of video models by learning multi-view consistent diffusion representations.
arXiv Detail & Related papers (2025-11-24T11:16:55Z)
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework. It reproduces the dynamic scene of an input video at novel camera trajectories. Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z)
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation [51.328567400947435]
We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video. Our method reframes the multi-view video generation task as video-to-videos translation, leveraging publicly available image and video diffusion priors.
arXiv Detail & Related papers (2025-03-12T08:26:15Z)
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints [43.14498014617223]
We propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation. We introduce a multi-view synchronization module to maintain appearance and geometry consistency across different viewpoints. Our method enables intriguing extensions, such as re-rendering a video from novel viewpoints.
arXiv Detail & Related papers (2024-12-10T18:55:17Z)
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention [62.2447324481159]
Cavia is a novel framework for camera-controllable, multi-view video generation. Our framework extends the spatial and temporal attention modules, improving both viewpoint and temporal consistency. Cavia is the first of its kind that allows the user to specify distinct camera motion while obtaining object motion.
arXiv Detail & Related papers (2024-10-14T17:46:32Z)
Multi-sentence Video Grounding for Long Video Generation [46.363084926441466]
We propose a brave and new idea of Multi-sentence Video Grounding for Long Video Generation. Our approach seamlessly extends the development in image/video editing, video morphing and personalized generation, and video grounding to the long video generation.
arXiv Detail & Related papers (2024-07-18T07:05:05Z)
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame transformers video for 3D camera control using a ControlNet-like conditioning mechanism based on Plucker coordinates. Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
Training-free Camera Control for Video Generation [19.526135830699882]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.
arXiv Detail & Related papers (2024-06-14T15:33:00Z)
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation. To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block. Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
arXiv Detail & Related papers (2024-06-04T17:27:19Z)
CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers [18.67069364925506]
We propose to add virtual 3D camera controls to generative video methods by conditioning generated video on an encoding of three-dimensional camera movement. Results demonstrate that we are (1) able to successfully control the camera during video generation, starting from a single frame and a camera signal, and (2) we demonstrate the accuracy of the generated 3D camera paths using traditional computer vision methods.
arXiv Detail & Related papers (2024-05-21T20:54:27Z)
CameraCtrl: Enabling Camera Control for Text-to-Video Generation [86.36135895375425]
Controllability plays a crucial role in video generation, as it allows users to create and edit content more precisely. Existing models, however, lack control of camera pose that serves as a cinematic language to express deeper narrative nuances. We introduce CameraCtrl, enabling accurate camera pose control for video diffusion models.
arXiv Detail & Related papers (2024-04-02T16:52:41Z)
LaMD: Latent Motion Diffusion for Video Generation [69.4111397077229]
latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator. Results show that LaMD generates high-quality videos with a wide range of motions, from dynamics to highly controllable movements.
arXiv Detail & Related papers (2023-04-23T10:32:32Z)
A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos. We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator. We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.