Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
- URL: http://arxiv.org/abs/2503.09151v2
- Date: Mon, 17 Mar 2025 13:01:59 GMT
- Title: Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
- Authors: Hyeonho Jeong, Suhyeon Lee, Jong Chul Ye
- Abstract summary: We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video. Our method reframes the multi-view video generation task as video-to-videos translation, leveraging publicly available image and video diffusion priors.
- Score: 51.328567400947435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video. Unlike mainstream approaches that train multi-view video diffusion models on large-scale 4D datasets, our method reframes the multi-view video generation task as video-to-videos translation, leveraging publicly available image and video diffusion priors. In essence, Reangle-A-Video operates in two stages. (1) Multi-View Motion Learning: An image-to-video diffusion transformer is synchronously fine-tuned in a self-supervised manner to distill view-invariant motion from a set of warped videos. (2) Multi-View Consistent Image-to-Images Translation: The first frame of the input video is warped and inpainted into various camera perspectives under an inference-time cross-view consistency guidance using DUSt3R, generating multi-view consistent starting images. Extensive experiments on static view transport and dynamic camera control show that Reangle-A-Video surpasses existing methods, establishing a new solution for multi-view video generation. We will publicly release our code and data. Project page: https://hyeonho99.github.io/reangle-a-video/
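The second stage's inference-time cross-view consistency guidance can be illustrated with a toy sketch. The snippet below is a minimal, hypothetical stand-in: it nudges each view's image estimate toward a cross-view consensus at every step, using a simple per-pixel mean as the consistency signal. The actual method measures consistency via DUSt3R-predicted geometry; the mean here is only an illustrative proxy, and all function names are invented for this sketch.

```python
import numpy as np

def consistency_guidance(views, weight=0.5, steps=10):
    """Toy stand-in for inference-time cross-view consistency guidance.

    Each view's estimate is pulled toward the cross-view consensus
    (here, a simple per-pixel mean) at every step. The real method in
    the paper scores consistency with DUSt3R geometry instead.
    """
    views = [v.astype(float).copy() for v in views]
    for _ in range(steps):
        consensus = np.mean(views, axis=0)          # shared target across views
        views = [v + weight * (consensus - v) for v in views]
    return views

# Start from three deliberately inconsistent "inpainted" views.
rng = np.random.default_rng(0)
base = rng.random((4, 4, 3))
views = [base + rng.normal(scale=0.2, size=base.shape) for _ in range(3)]

guided = consistency_guidance(views)
spread_before = np.std(np.stack(views), axis=0).mean()
spread_after = np.std(np.stack(guided), axis=0).mean()
print(spread_after < spread_before)  # guidance shrinks cross-view disagreement
```

Because each update moves every view a fixed fraction of the way toward the shared consensus, the cross-view spread contracts geometrically while the consensus itself stays fixed, which mirrors (in a crude way) how the guidance steers independently inpainted views toward mutual agreement.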
Related papers
- Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model [52.0192865857058]
We propose the first training-free 4D video generation method that leverages the off-the-shelf video diffusion models to generate multi-view videos from a single input video.
Our method is training-free and fully utilizes an off-the-shelf video diffusion model, offering a practical and effective solution for multi-view video generation.
arXiv Detail & Related papers (2025-03-28T17:14:48Z) - TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models [33.219657261649324]
TrajectoryCrafter is a novel approach to redirect camera trajectories for monocular videos. By disentangling deterministic view transformations from content generation, our method achieves precise control over user-specified camera trajectories.
arXiv Detail & Related papers (2025-03-07T17:57:53Z) - SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints [43.14498014617223]
We propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation. We introduce a multi-view synchronization module to maintain appearance and geometry consistency across different viewpoints. Our method enables intriguing extensions, such as re-rendering a video from novel viewpoints.
arXiv Detail & Related papers (2024-12-10T18:55:17Z) - Vivid-ZOO: Multi-View Video Generation with Diffusion Model [76.96449336578286]
New challenges lie in the lack of massive captioned multi-view videos and the complexity of modeling such multi-dimensional distribution.
We propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text.
arXiv Detail & Related papers (2024-06-12T21:44:04Z) - Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control [70.17137528953953]
Collaborative video diffusion (CVD) is trained on top of a state-of-the-art camera-control module for video generation.
CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines.
arXiv Detail & Related papers (2024-05-27T17:58:01Z) - CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers [18.67069364925506]
We propose to add virtual 3D camera controls to generative video methods by conditioning generated video on an encoding of three-dimensional camera movement.
Results demonstrate that we (1) can successfully control the camera during video generation, starting from a single frame and a camera signal, and (2) can verify the accuracy of the generated 3D camera paths using traditional computer vision methods.
arXiv Detail & Related papers (2024-05-21T20:54:27Z) - Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling video dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z) - DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model [19.288610627281102]
We propose DrivingDiffusion to generate realistic multi-view videos controlled by 3D layout.
Our model can generate large-scale realistic multi-camera driving videos in complex urban scenes.
arXiv Detail & Related papers (2023-10-11T18:00:08Z) - Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning [50.60891619269651]
Control-A-Video is a controllable T2V diffusion model that can generate videos conditioned on text prompts and reference control maps like edge and depth maps.
We propose novel strategies to incorporate content prior and motion prior into the diffusion-based generation process.
Our framework generates higher-quality, more consistent videos compared to existing state-of-the-art methods in controllable text-to-video generation.
arXiv Detail & Related papers (2023-05-23T09:03:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.