TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
- URL: http://arxiv.org/abs/2503.05638v1
- Date: Fri, 07 Mar 2025 17:57:53 GMT
- Title: TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
- Authors: Mark YU, Wenbo Hu, Jinbo Xing, Ying Shan
- Abstract summary: TrajectoryCrafter is a novel approach to redirect camera trajectories for monocular videos. By disentangling deterministic view transformations from content generation, our method achieves precise control over user-specified camera trajectories.
- Score: 33.219657261649324
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present TrajectoryCrafter, a novel approach to redirect camera trajectories for monocular videos. By disentangling deterministic view transformations from stochastic content generation, our method achieves precise control over user-specified camera trajectories. We propose a novel dual-stream conditional video diffusion model that concurrently integrates point cloud renders and source videos as conditions, ensuring accurate view transformations and coherent 4D content generation. Instead of leveraging scarce multi-view videos, we curate a hybrid training dataset combining web-scale monocular videos with static multi-view datasets, by our innovative double-reprojection strategy, significantly fostering robust generalization across diverse scenes. Extensive evaluations on multi-view and large-scale monocular videos demonstrate the superior performance of our method.
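The abstract names point-cloud renders as the deterministic view-transformation condition and a double-reprojection strategy for building training pairs from monocular video, but does not spell out the mechanics. The sketch below shows only the generic geometric building block such a pipeline rests on: lifting a frame with its estimated depth into a point cloud and re-rendering it from a user-specified camera, leaving disocclusion holes for a generative prior to fill. All function names, the pinhole intrinsics `K`, and the relative pose `(R, t)` are illustrative assumptions, not TrajectoryCrafter's actual code.

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map (H, W) into a per-pixel 3D point cloud in camera coordinates.

    Assumes a pinhole camera with intrinsics K (3x3). Illustrative only; not the
    authors' implementation.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel coordinates
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # homogeneous pixels, (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                       # back-projected viewing rays
    return rays * depth[..., None]                        # scale rays by depth -> 3D points


def render_point_cloud(points, colors, K, R, t, hw):
    """Splat colored 3D points into a target view given a relative pose (R, t).

    Nearer points win via a z-buffer; pixels hit by no point stay black, which is
    exactly the kind of disocclusion hole a diffusion model would have to inpaint.
    """
    H, W = hw
    pts = points.reshape(-1, 3) @ R.T + t                 # move points into the target camera frame
    cols = colors.reshape(-1, 3)

    in_front = pts[:, 2] > 1e-6                           # discard points behind the camera
    pts, cols = pts[in_front], cols[in_front]

    proj = pts @ K.T
    uv = (proj[:, :2] / proj[:, 2:3]).round().astype(int) # perspective projection to pixels
    z = pts[:, 2]

    inside = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    uv, z, cols = uv[inside], z[inside], cols[inside]

    image = np.zeros((H, W, 3), dtype=np.float64)
    zbuf = np.full((H, W), np.inf)
    for (px, py), depth_val, c in zip(uv, z, cols):
        if depth_val < zbuf[py, px]:                      # keep the nearest point per pixel
            zbuf[py, px] = depth_val
            image[py, px] = c
    return image
```

Under these assumptions, `render_point_cloud(unproject(depth, K), frame, K, R, t, frame.shape[:2])` would yield the kind of warped render that a dual-stream conditional model could consume alongside the source video.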
Related papers
- Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model [52.0192865857058]
We propose the first training-free 4D video generation method that leverages the off-the-shelf video diffusion models to generate multi-view videos from a single input video.
Our method is training-free and fully utilizes an off-the-shelf video diffusion model, offering a practical and effective solution for multi-view video generation.
arXiv Detail & Related papers (2025-03-28T17:14:48Z)
- CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models [89.63787060844409]
CameraCtrl II is a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model.
We take an approach that progressively expands the generation of dynamic scenes.
arXiv Detail & Related papers (2025-03-13T17:42:01Z)
- Reangle-A-Video: 4D Video Generation as Video-to-Video Translation [51.328567400947435]
We introduce Reangle-A-Video, a unified framework for generating synchronized multi-view videos from a single input video.
Our method reframes the multi-view video generation task as video-to-videos translation, leveraging publicly available image and video diffusion priors.
arXiv Detail & Related papers (2025-03-12T08:26:15Z)
- Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models [89.79067761383855]
Vchitect-2.0 is a parallel transformer architecture designed to scale up video diffusion models for large-scale text-to-video generation.
By introducing a novel Multimodal Diffusion Block, our approach achieves consistent alignment between text descriptions and generated video frames.
To overcome memory and computational bottlenecks, we propose a Memory-efficient Training framework.
arXiv Detail & Related papers (2025-01-14T21:53:11Z)
- Adapting Image-to-Video Diffusion Models for Large-Motion Frame Interpolation [0.0]
We present a conditional encoder designed to adapt an image-to-video model for large-motion frame interpolation.
To enhance performance, we integrate a dual-branch feature extractor and propose a cross-frame attention mechanism.
Our approach demonstrates superior performance on the Fréchet Video Distance metric when evaluated against other state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-22T14:49:55Z)
- SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints [43.14498014617223]
We propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation.
We introduce a multi-view synchronization module to maintain appearance and geometry consistency across different viewpoints.
Our method enables intriguing extensions, such as re-rendering a video from novel viewpoints.
arXiv Detail & Related papers (2024-12-10T18:55:17Z)
- Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training [51.851390459940646]
We introduce Latent-Reframe, which enables camera control in a pre-trained video diffusion model without fine-tuning.
Latent-Reframe operates during the sampling stage, maintaining efficiency while preserving the original model distribution.
Our approach reframes the latent code of video frames to align with the input camera trajectory through time-aware point clouds.
arXiv Detail & Related papers (2024-12-08T18:59:54Z)
- SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix [60.48666051245761]
We propose a pose-free and training-free approach for generating 3D stereoscopic videos.
Our method warps a generated monocular video into camera views on a stereoscopic baseline using estimated video depth.
We develop a disocclusion boundary re-injection scheme that further improves the quality of video inpainting.
arXiv Detail & Related papers (2024-06-29T08:33:55Z)
- Vivid-ZOO: Multi-View Video Generation with Diffusion Model [76.96449336578286]
New challenges lie in the lack of massive captioned multi-view videos and the complexity of modeling such a multi-dimensional distribution.
We propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text.
arXiv Detail & Related papers (2024-06-12T21:44:04Z)
- TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
The video sequences generated by TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z)
- VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition.
We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
arXiv Detail & Related papers (2022-12-01T02:58:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.