NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
- URL: http://arxiv.org/abs/2405.15364v1
- Date: Fri, 24 May 2024 08:56:19 GMT
- Title: NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
- Authors: Meng You, Zhiyu Zhu, Hui Liu, Junhui Hou,
- Abstract summary: We propose a new novel view synthesis (NVS) paradigm that operates without the need for training.
NVS-Solver adaptively modulates the diffusion sampling process with the given views to enable the creation of remarkable visual experiences.
- Score: 48.57740681957145
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose NVS-Solver, a new novel view synthesis (NVS) paradigm that operates without the need for training. NVS-Solver adaptively modulates the diffusion sampling process with the given views to enable the creation of remarkable visual experiences from single or multiple views of static scenes or monocular videos of dynamic scenes. Specifically, built upon our theoretical modeling, we iteratively modulate the score function with the given scene priors represented with warped input views to control the video diffusion process. Moreover, by theoretically exploring the boundary of the estimation error, we achieve the modulation in an adaptive fashion according to the view pose and the number of diffusion steps. Extensive evaluations on both static and dynamic scenes substantiate the significant superiority of our NVS-Solver over state-of-the-art methods both quantitatively and qualitatively. Source code: https://github.com/ZHU-Zhiyu/NVS_Solver
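The core mechanism described in the abstract, nudging the denoised estimate toward the warped input views with a weight that adapts to the sampling step, can be sketched as follows. This is a hypothetical illustration based only on the abstract: the function name, the linear weight schedule, and the tensor layout are assumptions, not the released NVS-Solver code.

```python
# Minimal sketch (assumption, not the authors' implementation):
# at each diffusion sampling step, blend the model's denoised estimate
# with the input view warped into the target camera pose, trusting the
# scene prior more early in sampling and less near the end.
import torch

def modulate_denoised_estimate(x0_hat, warped_view, valid_mask,
                               step, num_steps, lambda_max=0.8):
    """Blend the denoised estimate with the warped input view.

    x0_hat      : (B, C, H, W) denoised estimate at this step
    warped_view : (B, C, H, W) input view warped to the target pose
    valid_mask  : (B, 1, H, W) 1 where the warp gives a valid pixel
    step        : current sampling step (0 = most noisy)
    num_steps   : total number of sampling steps
    """
    # Assumed schedule: weight decays linearly over the sampling steps.
    weight = lambda_max * (1.0 - step / max(num_steps - 1, 1))
    blend = weight * valid_mask
    return (1.0 - blend) * x0_hat + blend * warped_view


if __name__ == "__main__":
    # Toy usage with random tensors standing in for real model outputs.
    x0_hat = torch.randn(1, 3, 64, 64)
    warped = torch.randn(1, 3, 64, 64)
    mask = (torch.rand(1, 1, 64, 64) > 0.3).float()
    out = modulate_denoised_estimate(x0_hat, warped, mask, step=5, num_steps=25)
    print(out.shape)  # torch.Size([1, 3, 64, 64])
```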
Related papers
- NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images [50.36605863731669]
NVComposer is a novel approach that eliminates the need for explicit external alignment.
NVComposer achieves state-of-the-art performance in generative multi-view NVS tasks.
Our approach shows substantial improvements in synthesis quality as the number of unposed input views increases.
arXiv Detail & Related papers (2024-12-04T17:58:03Z)
- SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input [6.275971782566314]
We introduce a novel self-supervised stereo video synthesis paradigm via a video diffusion model, termed SpatialDreamer.
To address the insufficiency of stereo video data, we propose a Depth-based Video Generation (DVG) module.
We also propose RefinerNet along with a self-supervised synthetic framework designed to facilitate efficient and dedicated training.
arXiv Detail & Related papers (2024-11-18T15:12:59Z)
- Novel View Synthesis with Pixel-Space Diffusion Models [4.844800099745365]
Generative models are increasingly employed in novel view synthesis (NVS).
We adapt a modern diffusion model architecture for end-to-end NVS in the pixel space.
We introduce a novel NVS training scheme that utilizes single-view datasets, capitalizing on their relative abundance.
arXiv Detail & Related papers (2024-11-12T12:58:33Z)
- SF-V: Single Forward Video Generation Model [57.292575082410785]
We propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained models.
Experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead.
arXiv Detail & Related papers (2024-06-06T17:58:27Z)
- Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model to a 4D representation encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z)
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis [75.367816656045]
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
arXiv Detail & Related papers (2023-04-27T17:59:32Z)
- Dynamic View Synthesis from Dynamic Monocular Video [69.80425724448344]
We present an algorithm for generating views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene.
We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
arXiv Detail & Related papers (2021-05-13T17:59:50Z)