OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
- URL: http://arxiv.org/abs/2512.10940v1
- Date: Thu, 11 Dec 2025 18:59:05 GMT
- Title: OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
- Authors: Xiang Fan, Sharath Girish, Vivek Ramanujan, Chaoyang Wang, Ashkan Mirzaei, Petr Sushko, Aliaksandr Siarohin, Sergey Tulyakov, Ranjay Krishna
- Abstract summary: We introduce OmniView, a unified framework that generalizes across a wide range of 4D consistency tasks. Our method separately represents space, time, and view conditions, enabling flexible combinations of these inputs. For example, OmniView can synthesize novel views from static, dynamic, and multiview inputs, extrapolate trajectories forward and backward in time, and create videos from text or image prompts with full camera control.
- Score: 80.3346344429389
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior approaches injecting camera control into diffusion models have focused on specific subsets of 4D consistency tasks: novel view synthesis, text-to-video with camera control, image-to-video, amongst others. As a result, these fragmented approaches are trained on disjoint slices of the available 3D/4D data. We introduce OmniView, a unified framework that generalizes across a wide range of 4D consistency tasks. Our method separately represents space, time, and view conditions, enabling flexible combinations of these inputs. For example, OmniView can synthesize novel views from static, dynamic, and multiview inputs, extrapolate trajectories forward and backward in time, and create videos from text or image prompts with full camera control. OmniView is competitive with task-specific models across diverse benchmarks and metrics, improving image quality scores among camera-conditioned diffusion models by up to 33% on the multiview NVS LLFF dataset, 60% on the dynamic NVS Neural 3D Video benchmark, and 20% in static camera control on RE-10K, while reducing camera trajectory errors by 4x in text-conditioned video generation. With strong generalizability in one model, OmniView demonstrates the feasibility of a generalist 4D video model. The project page is available at https://snap-research.github.io/OmniView/
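The abstract's central design point is that space, time, and view conditions are represented separately, so any subset can be supplied per task. The paper does not detail its architecture here, but the PyTorch sketch below illustrates one plausible reading of that idea: the module name `FactoredConditioner`, the flattened 3x4 extrinsics encoding, and the learned null tokens for absent conditions are all assumptions for illustration, not OmniView's implementation (spatial content would enter through the latent video tokens themselves).

```python
# Minimal, hypothetical sketch -- NOT OmniView's published architecture.
import torch
import torch.nn as nn

class FactoredConditioner(nn.Module):
    """Embeds view (camera pose) and time conditions separately, so any
    subset can be supplied; a missing condition falls back to a learned
    null token. All names and dimensions here are assumptions."""
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.view_proj = nn.Linear(12, d_model)  # flattened 3x4 camera extrinsics
        self.time_proj = nn.Linear(1, d_model)   # scalar timestamp per frame
        self.null_view = nn.Parameter(torch.zeros(d_model))  # pose absent
        self.null_time = nn.Parameter(torch.zeros(d_model))  # time absent

    def forward(self, pose: torch.Tensor | None = None,
                t: torch.Tensor | None = None) -> torch.Tensor:
        view = self.view_proj(pose) if pose is not None else self.null_view
        time = self.time_proj(t) if t is not None else self.null_time
        # One simple way to combine: sum the embeddings and add them to the
        # denoiser's token stream (cross-attention would be another option).
        return view + time

cond = FactoredConditioner()
emb = cond(pose=torch.randn(12), t=torch.tensor([0.5]))   # pose + time control
emb_static = cond(pose=torch.randn(12))                   # pose only (static NVS)
emb_video = cond(t=torch.tensor([0.5]))                   # time only (no camera)
```

Learned null tokens for missing conditions are a common device in conditional diffusion models; a factorization along these lines would also be compatible with training on mixed 3D, 4D, and pose-free video data, as the 4DiM entry below describes for its own setting.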
Related papers
- AnyView: Synthesizing Any Novel View in Dynamic Scenes [23.16723540943151]
We introduce AnyView, a diffusion-based video generation framework for dynamic view synthesis with minimal biases or geometric assumptions.
We show competitive results with the current state of the art, and propose a new benchmark tailored towards extreme dynamic view synthesis in diverse real-world scenarios.
arXiv Detail & Related papers (2026-01-23T18:59:58Z)
- CausNVS: Autoregressive Multi-view Diffusion for Flexible 3D Novel View Synthesis [48.43677384182078]
CausNVS is a multi-view diffusion model in an autoregressive setting.
It supports arbitrary input-output view configurations and generates views sequentially.
It achieves consistently strong visual quality across diverse settings.
arXiv Detail & Related papers (2025-09-08T11:49:51Z)
- Stable Virtual Camera: Generative View Synthesis with Diffusion Models [51.71244310522393]
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene.
Our approach overcomes prior limitations through a simple model design, an optimized training recipe, and a flexible sampling strategy.
Our method can generate high-quality videos lasting up to half a minute with seamless loop closure.
arXiv Detail & Related papers (2025-03-18T17:57:22Z)
- CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models [98.03734318657848]
We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video.
We leverage a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis.
We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks.
arXiv Detail & Related papers (2024-11-27T18:57:16Z)
- Controlling Space and Time with Diffusion Models [34.7002868116714]
We present 4DiM, a cascaded diffusion model for 4D novel view synthesis (NVS).
We enable training on a mixture of 3D (with camera pose), 4D (pose+time), and video (time but no pose) data.
4DiM is the first-ever NVS method with intuitive metric-scale camera pose control.
arXiv Detail & Related papers (2024-07-10T17:23:33Z)
- Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras [65.54875149514274]
We present the first approach to render highly realistic free-viewpoint videos of a human actor in general apparel.
At inference, our method only requires four camera views of the moving actor and the respective 3D skeletal pose.
It handles actors in wide clothing, and reproduces even fine-scale dynamic detail.
arXiv Detail & Related papers (2023-12-12T16:45:52Z)
- FLEX: Parameter-free Multi-view 3D Human Motion Reconstruction [70.09086274139504]
Multi-view algorithms strongly depend on camera parameters, in particular, the relative positions among the cameras.
We introduce FLEX, an end-to-end parameter-free multi-view model.
We demonstrate results on the Human3.6M and KTH Multi-view Football II datasets.
arXiv Detail & Related papers (2021-05-05T09:08:12Z)