From Rays to Projections: Better Inputs for Feed-Forward View Synthesis
- URL: http://arxiv.org/abs/2601.05116v1
- Date: Thu, 08 Jan 2026 17:03:44 GMT
- Title: From Rays to Projections: Better Inputs for Feed-Forward View Synthesis
- Authors: Zirui Wu, Zeren Jiang, Martin R. Oswald, Jie Song,
- Abstract summary: Feed-forward view synthesis models predict a novel view in a single pass with minimal 3D inductive bias. Existing works encode cameras as Plücker ray maps, which tie predictions to the arbitrary world coordinate gauge and make them sensitive to small camera transformations. We propose projective conditioning, which replaces raw camera parameters with a target-view projective cue that provides a stable 2D input.
- Score: 26.130973137744018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feed-forward view synthesis models predict a novel view in a single pass with minimal 3D inductive bias. Existing works encode cameras as Plücker ray maps, which tie predictions to the arbitrary world coordinate gauge and make them sensitive to small camera transformations, thereby undermining geometric consistency. In this paper, we ask what inputs best condition a model for robust and consistent view synthesis. We propose projective conditioning, which replaces raw camera parameters with a target-view projective cue that provides a stable 2D input. This reframes the task from a brittle geometric regression problem in ray space to a well-conditioned target-view image-to-image translation problem. Additionally, we introduce a masked autoencoding pretraining strategy tailored to this cue, enabling the use of large-scale uncalibrated data for pretraining. Our method shows improved fidelity and stronger cross-view consistency compared to ray-conditioned baselines on our view-consistency benchmark. It also achieves state-of-the-art quality on standard novel view synthesis benchmarks.
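The claim that Plücker ray maps depend on the arbitrary world coordinate gauge can be checked with a small numeric sketch. This is a hypothetical illustration, not the authors' implementation. It assumes a pinhole camera with world-to-camera rotation R, camera center c, focal length f, and principal point (cx, cy), and the helper names (plucker_ray, project, change_gauge) are invented for this example. The sketch re-expresses the same scene in a different world frame X' = G X + g. It then compares two quantities for the same camera: the Plücker ray (d, c × d) through one pixel, and the pixel coordinates of a projected 3D point.

```python
import math

# Hypothetical illustration (not the paper's code): Plucker ray coordinates change
# under a change of world frame, while a target-view projection does not.

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def matvec(M, v):
    return tuple(dot(row, v) for row in M)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def rot_y(theta):
    """Rotation about the y axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Assumed pinhole model: X_cam = R @ (X_world - c), pixel = f * (x/z, y/z) + (cx, cy).
def plucker_ray(R, c, f, cx, cy, u, v):
    """Plucker coordinates (d, m = c x d) of the ray through pixel (u, v)."""
    d_cam = ((u - cx) / f, (v - cy) / f, 1.0)
    d = normalize(matvec(transpose(R), d_cam))  # ray direction in the world frame
    return d, cross(c, d)

def project(R, c, f, cx, cy, X):
    """Pixel coordinates of world point X in this camera."""
    x, y, z = matvec(R, sub(X, c))
    return (f * x / z + cx, f * y / z + cy)

def change_gauge(G, g, R, c):
    """Re-express a camera in the new world frame X' = G X + g (G a rotation)."""
    return matmul(R, transpose(G)), add(matvec(G, c), g)

# One camera, one scene point, and an arbitrary change of world frame.
f, cx, cy = 100.0, 64.0, 64.0
R, c = rot_y(0.1), (0.0, 0.0, -2.0)
X = (0.3, -0.2, 1.0)
G, g = rot_y(0.5), (1.0, 2.0, 3.0)
R2, c2 = change_gauge(G, g, R, c)
X2 = add(matvec(G, X), g)

ray_before = plucker_ray(R, c, f, cx, cy, 10.0, 20.0)
ray_after = plucker_ray(R2, c2, f, cx, cy, 10.0, 20.0)
px_before = project(R, c, f, cx, cy, X)
px_after = project(R2, c2, f, cx, cy, X2)

print("ray (d, m), original frame:", ray_before)
print("ray (d, m), new frame:     ", ray_after)
print("pixel, original frame:", px_before)
print("pixel, new frame:     ", px_after)
```

The ray changes under the gauge change: its direction becomes G d, and its moment picks up a term from g. This happens even though the scene and cameras are physically identical. The projected pixel, by contrast, agrees to floating-point precision. That difference is the intuition behind conditioning on a 2D target-view cue. The abstract does not specify how the paper constructs its projective cue, so this sketch illustrates only the invariance argument.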
Related papers
- Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis [97.37770785712475] (2025-10-21T02:19:12Z)
We present a generation-based debiasing framework for object detection.
Our method significantly narrows the performance gap for underrepresented object groups.
- ARSS: Taming Decoder-only Autoregressive Visual Generation for View Synthesis From Single View [11.346049532150127] (2025-09-27T00:03:09Z)
ARSS is a framework that generates novel views from a single image conditioned on a camera trajectory.
Our method performs comparably to, or better than, state-of-the-art view synthesis approaches based on diffusion models.
- AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views [68.94737256959661] (2025-05-29T17:49:56Z)
AnySplat is a feed-forward network for novel view synthesis from uncalibrated image collections.
A single forward pass yields a set of 3D Gaussian primitives encoding both scene geometry and appearance.
In extensive zero-shot evaluations, AnySplat matches the quality of pose-aware baselines in both sparse- and dense-view scenarios.
- Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos [36.49978976710115] (2025-05-19T17:59:05Z)
We propose a novel two-stage strategy to train a view synthesis model from only raw video frames or multi-view images.
In the first stage, we learn to reconstruct the scene implicitly in a latent space without relying on any explicit 3D representation.
The learned latent camera and implicit scene representation have a large gap compared with the real 3D world.
- Stable Virtual Camera: Generative View Synthesis with Diffusion Models [51.71244310522393] (2025-03-18T17:57:22Z)
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene.
Our approach overcomes the limitations of prior methods through a simple model design, an optimized training recipe, and a flexible sampling strategy.
Our method can generate high-quality videos lasting up to half a minute with seamless loop closure.
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045] (2024-10-31T17:58:22Z)
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037] (2024-10-29T15:28:15Z)
PF3plat sets a new state of the art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
Our framework capitalizes on the fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
- Geometry-biased Transformers for Novel View Synthesis [36.11342728319563] (2023-01-11T18:59:56Z)
We tackle the task of synthesizing novel views of an object given a few input images and associated camera viewpoints.
Our work is inspired by recent 'geometry-free' approaches where multi-view images are encoded as a (global) set-latent representation.
We propose 'Geometry-biased Transformers' (GBTs) that incorporate geometric inductive biases in the set-latent representation-based inference.
- Generalizable Patch-Based Neural Rendering [46.41746536545268] (2022-07-21T17:57:04Z)
We propose a new paradigm for learning models that can synthesize novel views of unseen scenes.
Our method is capable of predicting the color of a target ray in a novel scene directly, just from a collection of patches sampled from the scene.
We show that our approach outperforms the state of the art on novel view synthesis of unseen scenes even when trained with considerably less data than prior work.
- Free View Synthesis [100.86844680362196] (2020-08-12T18:16:08Z)
We present a method for novel view synthesis from input images that are freely distributed around a scene.
Our method does not rely on a regular arrangement of input views, can synthesize images for free camera movement through the scene, and works for general scenes with unconstrained geometric layouts.