Learn Your Scales: Towards Scale-Consistent Generative Novel View Synthesis
- URL: http://arxiv.org/abs/2503.15412v1
- Date: Wed, 19 Mar 2025 16:56:03 GMT
- Title: Learn Your Scales: Towards Scale-Consistent Generative Novel View Synthesis
- Authors: Fereshteh Forghani, Jason J. Yu, Tristan Aumentado-Armstrong, Konstantinos G. Derpanis, Marcus A. Brubaker
- Abstract summary: We seek to understand and address the effect of scale ambiguity when ambiguously-scaled data is used to train generative novel view synthesis (GNVS) methods. In GNVS, new views of a scene or object can be synthesized from as little as a single image and are, thus, unconstrained. We study the effect of scene scale ambiguity in GNVS by isolating its effect on the resulting models.
- Score: 23.967904337714234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional depth-free multi-view datasets are captured using a moving monocular camera without metric calibration. The scales of camera positions in this monocular setting are ambiguous. Previous methods have acknowledged scale ambiguity in multi-view data via various ad-hoc normalization pre-processing steps, but have not directly analyzed the effect of incorrect scene scales on their application. In this paper, we seek to understand and address the effect of scale ambiguity when such data is used to train generative novel view synthesis (GNVS) methods. In GNVS, new views of a scene or object can be synthesized from as little as a single image and are, thus, unconstrained, necessitating the use of generative methods. The generative nature of these models captures all aspects of uncertainty, including any uncertainty of scene scales, which act as nuisance variables for the task. We study the effect of scene scale ambiguity in GNVS when views are sampled from a single image, isolating its effect on the resulting models, and, based on these intuitions, define new metrics that measure the scale inconsistency of generated views. We then propose a framework to estimate scene scales jointly with the GNVS model in an end-to-end fashion. Empirically, we show that our method reduces the scale inconsistency of generated views without the complexity or downsides of previous scale normalization methods. Further, we show that removing this ambiguity improves the generated image quality of the resulting GNVS model.
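To make the proposed framework concrete, the following is a minimal PyTorch-style sketch of the joint optimization the abstract describes: each training scene gets one learnable scale applied to its camera translations, and that scale is updated end-to-end together with the generative model. All names here (`SceneScales`, `model.loss`, the log-scale parameterization) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SceneScales(nn.Module):
    """One learnable scale per training scene, applied to camera translations.

    Log-space parameterization keeps scales positive (illustrative choice).
    """
    def __init__(self, num_scenes: int):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(num_scenes))  # scale = 1 at init

    def forward(self, scene_ids: torch.Tensor) -> torch.Tensor:
        return self.log_scale[scene_ids].exp()

def training_step(model, scales, batch, optimizer):
    """One joint update of the generative NVS model and the per-scene scales.

    `model.loss` stands in for any conditional generative objective
    (e.g., a diffusion denoising loss); `optimizer` is assumed to hold
    both the model's parameters and `scales.log_scale`.
    """
    s = scales(batch["scene_id"])                 # (B,)
    pose = batch["rel_pose"].clone()              # (B, 4, 4) source -> target
    pose[:, :3, 3] = pose[:, :3, 3] * s[:, None]  # rescale translations only
    loss = model.loss(batch["src_img"], batch["tgt_img"], pose)
    optimizer.zero_grad()
    loss.backward()                               # gradients also reach log_scale
    optimizer.step()
    return loss.detach()
```

Parameterizing the scale in log-space is one way to keep the recovered scales positive while remaining gradient-friendly; the paper's actual parameterization may differ.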
Related papers
- Stable Virtual Camera: Generative View Synthesis with Diffusion Models [67.33643698669209]
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene. Our approach overcomes the limitations of prior methods through simple model design, an optimized training recipe, and a flexible sampling strategy. Our method can generate high-quality videos lasting up to half a minute with seamless loop closure.
arXiv Detail & Related papers (2025-03-18T17:57:22Z)
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- RANRAC: Robust Neural Scene Representations via Random Ray Consensus [12.161889666145127]
RANdom RAy Consensus (RANRAC) is an efficient approach to eliminate the effect of inconsistent data.
We formulate a fuzzy adaptation of the RANSAC paradigm, enabling its application to large-scale models.
Results indicate significant improvements compared to state-of-the-art robust methods for novel-view synthesis.
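As a rough illustration of what a fuzzy adaptation of RANSAC can mean, the sketch below replaces the usual hard inlier threshold with a smooth membership weight when scoring hypotheses, here for simple 2D line fitting. This is a hand-rolled toy under that assumption, not RANRAC's actual formulation for neural scene representations.

```python
import numpy as np

def soft_inlier_weight(residuals, sigma=0.1):
    # Fuzzy membership: smooth Gaussian weight instead of a hard 0/1 inlier test.
    return np.exp(-(residuals ** 2) / (2 * sigma ** 2))

def fuzzy_ransac_line(points, iters=500, sigma=0.1, rng=None):
    """Fit a 2D line to noisy (N, 2) points; score hypotheses by summed soft weights."""
    rng = rng or np.random.default_rng(0)
    best_line, best_score = None, -np.inf
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)  # minimal sample
        p, q = points[i], points[j]
        d = q - p
        n = np.array([-d[1], d[0]])            # line normal
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                           # degenerate sample
        n = n / norm
        residuals = (points - p) @ n           # signed point-to-line distances
        score = soft_inlier_weight(residuals, sigma).sum()
        if score > best_score:
            best_score, best_line = score, (p, n)
    return best_line, best_score
```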
arXiv Detail & Related papers (2023-12-15T13:33:09Z)
- SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
- Self-improving Multiplane-to-layer Images for Novel View Synthesis [3.9901365062418312]
We present a new method for lightweight novel-view synthesis that generalizes to an arbitrary forward-facing scene.
We start by representing the scene with a set of fronto-parallel semitransparent planes and afterward convert them to deformable layers in an end-to-end manner.
Our method does not require fine-tuning when a new scene is processed and can handle an arbitrary number of views without restrictions.
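The fronto-parallel semitransparent planes mentioned above are typically rendered with back-to-front alpha compositing, the standard multiplane-image (MPI) "over" operation. Below is a minimal sketch of that generic compositing step, stated as an assumption about the usual MPI formulation rather than this paper's exact pipeline.

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Composite D fronto-parallel RGBA planes, ordered back-to-front.

    colors: (D, H, W, 3) per-plane RGB; alphas: (D, H, W, 1) per-plane opacity.
    Implements C = sum_d c_d * a_d * prod over nearer planes d' of (1 - a_d').
    """
    out = np.zeros(colors.shape[1:])               # accumulated image, (H, W, 3)
    for color, alpha in zip(colors, alphas):       # iterate back to front
        out = color * alpha + out * (1.0 - alpha)  # standard "over" operator
    return out
```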
arXiv Detail & Related papers (2022-10-04T13:27:14Z)
- im2nerf: Image to Neural Radiance Field in the Wild [47.18702901448768]
im2nerf is a learning framework that predicts a continuous neural object representation given a single input image in the wild.
We show that im2nerf achieves state-of-the-art performance for novel view synthesis from a single-view unposed image in the wild.
arXiv Detail & Related papers (2022-09-08T23:28:56Z)
- Arbitrary-Scale Image Synthesis [149.0290830305808]
Positional encodings have enabled recent works to train a single adversarial network that can generate images of different scales.
We propose the design of scale-consistent positional encodings invariant to our generator's transformation layers.
We show competitive results for a continuum of scales on various commonly used datasets for image synthesis.
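One common route to scale-consistent positional encodings is to define the encoding on a fixed, resolution-independent coordinate domain and sample it at whatever output size is requested, so the same spatial location always receives the same code. The sketch below shows that generic mechanism with Fourier features; it is an assumption about the idea, not this paper's specific design.

```python
import numpy as np

def fourier_grid(height, width, num_freqs=6):
    """Positional encoding on a fixed [0, 1]^2 domain, sampled at any resolution.

    Because coordinates are resolution-independent, rendering the same region
    at 2x the resolution just samples the same encoding field more densely.
    """
    ys = (np.arange(height) + 0.5) / height       # pixel centers in [0, 1]
    xs = (np.arange(width) + 0.5) / width
    yy, xx = np.meshgrid(ys, xs, indexing="ij")   # each (H, W)
    coords = np.stack([yy, xx], axis=-1)          # (H, W, 2)
    freqs = 2.0 ** np.arange(num_freqs) * np.pi   # octave-spaced frequencies
    angles = coords[..., None] * freqs            # (H, W, 2, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(height, width, -1)         # (H, W, 4 * num_freqs)
```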
arXiv Detail & Related papers (2022-04-05T15:10:43Z)
- Wide-Depth-Range 6D Object Pose Estimation in Space [124.94794113264194]
6D pose estimation in space poses unique challenges that are not commonly encountered in the terrestrial setting.
One of the most striking differences is the lack of atmospheric scattering, allowing objects to be visible from a great distance.
We propose a single-stage hierarchical end-to-end trainable network that is more robust to scale variations.
arXiv Detail & Related papers (2021-04-01T08:39:26Z)
- Stable View Synthesis [100.86844680362196]
We present Stable View Synthesis (SVS).
Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene.
SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets.
arXiv Detail & Related papers (2020-11-14T07:24:43Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
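For context, the classical single view metrology constraint behind such priors relates object height to camera height using image rows alone, for an upright object on the ground plane seen by a camera with negligible roll. The function below is a sketch of that textbook relation, not the paper's learned model.

```python
def object_height(camera_height, v_base, v_top, v_horizon):
    """Classical single-view height estimate for an upright object on the
    ground plane, assuming a camera with negligible roll.

    v_base, v_top: image rows of the object's base and top (v grows downward).
    v_horizon: image row of the horizon line.
    Derivation: both (v_base - v_top) and (v_base - v_horizon) scale as
    1/depth, so their ratio equals h_object / h_camera.
    """
    return camera_height * (v_base - v_top) / (v_base - v_horizon)

# Example: camera 1.6 m above ground, object spans rows 420 (top) to 600 (base),
# horizon at row 300 -> height = 1.6 * (600 - 420) / (600 - 300) = 0.96 m.
print(object_height(1.6, v_base=600, v_top=420, v_horizon=300))
```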
arXiv Detail & Related papers (2020-07-18T22:31:33Z)