CheapNVS: Real-Time On-Device Narrow-Baseline Novel View Synthesis
- URL: http://arxiv.org/abs/2501.14533v1
- Date: Fri, 24 Jan 2025 14:40:39 GMT
- Title: CheapNVS: Real-Time On-Device Narrow-Baseline Novel View Synthesis
- Authors: Konstantinos Georgiadis, Mehmet Kerim Yucel, Albert Saa-Garriga,
- Abstract summary: Single-view novel view synthesis (NVS) is a notorious problem due to its ill-posed nature, and often requires large, computationally expensive approaches to produce tangible results. We propose CheapNVS: a fully end-to-end approach for narrow baseline single-view NVS based on a novel, efficient multiple encoder/decoder design trained in a multi-stage fashion.
- Score: 2.4578723416255754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-view novel view synthesis (NVS) is a notorious problem due to its ill-posed nature, and often requires large, computationally expensive approaches to produce tangible results. In this paper, we propose CheapNVS: a fully end-to-end approach for narrow baseline single-view NVS based on a novel, efficient multiple encoder/decoder design trained in a multi-stage fashion. CheapNVS first approximates the laborious 3D image warping with lightweight learnable modules that are conditioned on the camera pose embeddings of the target view, and then performs inpainting on the occluded regions in parallel to achieve significant performance gains. Once trained on a subset of the Open Images dataset, CheapNVS outperforms the state-of-the-art despite being 10 times faster and consuming 6% less memory. Furthermore, CheapNVS runs comfortably in real-time on mobile devices, reaching over 30 FPS on a Samsung Tab 9+.
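The abstract describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of that kind of pipeline: a lightweight image encoder, a learnable warping module conditioned on a target-pose embedding, and an inpainting branch run in parallel. All module names, layer sizes, and the offset-field formulation of the warp are assumptions made for illustration; this is not the authors' implementation.

```python
# Hypothetical sketch of a CheapNVS-style pipeline (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CheapNVSSketch(nn.Module):
    def __init__(self, feat=64, pose_dim=64):
        super().__init__()
        # Lightweight encoder shared by the warping and inpainting branches.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        # Embedding of the flattened relative camera pose (assumed 3x4 matrix).
        self.pose_embed = nn.Sequential(
            nn.Linear(12, pose_dim), nn.ReLU(), nn.Linear(pose_dim, pose_dim))
        # Learnable stand-in for 3D warping: predicts a 2D sampling-offset field.
        self.warp_head = nn.Conv2d(feat + pose_dim, 2, 3, padding=1)
        # Inpainting branch that fills regions disoccluded by the warp.
        self.inpaint_head = nn.Sequential(
            nn.Conv2d(feat + pose_dim, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, 3, 3, padding=1))

    def forward(self, img, rel_pose):
        B, _, H, W = img.shape
        f = self.encoder(img)                                   # (B, feat, H, W)
        p = self.pose_embed(rel_pose)                           # (B, pose_dim)
        p = p[:, :, None, None].expand(-1, -1, H, W)
        fp = torch.cat([f, p], dim=1)

        # Identity sampling grid in [-1, 1], then add the predicted offsets.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).expand(B, -1, -1, -1).to(img)
        offset = self.warp_head(fp).permute(0, 2, 3, 1)         # (B, H, W, 2)
        warped = F.grid_sample(img, grid + offset, align_corners=True)

        inpainted = self.inpaint_head(fp)                       # parallel branch
        # Naive blend; a real model would presumably use learned masks/decoders.
        return 0.5 * (warped + inpainted)

model = CheapNVSSketch()
out = model(torch.rand(1, 3, 128, 128), torch.rand(1, 12))      # (1, 3, 128, 128)
```

The two branches here mirror the abstract's claim that inpainting of occluded regions runs in parallel with the warp; how the branches are actually fused in CheapNVS is not specified in the abstract.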
Related papers
- FlowR: Flowing from Sparse to Dense 3D Reconstructions [60.6368083163258]
We propose a flow matching model that learns a flow to connect novel view renderings to renderings that we expect from dense reconstructions.
Our model is trained on a novel dataset of 3.6M image pairs and can process up to 45 views at 540x960 resolution (91K tokens) on one H100 GPU in a single forward pass.
arXiv Detail & Related papers (2025-04-02T11:57:01Z) - Stable Virtual Camera: Generative View Synthesis with Diffusion Models [51.71244310522393]
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene.
Our approach overcomes the limitations of prior methods through a simple model design, an optimized training recipe, and a flexible sampling strategy.
Our method can generate high-quality videos lasting up to half a minute with seamless loop closure.
arXiv Detail & Related papers (2025-03-18T17:57:22Z) - TriDF: Triplane-Accelerated Density Fields for Few-Shot Remote Sensing Novel View Synthesis [22.72162881491581]
TriDF is an efficient hybrid 3D representation for fast remote sensing NVS from as few as 3 input views.
Our approach decouples color and volume density information, modeling them independently to reduce the computational burden.
Comprehensive experiments across multiple remote sensing scenes demonstrate that our hybrid representation achieves a 30x speed increase.
arXiv Detail & Related papers (2025-03-17T16:25:39Z) - NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images [50.36605863731669]
NVComposer is a novel approach that eliminates the need for explicit external alignment. NVComposer achieves state-of-the-art performance in generative multi-view NVS tasks. Our approach shows substantial improvements in synthesis quality as the number of unposed input views increases.
arXiv Detail & Related papers (2024-12-04T17:58:03Z) - D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
arXiv Detail & Related papers (2024-06-14T14:35:44Z) - FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes [50.534213038479926]
FreeSplat is capable of reconstructing geometrically consistent 3D scenes from long-sequence inputs towards free-view synthesis.
We propose a simple but effective free-view training strategy that ensures robust view synthesis across a broader view range regardless of the number of views.
arXiv Detail & Related papers (2024-05-28T08:40:14Z) - Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion [62.37374499337897]
We present Dual3D, a novel text-to-3D generation framework.
It generates high-quality 3D assets from texts in only 1 minute.
arXiv Detail & Related papers (2024-05-16T07:50:02Z) - InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel and lightning-fast neural reconstruction system that builds accurate 3D representations from as few as 2-3 images. InstantSplat integrates dense stereo priors and co-visibility relationships between frames to initialize pixel-aligned Gaussians by progressively expanding the scene. It achieves an acceleration of over 20 times in reconstruction, improves visual quality (SSIM) from 0.3755 to 0.7624 compared to COLMAP with 3D-GS, and is compatible with multiple 3D representations.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - fMPI: Fast Novel View Synthesis in the Wild with Layered Scene Representations [9.75588035624177]
We propose two novel input processing paradigms for novel view synthesis (NVS) methods.
Our approach identifies and mitigates the two most time-consuming aspects of traditional pipelines.
We demonstrate that our proposed paradigms enable the design of an NVS method that achieves state-of-the-art on public benchmarks.
arXiv Detail & Related papers (2023-12-26T16:24:08Z) - Novel View Synthesis with View-Dependent Effects from a Single Image [35.85973300177698]
We are the first to incorporate view-dependent effects into single image-based novel view synthesis (NVS).
We propose to exploit the camera motion priors in NVS to model view-dependent appearance or effects (VDE) as the negative disparity in the scene.
We present extensive experimental results and show that our proposed method can learn NVS with VDEs, outperforming the SOTA single-view NVS methods on the RealEstate10k and MannequinChallenge datasets.
arXiv Detail & Related papers (2023-12-13T11:29:47Z) - Free3D: Consistent Novel View Synthesis without 3D Representation [63.931920010054064]
Free3D is a simple, accurate method for monocular open-set novel view synthesis (NVS).
Compared to other works that took a similar approach, we obtain significant improvements without resorting to an explicit 3D representation.
arXiv Detail & Related papers (2023-12-07T18:59:18Z) - Novel View Synthesis from a Single RGBD Image for Indoor Scenes [4.292698270662031]
We propose an approach for synthesizing novel view images from a single RGBD (Red Green Blue-Depth) input.
In our method, we convert an RGBD image into a point cloud and render it from a different viewpoint, then formulate the NVS task as an image translation problem (a minimal sketch of this depth-based warping step appears after this list).
arXiv Detail & Related papers (2023-11-02T08:34:07Z) - TOSS:High-quality Text-guided Novel View Synthesis from a Single Image [36.90122394242858]
We present TOSS, which introduces text to the task of novel view synthesis (NVS) from just a single RGB image.
To address this limitation, TOSS uses text as high-level semantic information to constrain the NVS solution space.
arXiv Detail & Related papers (2023-10-16T17:59:09Z)
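For reference, the single-RGBD entry above (and the "laborious 3D image warping" that CheapNVS approximates with learned modules) relies on classical depth-based reprojection: unproject pixels using the intrinsics and depth, transform the resulting points by the relative pose, and reproject them into the target view. The NumPy sketch below illustrates that standard step; the variable names and the naive nearest-pixel splat are illustrative assumptions, not code from any of the papers listed.

```python
# Classical depth-based forward warping of an RGBD image (illustrative only).
import numpy as np

def warp_rgbd(rgb, depth, K, T_rel):
    """rgb: (H, W, 3), depth: (H, W), K: (3, 3) intrinsics,
    T_rel: (4, 4) source-to-target camera transform."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # (3, HW)

    # Unproject source pixels to 3D points in the source camera frame.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)                 # (3, HW)
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])                # (4, HW)

    # Move points into the target camera frame and reproject.
    pts_t = (T_rel @ pts_h)[:3]
    proj = K @ pts_t
    z = proj[2]
    uv = (proj[:2] / np.maximum(z, 1e-6)).round().astype(int)           # (2, HW)

    # Nearest-pixel splat with a z-buffer; holes remain for an inpainter to fill.
    out = np.zeros_like(rgb)
    zbuf = np.full((H, W), np.inf)
    valid = (z > 0) & (uv[0] >= 0) & (uv[0] < W) & (uv[1] >= 0) & (uv[1] < H)
    for i in np.flatnonzero(valid):
        x, y = uv[0, i], uv[1, i]
        if z[i] < zbuf[y, x]:
            zbuf[y, x] = z[i]
            out[y, x] = rgb.reshape(-1, 3)[i]
    return out
```

The loop-based splat is deliberately simple; it is exactly this per-pixel unproject/reproject cost, plus the holes it leaves behind, that learned-warping-plus-inpainting approaches aim to sidestep.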