Bridging Implicit and Explicit Geometric Transformation for Single-Image View Synthesis
- URL: http://arxiv.org/abs/2209.07105v3
- Date: Fri, 15 Mar 2024 08:21:04 GMT
- Title: Bridging Implicit and Explicit Geometric Transformation for Single-Image View Synthesis
- Authors: Byeongjun Park, Hyojun Go, Changick Kim
- Abstract summary: Single-image view synthesis faces a "seesaw" problem: preserving reprojected contents while completing realistic out-of-view regions.
We propose a single-image view synthesis framework for mitigating the seesaw problem while utilizing an efficient non-autoregressive model.
Our loss function encourages explicit features to improve the reprojected area of implicit features, and implicit features to improve the out-of-view area of explicit features.
- Score: 16.14528024065244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating novel views from a single image has made tremendous strides with advanced autoregressive models, as unseen regions have to be inferred from the visible scene contents. Although recent methods generate high-quality novel views, synthesizing with only one explicit or implicit 3D geometry involves a trade-off between two objectives that we call the "seesaw" problem: 1) preserving reprojected contents and 2) completing realistic out-of-view regions. Autoregressive models also require considerable computational cost. In this paper, we propose a single-image view synthesis framework that mitigates the seesaw problem while using an efficient non-autoregressive model. Motivated by the observation that explicit methods preserve reprojected pixels well and implicit methods complete realistic out-of-view regions, we introduce a loss function that makes the two renderers complement each other: explicit features improve the reprojected area of implicit features, and implicit features improve the out-of-view area of explicit features. With the proposed architecture and loss function, we alleviate the seesaw problem, outperforming autoregressive state-of-the-art methods while generating an image $\approx$100 times faster. We validate the efficiency and effectiveness of our method with experiments on the RealEstate10K and ACID datasets.
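As an illustration of the complementary objective described in the abstract, below is a minimal sketch assuming per-pixel explicit and implicit feature maps plus a reprojection-visibility mask. The function name, the L1 distance, and the detach-based cross-supervision are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def complementary_loss(explicit_feat, implicit_feat, visible_mask):
    """Hypothetical sketch of a loss that lets each renderer supervise the other.

    explicit_feat, implicit_feat: (B, C, H, W) rendered feature maps.
    visible_mask: (B, 1, H, W), 1 where source pixels reproject into the
    target view, 0 in out-of-view regions.
    """
    out_of_view_mask = 1.0 - visible_mask

    # Explicit features preserve reprojected pixels well, so use them
    # (detached) as targets for the implicit branch inside the visible region.
    loss_visible = F.l1_loss(
        implicit_feat * visible_mask,
        explicit_feat.detach() * visible_mask,
    )

    # Implicit features complete out-of-view content well, so use them
    # (detached) as targets for the explicit branch outside the visible region.
    loss_out_of_view = F.l1_loss(
        explicit_feat * out_of_view_mask,
        implicit_feat.detach() * out_of_view_mask,
    )

    return loss_visible + loss_out_of_view
```

Detaching each target keeps gradients flowing in one direction per region, which is one simple way to realize the idea that each branch improves the other where it is weak.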
Related papers
- Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting [4.89907242398523]
We propose renderability field-guided Gaussian splatting (RF-GS) for scene view synthesis.
RF-GS quantifies input inhomogeneity through a renderability field, guiding pseudo-view sampling to enhance visual consistency.
Our experiments on simulated and real-world data show that our method outperforms existing approaches in rendering stability.
arXiv Detail & Related papers (2025-04-27T14:41:01Z)
- Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views [29.85363432402896]
We propose a novel neural rendering framework for unposed, extremely sparse-view 3D reconstruction in unbounded 360° scenes.
We employ a dense stereo reconstruction model to recover coarse geometry, and introduce a layer-specific bootstrap optimization to refine the noisy geometry and fill occluded regions in the reconstruction.
Our approach outperforms existing state-of-the-art methods in terms of rendering quality and surface reconstruction accuracy.
arXiv Detail & Related papers (2025-03-31T17:59:25Z)
- RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors [13.883695200241524]
RI3D is a novel approach that harnesses the power of diffusion models to reconstruct high-quality novel views given a sparse set of input images.
Our key contribution is separating the view synthesis process into two tasks of reconstructing visible regions and hallucinating missing regions.
We produce results with detailed textures in both visible and missing regions that outperform state-of-the-art approaches on a diverse set of scenes.
arXiv Detail & Related papers (2025-03-13T20:16:58Z)
- Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training [102.82553402539139]
Large diffusion models demonstrate remarkable zero-shot capabilities in novel view synthesis from a single image.
These models often face challenges in maintaining consistency across novel and reference views.
We propose to use epipolar geometry to locate and retrieve overlapping information from the input view.
This information is then incorporated into the generation of target views, eliminating the need for training or fine-tuning.
arXiv Detail & Related papers (2025-02-25T14:04:22Z)
- GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views [28.47730275628715]
We propose a generalizable neural rendering method that can perform high-fidelity novel view synthesis under several degradations.
Our method, GAURA, is learning-based and does not require any test-time scene-specific optimization.
arXiv Detail & Related papers (2024-07-11T06:44:37Z)
- Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering [16.382098950820822]
We propose Zero-to-Hero, a novel test-time approach that enhances view synthesis by manipulating attention maps.
We modify the self-attention mechanism to integrate information from the source view, reducing shape distortions.
Results demonstrate substantial improvements in fidelity and consistency, validated on a diverse set of out-of-distribution objects.
arXiv Detail & Related papers (2024-05-29T00:58:22Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation [66.0838349951456]
NeRF-based generative models have shown an impressive capacity for generating high-quality images with consistent 3D geometry.
We propose a universal method to surgically fine-tune these NeRF-GAN models in order to achieve high-fidelity animation of real subjects from only a single image.
arXiv Detail & Related papers (2022-11-30T18:36:45Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Solving Inverse Problems with NerfGANs [88.24518907451868]
We introduce a novel framework for solving inverse problems using NeRF-style generative models.
We show that naively optimizing the latent space leads to artifacts and poor novel view rendering.
We propose a novel radiance field regularization method to obtain better 3D surfaces and improved novel views given single-view observations.
arXiv Detail & Related papers (2021-12-16T17:56:58Z)
- Inverting Generative Adversarial Renderer for Face Reconstruction [58.45125455811038]
In this work, we introduce a novel Generative Adversarial Renderer (GAR).
GAR learns to model complicated real-world images; instead of relying on graphics rules, it is capable of producing realistic images.
Our method achieves state-of-the-art performance on multiple face reconstruction benchmarks.
arXiv Detail & Related papers (2021-05-06T04:16:06Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)