Sparse Input View Synthesis: 3D Representations and Reliable Priors
- URL: http://arxiv.org/abs/2411.13631v1
- Date: Wed, 20 Nov 2024 18:45:46 GMT
- Title: Sparse Input View Synthesis: 3D Representations and Reliable Priors
- Authors: Nagabhushan Somraj,
- Abstract summary: Novel view synthesis is a fundamental problem in computer vision and graphics.
Recent 3D representations significantly improve the quality of images rendered from novel viewpoints.
We focus on the sparse input novel view synthesis problem for both static and dynamic scenes.
- Score: 0.8158530638728501
- License:
- Abstract: Novel view synthesis refers to the problem of synthesizing novel viewpoints of a scene given the images from a few viewpoints. This is a fundamental problem in computer vision and graphics, and enables a vast variety of applications such as meta-verse, free-view watching of events, video gaming, video stabilization and video compression. Recent 3D representations such as radiance fields and multi-plane images significantly improve the quality of images rendered from novel viewpoints. However, these models require a dense sampling of input views for high quality renders. Their performance goes down significantly when only a few input views are available. In this thesis, we focus on the sparse input novel view synthesis problem for both static and dynamic scenes.
Related papers
- Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Captures [33.463245327698]
We present a novel volumetric prior on human faces that allows for high-fidelity expressive face modeling.
We leverage a 3D Morphable Face Model to synthesize a large training set, rendering each identity with different expressions.
We then train a conditional Neural Radiance Field prior on this synthetic dataset and, at inference time, fine-tune the model on a very sparse set of real images of a single subject.
arXiv Detail & Related papers (2024-10-01T12:24:50Z) - ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose textbfViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion model and the coarse 3D clues offered by point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z) - ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models [33.760292331843104]
Generating novel views of an object from a single image is a challenging task.
Recent methods for view synthesis based on diffusion have shown great progress.
We demonstrate a simple method, where we utilize a pre-trained video diffusion model.
arXiv Detail & Related papers (2023-12-03T06:50:15Z) - ReShader: View-Dependent Highlights for Single Image View-Synthesis [5.736642774848791]
We propose to split the view synthesis process into two independent tasks of pixel reshading and relocation.
During the reshading process, we take the single image as the input and adjust its shading based on the novel camera.
This reshaded image is then used as the input to an existing view synthesis method to relocate the pixels and produce the final novel view image.
arXiv Detail & Related papers (2023-09-19T15:23:52Z) - Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur [68.24599239479326]
We develop a hybrid neural rendering model that makes image-based representation and neural 3D representation join forces to render high-quality, view-consistent images.
Our model surpasses state-of-the-art point-based methods for novel view synthesis.
arXiv Detail & Related papers (2023-04-25T08:36:33Z) - Vision Transformer for NeRF-Based View Synthesis from a Single Input
Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z) - Scene Representation Transformer: Geometry-Free Novel View Synthesis
Through Set-Latent Scene Representations [48.05445941939446]
A classical problem in computer vision is to infer a 3D scene representation from few images that can be used to render novel views at interactive rates.
We propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area.
We show that this method outperforms recent baselines in terms of PSNR and speed on synthetic datasets.
arXiv Detail & Related papers (2021-11-25T16:18:56Z) - Deep 3D Mask Volume for View Synthesis of Dynamic Scenes [49.45028543279115]
We introduce a multi-view video dataset, captured with a custom 10-camera rig in 120FPS.
The dataset contains 96 high-quality scenes showing various visual effects and human interactions in outdoor scenes.
We develop a new algorithm, Deep 3D Mask Volume, which enables temporally-stable view extrapolation from binocular videos of dynamic scenes, captured by static cameras.
arXiv Detail & Related papers (2021-08-30T17:55:28Z) - Neural Body: Implicit Neural Representations with Structured Latent
Codes for Novel View Synthesis of Dynamic Humans [56.63912568777483]
This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views.
We propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh.
Experiments on ZJU-MoCap show that our approach outperforms prior works by a large margin in terms of novel view synthesis quality.
arXiv Detail & Related papers (2020-12-31T18:55:38Z) - NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [78.5281048849446]
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes.
Our algorithm represents a scene using a fully-connected (non-convolutional) deep network.
Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses.
arXiv Detail & Related papers (2020-03-19T17:57:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.