FaDIV-Syn: Fast Depth-Independent View Synthesis
- URL: http://arxiv.org/abs/2106.13139v1
- Date: Thu, 24 Jun 2021 16:14:01 GMT
- Title: FaDIV-Syn: Fast Depth-Independent View Synthesis
- Authors: Andre Rochow, Max Schwarz, Michael Weinmann, Sven Behnke
- Abstract summary: We introduce FaDIV-Syn, a fast depth-independent view synthesis method.
Our multi-view approach addresses the problem that view synthesis methods are often limited by their depth estimation stage.
- Score: 27.468361999226886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce FaDIV-Syn, a fast depth-independent view synthesis method. Our
multi-view approach addresses the problem that view synthesis methods are often
limited by their depth estimation stage, where incorrect depth predictions can
lead to large projection errors. To avoid this issue, we efficiently warp
multiple input images into the target frame for a range of assumed depth
planes. The resulting tensor representation is fed into a U-Net-like CNN with
gated convolutions, which directly produces the novel output view. We therefore
side-step explicit depth estimation. This improves efficiency and performance
on transparent, reflective, and feature-less scene parts. FaDIV-Syn can handle
both interpolation and extrapolation tasks and outperforms state-of-the-art
extrapolation methods on the large-scale RealEstate10k dataset. In contrast to
comparable methods, it is capable of real-time operation due to its lightweight
architecture. We further demonstrate data efficiency of FaDIV-Syn by training
from fewer examples as well as its generalization to higher resolutions and
arbitrary depth ranges under severe depth discretization.
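The warping step described in the abstract can be illustrated with a generic plane-sweep sketch. This is not the authors' implementation: the function names, the pose convention (`X_src = R @ X_tgt + t`), the inverse-depth plane spacing, and the nearest-neighbour sampling are all assumptions made for illustration. For each assumed depth plane, a homography maps every target pixel to a source pixel, and the source image is resampled accordingly.

```python
import numpy as np

def inverse_depth_planes(d_near, d_far, num_planes):
    """Depths spaced uniformly in inverse depth, ordered near to far (assumed spacing)."""
    inv = np.linspace(1.0 / d_near, 1.0 / d_far, num_planes)
    return 1.0 / inv

def plane_homography(K_src, K_tgt, R, t, depth):
    """Homography taking target pixels to source pixels for the
    fronto-parallel plane z = depth in the target camera frame.
    Assumed pose convention: X_src = R @ X_tgt + t."""
    n = np.array([0.0, 0.0, 1.0])  # plane normal in the target frame
    return K_src @ (R + np.outer(t, n) / depth) @ np.linalg.inv(K_tgt)

def warp_coords(H, height, width):
    """Source sampling coordinates (x, y) for every target pixel."""
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(height * width)])
    proj = H @ pts
    uv = proj[:2] / proj[2]  # perspective divide
    return uv.T.reshape(height, width, 2)

def plane_sweep_volume(src, K_src, K_tgt, R, t, depths):
    """Warp one source image into the target frame once per depth plane.
    Nearest-neighbour sampling; pixels that fall outside the source stay zero."""
    h, w = src.shape[:2]
    layers = []
    for d in depths:
        uv = warp_coords(plane_homography(K_src, K_tgt, R, t, d), h, w)
        u = np.rint(uv[..., 0]).astype(int)
        v = np.rint(uv[..., 1]).astype(int)
        valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        layer = np.zeros_like(src)
        layer[valid] = src[v[valid], u[valid]]
        layers.append(layer)
    return np.stack(layers)  # shape: (num_planes, H, W, ...)
```

Concatenating such volumes from several input views would give a tensor of the kind the abstract says is fed to the CNN. A practical version would likely use bilinear sampling on the GPU rather than the nearest-neighbour lookup used here.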
Related papers
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
(2024-08-26T04:56:41Z)
- Efficient Depth-Guided Urban View Synthesis [52.841803876653465]
We introduce Efficient Depth-Guided Urban View Synthesis (EDUS) for fast feed-forward inference and efficient per-scene fine-tuning.
EDUS exploits noisy predicted geometric priors as guidance to enable generalizable urban view synthesis from sparse input images.
Our results indicate that EDUS achieves state-of-the-art performance in sparse view settings when combined with fast test-time optimization.
(2024-07-17T08:16:25Z)
- Q-SLAM: Quadric Representations for Monocular SLAM [85.82697759049388]
We reimagine volumetric representations through the lens of quadrics.
We use a quadric assumption to rectify noisy depth estimates from RGB inputs.
We introduce a novel quadric-decomposed transformer to aggregate information across quadrics.
(2024-03-12T23:27:30Z)
- Metrically Scaled Monocular Depth Estimation through Sparse Priors for Underwater Robots [0.0]
We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions.
The network is trained in a supervised fashion on the forward-looking underwater dataset, FLSea.
The method achieves real-time performance, running at 160 FPS on a laptop GPU and 7 FPS on a single CPU core.
(2023-10-25T16:32:31Z)
- DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation [23.22005119986485]
DiffusionDepth is a new approach that reformulates monocular depth estimation as a denoising diffusion process.
It learns an iterative denoising process to 'denoise' a random depth distribution into a depth map with the guidance of monocular visual conditions.
Experimental results on KITTI and NYU-Depth-V2 datasets suggest that a simple yet efficient diffusion approach could reach state-of-the-art performance in both indoor and outdoor scenarios with acceptable inference time.
(2023-03-09T03:48:24Z)
- RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation [27.679479140943503]
We propose a resolution adaptive self-supervised monocular depth estimation method (RA-Depth) by learning the scale invariance of the scene depth.
RA-Depth achieves state-of-the-art performance and also adapts well to different resolutions.
(2022-07-25T08:49:59Z)
- DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
(2022-05-28T16:18:08Z)
- Unpaired Single-Image Depth Synthesis with cycle-consistent Wasserstein GANs [1.0499611180329802]
Real-time estimation of actual environment depth is an essential module for various autonomous system tasks.
In this study, the latest advancements in the field of generative neural networks are leveraged for fully unsupervised single-image depth synthesis.
(2021-03-31T09:43:38Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras produce brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
(2020-10-16T12:36:23Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose adopting graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves state-of-the-art performance on two benchmarks.
(2020-08-25T06:00:06Z)
- Fast Depth Estimation for View Synthesis [9.243157709083672]
Disparity/depth estimation from sequences of stereo images is an important element in 3D vision.
We propose a novel learning-based framework that makes use of dilated convolution, densely connected convolutional modules, a compact decoder, and skip connections.
We show that our network outperforms state-of-the-art methods with an average improvement in depth estimation and view synthesis by approximately 45% and 34% respectively.
(2020-03-14T14:10:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.