VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos
- URL: http://arxiv.org/abs/2304.10687v1
- Date: Fri, 21 Apr 2023 00:47:05 GMT
- Title: VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos
- Authors: Huiyu Gao, Wei Mao, Miaomiao Liu
- Abstract summary: We propose VisFusion, a visibility-aware online 3D scene reconstruction approach from posed monocular videos.
We improve feature fusion by explicitly inferring each voxel's visibility from a similarity matrix computed over image pairs.
Experimental results on benchmarks show that our method can achieve superior performance with more scene details.
- Score: 24.310673998221866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose VisFusion, a visibility-aware online 3D scene reconstruction
approach from posed monocular videos. In particular, we aim to reconstruct the
scene from volumetric features. Unlike previous reconstruction methods which
aggregate features for each voxel from input views without considering its
visibility, we aim to improve the feature fusion by explicitly inferring its
visibility from a similarity matrix, computed from its projected features in
each image pair. Following previous works, our model is a coarse-to-fine
pipeline that includes a volume sparsification process. Unlike those works, which
sparsify voxels globally with a fixed occupancy threshold, we perform the
sparsification on a local feature volume along each visual ray, preserving at
least one voxel per ray to retain finer details. The sparse local volume is then
fused with a global one for online reconstruction. We further propose to predict
the TSDF in a coarse-to-fine manner by learning its residuals across scales,
leading to better TSDF predictions. Experimental results on benchmarks show
that our method can achieve superior performance with more scene details. Code
is available at: https://github.com/huiyu-gao/VisFusion
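The abstract names three concrete mechanisms: per-voxel visibility weights inferred from a pairwise feature-similarity matrix, per-ray sparsification that keeps at least one voxel on every visual ray, and coarse-to-fine TSDF prediction via residuals. The sketch below is not the released VisFusion code; it is a minimal illustration of those three ideas, assuming PyTorch, toy tensor shapes, a simple cosine-similarity heuristic in place of the paper's learned visibility head, and a hypothetical `keep_ratio` parameter for the per-ray top-k selection.

```python
# Minimal sketch of the three ideas described in the abstract (not the authors' code).
# All shapes, the cosine-similarity heuristic, and `keep_ratio` are assumptions.
import torch
import torch.nn.functional as F


def visibility_weighted_fusion(view_feats: torch.Tensor) -> torch.Tensor:
    """Fuse per-view voxel features with weights derived from pairwise similarity.

    view_feats: (V, N, C) features of N voxels projected into V views.
    Returns fused features of shape (N, C).
    """
    f = F.normalize(view_feats, dim=-1)            # cosine-normalise per view
    sim = torch.einsum('vnc,wnc->nvw', f, f)       # per-voxel pairwise similarity, (N, V, V)
    # A view that agrees with many other views is more likely to actually see the
    # voxel; this mean is a stand-in for the learned visibility prediction.
    vis_score = sim.mean(dim=-1)                   # (N, V)
    weights = torch.softmax(vis_score, dim=-1)     # (N, V)
    return torch.einsum('nv,vnc->nc', weights, view_feats)


def per_ray_sparsify(occupancy: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the most likely voxels along each ray, always at least one per ray.

    occupancy: (R, D) occupancy scores for D voxels sampled along each of R rays.
    Returns a boolean keep-mask of the same shape.
    """
    R, D = occupancy.shape
    k = max(1, int(D * keep_ratio))                # never drop an entire ray
    topk = occupancy.topk(k, dim=1).indices        # (R, k)
    mask = torch.zeros_like(occupancy, dtype=torch.bool)
    mask.scatter_(1, topk, True)
    return mask


def coarse_to_fine_tsdf(coarse_tsdf: torch.Tensor, residual: torch.Tensor) -> torch.Tensor:
    """Refine an upsampled coarse TSDF with a residual predicted at the finer scale."""
    up = F.interpolate(coarse_tsdf, scale_factor=2, mode='trilinear', align_corners=False)
    return (up + residual).clamp(-1.0, 1.0)        # keep TSDF values in [-1, 1]


if __name__ == "__main__":
    feats = torch.randn(5, 1000, 32)               # 5 views, 1000 voxels, 32-dim features
    fused = visibility_weighted_fusion(feats)      # (1000, 32)
    occ = torch.rand(64, 48)                       # 64 rays, 48 voxels per ray
    mask = per_ray_sparsify(occ)                   # at least one voxel kept on every ray
    coarse = torch.rand(1, 1, 8, 8, 8) * 2 - 1     # (B, 1, D, H, W) coarse TSDF
    fine = coarse_to_fine_tsdf(coarse, torch.zeros(1, 1, 16, 16, 16))
    print(fused.shape, mask.any(dim=1).all().item(), fine.shape)
```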
Related papers
- SCube: Instant Large-Scale Scene Reconstruction using VoxSplats [55.383993296042526]
We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images.
Our method encodes reconstructed scenes using a novel representation VoxSplat, which is a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold.
arXiv Detail & Related papers (2024-10-26T00:52:46Z)
- Learning Neural Implicit through Volume Rendering with Attentive Depth Fusion Priors [32.63878457242185]
We learn neural implicit representations from multi-view RGBD images through volume rendering with an attentive depth fusion prior.
Our attention mechanism works with either a one-time fused TSDF that represents a whole scene or an incrementally fused TSDF that represents a partial scene.
Our evaluations on widely used benchmarks including synthetic and real-world scans show our superiority over the latest neural implicit methods.
arXiv Detail & Related papers (2023-10-17T21:45:51Z)
- VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction [64.09702079593372]
VolRecon is a novel generalizable implicit reconstruction method based on the Signed Ray Distance Function (SRDF).
On the DTU dataset, VolRecon outperforms SparseNeuS by about 30% in sparse-view reconstruction and achieves accuracy comparable to MVSNet in full-view reconstruction.
arXiv Detail & Related papers (2022-12-15T18:59:54Z)
- VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis [62.47221232706105]
We propose VoGE, which utilizes the Gaussian reconstruction kernels as volumetric primitives.
To render efficiently via VoGE, we propose an approximate closed-form solution for the volume density aggregation and a coarse-to-fine rendering strategy.
VoGE outperforms SoTA when applied to various vision tasks, e.g., object pose estimation, shape/texture fitting, and reasoning.
arXiv Detail & Related papers (2022-05-30T19:52:11Z)
- VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction [23.21446438011893]
VPFusion attains high-quality reconstruction using both a 3D feature volume, to capture 3D-structure-aware context, and pixel-aligned image features.
Existing approaches use RNN, feature pooling, or attention computed independently in each view for multi-view fusion.
We show improved multi-view feature fusion by establishing transformer-based pairwise view association.
arXiv Detail & Related papers (2022-03-14T23:30:58Z)
- VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion.
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T02:18:11Z)
- NeuralFusion: Online Depth Fusion in Latent Space [77.59420353185355]
We present a novel online depth map fusion approach that learns depth map aggregation in a latent feature space.
Our approach is real-time capable, handles high noise levels, and is particularly able to deal with gross outliers common for photometric stereo-based depth maps.
arXiv Detail & Related papers (2020-11-30T13:50:59Z)
- Stable View Synthesis [100.86844680362196]
We present Stable View Synthesis (SVS).
Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene.
SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets.
arXiv Detail & Related papers (2020-11-14T07:24:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.