TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
- URL: http://arxiv.org/abs/2107.02191v1
- Date: Mon, 5 Jul 2021 18:00:11 GMT
- Title: TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
- Authors: Aljaž Božič, Pablo Palafox, Justus Thies, Angela Dai, Matthias Nießner
- Abstract summary: TransformerFusion is a transformer-based 3D scene reconstruction approach.
The network learns to attend to the most relevant image frames for each 3D location in the scene.
Features are fused in a coarse-to-fine fashion, storing fine-level features only where needed.
- Score: 26.87200488085741
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce TransformerFusion, a transformer-based 3D scene reconstruction
approach. From an input monocular RGB video, the video frames are processed by
a transformer network that fuses the observations into a volumetric feature
grid representing the scene; this feature grid is then decoded into an implicit
3D scene representation. Key to our approach is the transformer architecture
that enables the network to learn to attend to the most relevant image frames
for each 3D location in the scene, supervised only by the scene reconstruction
task. Features are fused in a coarse-to-fine fashion, storing fine-level
features only where needed, requiring lower memory storage and enabling fusion
at interactive rates. The feature grid is then decoded to a higher-resolution
scene reconstruction, using an MLP-based surface occupancy prediction from
interpolated coarse-to-fine 3D features. Our approach results in an accurate
surface reconstruction, outperforming state-of-the-art multi-view stereo depth
estimation methods, fully-convolutional 3D reconstruction approaches, and
approaches using LSTM- or GRU-based recurrent networks for video sequence
fusion.
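As a rough illustration of the pipeline described in the abstract (per-location attention over frame features, followed by MLP-based occupancy decoding), here is a minimal PyTorch-style sketch. It is not the authors' implementation: the module names (VoxelViewFusion, OccupancyMLP), feature dimensions, and the use of a single learned query attending over per-frame features are assumptions made for illustration.

```python
# Minimal sketch of a TransformerFusion-style pipeline (assumed names/shapes,
# not the authors' code): attention over per-frame features at each grid
# location, followed by an MLP occupancy decoder on the fused features.
import torch
import torch.nn as nn

class VoxelViewFusion(nn.Module):
    """Fuses per-frame image features into one feature per 3D grid location
    by attending over the observing frames (hypothetical module)."""
    def __init__(self, feat_dim=64, num_heads=4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, feat_dim))  # learned fusion query
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, frame_feats):
        # frame_feats: (num_voxels, num_frames, feat_dim), features sampled from
        # each frame's 2D feature map at the voxel's projected location.
        q = self.query.expand(frame_feats.shape[0], -1, -1)
        fused, attn_weights = self.attn(q, frame_feats, frame_feats)
        return fused.squeeze(1), attn_weights  # (num_voxels, feat_dim)

class OccupancyMLP(nn.Module):
    """Decodes an interpolated grid feature into a surface occupancy logit."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),  # occupancy logit per query point
        )

    def forward(self, point_feats):
        return self.net(point_feats)

# Usage with dummy data: 1000 grid locations, each observed by 8 frames.
fusion, decoder = VoxelViewFusion(), OccupancyMLP()
frame_feats = torch.randn(1000, 8, 64)
fused, weights = fusion(frame_feats)   # attention selects relevant frames
occupancy_logits = decoder(fused)      # (1000, 1), supervised by reconstruction
```

In the actual method, fusion is performed coarse-to-fine over a sparse grid that stores fine-level features only where needed; the sketch omits that and uses a single dense feature level for brevity.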
Related papers
- HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction [14.000919964212857]
Vision-based 3D semantic scene completion describes autonomous driving scenes through 3D volume representations.
HybridOcc is a hybrid 3D volume query proposal method generated by a Transformer framework and a NeRF representation.
We present an innovative occupancy-aware ray sampling method to orient the SSC task instead of focusing on the scene surface.
arXiv Detail & Related papers (2024-08-17T05:50:51Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction [39.89856628467095]
We introduce the Structural MPI (S-MPI), where the plane structure approximates 3D scenes concisely.
Despite the intuition and demand of applying S-MPI, great challenges are introduced, e.g., high-fidelity approximation for both RGBA layers and plane poses.
Our method outperforms both previous state-of-the-art MPI-based view synthesis methods and planar reconstruction methods.
arXiv Detail & Related papers (2023-03-10T14:18:40Z)
- VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction [64.09702079593372]
VolRecon is a novel generalizable implicit reconstruction method with Signed Ray Distance Function (SRDF).
On the DTU dataset, VolRecon outperforms SparseNeuS by about 30% in sparse view reconstruction and achieves accuracy comparable to MVSNet in full view reconstruction.
arXiv Detail & Related papers (2022-12-15T18:59:54Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction [23.21446438011893]
VPFusion attains high-quality reconstruction using both a 3D feature volume, which captures 3D-structure-aware context, and pixel-aligned image features.
Existing approaches use RNN, feature pooling, or attention computed independently in each view for multi-view fusion.
We show improved multi-view feature fusion by establishing transformer-based pairwise view association.
arXiv Detail & Related papers (2022-03-14T23:30:58Z)
- VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion.
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T02:18:11Z)
- Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations.
We leverage meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.