VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction
- URL: http://arxiv.org/abs/2203.07553v1
- Date: Mon, 14 Mar 2022 23:30:58 GMT
- Title: VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction
- Authors: Jisan Mahmud, Jan-Michael Frahm
- Abstract summary: VPFusion attains high-quality reconstruction using both a 3D feature volume, to capture 3D-structure-aware context, and pixel-aligned image features, to capture fine local detail.
Existing approaches use RNNs, feature pooling, or attention computed independently in each view for multi-view fusion.
We show improved multi-view feature fusion by establishing transformer-based pairwise view association.
- Score: 23.21446438011893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce VPFusion, a unified single- and multi-view neural
implicit 3D reconstruction framework. VPFusion attains high-quality
reconstruction using both a 3D feature volume, to capture 3D-structure-aware
context, and pixel-aligned image features, to capture fine local detail.
Existing approaches use RNNs, feature pooling, or attention computed
independently in each view for multi-view fusion. RNNs suffer from long-term
memory loss and permutation variance, while feature pooling or independently
computed attention leaves the representation in each view unaware of the
other views before the final pooling step. In contrast, we show improved
multi-view feature fusion by
establishing transformer-based pairwise view association. In particular, we
propose a novel interleaved 3D reasoning and pairwise view association
architecture for feature volume fusion across different views. Using this
structure-aware and multi-view-aware feature volume, we show improved 3D
reconstruction performance compared to existing methods. VPFusion improves the
reconstruction quality further by also incorporating pixel-aligned local image
features to capture fine detail. We verify the effectiveness of VPFusion on
the ShapeNet and ModelNet datasets, where we outperform or perform on par
with state-of-the-art single- and multi-view 3D shape reconstruction methods.
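
The abstract names two mechanisms worth making concrete: interleaving 3D reasoning inside each view's feature volume with transformer-based pairwise view association, and decoding occupancy from the fused volume together with pixel-aligned image features. The PyTorch sketch below is a minimal illustration of that recipe under assumed shapes; the module names (InterleavedFusionBlock, OccupancyDecoder) and the mean-pooling of pixel-aligned features over views are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterleavedFusionBlock(nn.Module):
    """One round of 3D reasoning followed by pairwise view association.

    3D reasoning: a 3D convolution inside each view's feature volume.
    View association: multi-head attention across the view axis at every
    voxel, so each view's representation becomes aware of all other views
    before any pooling step.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, volumes: torch.Tensor) -> torch.Tensor:
        # volumes: (B, V, C, D, H, W), one feature volume per input view
        B, V, C, D, H, W = volumes.shape
        x = self.conv3d(volumes.flatten(0, 1)).view(B, V, C, D, H, W)
        # Treat the V views at each voxel as a length-V token sequence.
        tokens = x.permute(0, 3, 4, 5, 1, 2).reshape(-1, V, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)
        return tokens.view(B, D, H, W, V, C).permute(0, 4, 5, 1, 2, 3)


class OccupancyDecoder(nn.Module):
    """Maps a 3D query point to an occupancy logit from (a) the fused
    feature volume and (b) pixel-aligned 2D features, mirroring the two
    feature sources named in the abstract."""

    def __init__(self, vol_c: int, img_c: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vol_c + img_c + 3, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, fused_vol, img_feats, points, uv):
        # fused_vol: (B, C, D, H, W) volume after pooling over views
        # img_feats: (B, V, F, h, w) 2D feature maps from the image encoder
        # points:    (B, N, 3) query coordinates in [-1, 1]
        # uv:        (B, V, N, 2) projections of the points into each view
        B, V, Fc, h, w = img_feats.shape
        N = points.shape[1]
        # Trilinear sample of the structure-aware fused volume.
        vol = F.grid_sample(fused_vol, points.view(B, 1, 1, N, 3),
                            align_corners=True).view(B, -1, N)
        # Bilinear sample of pixel-aligned features in every view.
        pix = F.grid_sample(img_feats.flatten(0, 1),
                            uv.flatten(0, 1).unsqueeze(1),
                            align_corners=True).view(B, V, Fc, N)
        pix = pix.mean(dim=1)  # pool pixel-aligned features over views
        feats = torch.cat([vol, pix, points.transpose(1, 2)], dim=1)
        return self.mlp(feats.transpose(1, 2))  # (B, N, 1) occupancy logits
```

Stacking a few of these blocks before pooling over views is what would make the volume both structure-aware and multi-view-aware; since the attention here runs at every voxel, the volume resolution must stay coarse for this to be tractable.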
Related papers
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks (see the hypernetwork sketch after this list).
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering (see the volume-rendering sketch after this list).
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- From 2D Images to 3D Model: Weakly Supervised Multi-View Face Reconstruction with Deep Fusion [26.011557635884568]
We propose a novel model called Deep Fusion MVR to reconstruct high-precision 3D facial shapes from multi-view images.
Specifically, we introduce MulEn-Unet, a multi-view-encoding, single-decoding framework with skip connections and attention.
We develop the face parse network to learn, identify, and emphasize the critical common face area within multi-view images.
arXiv Detail & Related papers (2022-04-08T05:11:04Z)
- Implicit Neural Deformation for Multi-View Face Reconstruction [43.88676778013593]
We present a new method for 3D face reconstruction from multi-view RGB images.
Unlike previous methods which are built upon 3D morphable models, our method leverages an implicit representation to encode rich geometric features.
Our experimental results on several benchmark datasets demonstrate that our approach outperforms alternative baselines and achieves superior face reconstruction results compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-12-05T07:02:53Z)
- VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion.
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T02:18:11Z)
- A Novel Patch Convolutional Neural Network for View-based 3D Model Retrieval [36.12906920608775]
We propose a novel patch convolutional neural network (PCNN) for view-based 3D model retrieval.
Our proposed PCNN outperforms state-of-the-art approaches, with mAP values of 93.67% and 96.23%, respectively.
arXiv Detail & Related papers (2021-09-25T07:18:23Z)
- Multi-view 3D Reconstruction with Transformer [34.756336770583154]
We reformulate the multi-view 3D reconstruction as a sequence-to-sequence prediction problem.
We propose a new framework named 3D Volume Transformer (VolT) for such a task.
Our method achieves a new state-of-the-art accuracy in multi-view reconstruction with fewer parameters.
arXiv Detail & Related papers (2021-03-24T03:14:49Z)
- Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images [56.652027072552606]
We propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++.
By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image.
A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes to obtain a fused 3D volume (see the fusion sketch after this list).
arXiv Detail & Related papers (2020-06-22T13:48:09Z)
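
For the Hyper-VolTran entry above, the load-bearing idea is a hypernetwork that emits the weights of a per-image SDF network, so each input image yields its own surface function. The sketch below is a generic illustration of that idea under assumed names and layer sizes; it is not Hyper-VolTran's actual architecture.

```python
import torch
import torch.nn as nn


class SDFHyperNetwork(nn.Module):
    """Predicts the weights of a small SDF MLP (3 -> hidden -> 1) from an
    image embedding; a generic stand-in for the HyperNetworks idea."""

    def __init__(self, embed_dim: int, hidden: int = 64):
        super().__init__()
        self.hidden = hidden
        self.w1 = nn.Linear(embed_dim, 3 * hidden)   # first-layer weights
        self.b1 = nn.Linear(embed_dim, hidden)       # first-layer bias
        self.w2 = nn.Linear(embed_dim, hidden)       # output-layer weights
        self.b2 = nn.Linear(embed_dim, 1)            # output-layer bias

    def forward(self, embedding: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # embedding: (B, E) per-image code; points: (B, N, 3) query points
        B = embedding.shape[0]
        W1 = self.w1(embedding).view(B, 3, self.hidden)
        W2 = self.w2(embedding).view(B, self.hidden, 1)
        h = torch.relu(points @ W1 + self.b1(embedding).unsqueeze(1))
        return h @ W2 + self.b2(embedding).unsqueeze(1)  # (B, N, 1) signed distance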
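
For the Vision Transformer NeRF entry, the summary says an MLP conditioned on a learned 3D representation performs volume rendering. The sketch below pairs a tiny conditioned radiance MLP with standard emission-absorption compositing; only the compositing formula is standard NeRF machinery, the network and feature shapes are assumptions.

```python
import torch
import torch.nn as nn


class ConditionedRadianceMLP(nn.Module):
    """Maps a 3D point plus a feature sampled from a learned 3D
    representation to color and density (a generic stand-in, not the
    paper's network)."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (r, g, b, sigma)
        )

    def forward(self, pts, feats):
        out = self.net(torch.cat([pts, feats], dim=-1))
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])


def volume_render(rgb, sigma, deltas):
    # Standard emission-absorption compositing along each ray.
    # rgb: (R, S, 3), sigma: (R, S), deltas: (R, S) inter-sample distances
    alpha = 1.0 - torch.exp(-sigma * deltas)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=1),
        dim=1)
    weights = alpha * trans                           # per-sample contribution
    return (weights.unsqueeze(-1) * rgb).sum(dim=1)   # (R, 3) pixel colors
```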
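
For the Pix2Vox++ entry, the fusion module adaptively selects high-quality parts from all coarse volumes. A common way to realize such selection, sketched below, is to score every voxel of every view's coarse volume and fuse with a per-voxel softmax over views; the scorer here is an illustrative stand-in, not the paper's exact module.

```python
import torch
import torch.nn as nn


class ContextAwareFusion(nn.Module):
    """Fuses per-view coarse volumes by scoring each voxel in each view and
    taking a softmax-weighted sum over views."""

    def __init__(self, hidden: int = 8):
        super().__init__()
        self.scorer = nn.Sequential(            # per-voxel context score
            nn.Conv3d(1, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(hidden, 1, 3, padding=1),
        )

    def forward(self, coarse: torch.Tensor) -> torch.Tensor:
        # coarse: (B, V, D, H, W), one coarse occupancy volume per view
        B, V, D, H, W = coarse.shape
        scores = self.scorer(coarse.reshape(B * V, 1, D, H, W)).view(B, V, D, H, W)
        weights = torch.softmax(scores, dim=1)  # views compete per voxel
        return (weights * coarse).sum(dim=1)    # fused (B, D, H, W) volume
```

The softmax lets each view dominate exactly where its reconstruction is most reliable, which matches the "select high-quality reconstructions for different parts" behavior the summary describes.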