Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from
Single and Multiple Images
- URL: http://arxiv.org/abs/2006.12250v2
- Date: Tue, 7 Jul 2020 09:30:13 GMT
- Title: Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from
Single and Multiple Images
- Authors: Haozhe Xie, Hongxun Yao, Shengping Zhang, Shangchen Zhou, Wenxiu Sun
- Abstract summary: We propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++.
By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image.
A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes to obtain a fused 3D volume.
- Score: 56.652027072552606
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recovering the 3D shape of an object from single or multiple images with deep
neural networks has been attracting increasing attention in the past few years.
Mainstream works (e.g. 3D-R2N2) use recurrent neural networks (RNNs) to
sequentially fuse feature maps of input images. However, RNN-based approaches
are unable to produce consistent reconstruction results when the same input
images are given in different orders. Moreover, RNNs may forget important
features from early input images due to long-term memory loss. To address these
issues, we propose a novel framework for single-view and multi-view 3D object
reconstruction, named Pix2Vox++. By using a well-designed encoder-decoder, it
generates a coarse 3D volume from each input image. A multi-scale context-aware
fusion module is then introduced to adaptively select high-quality
reconstructions for different parts from all coarse 3D volumes to obtain a
fused 3D volume. To further correct the wrongly recovered parts in the fused 3D
volume, a refiner is adopted to generate the final output. Experimental results
on the ShapeNet, Pix3D, and Things3D benchmarks show that Pix2Vox++ performs
favorably against state-of-the-art methods in terms of both accuracy and
efficiency.
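The fusion step described in the abstract is easiest to picture as a per-voxel weighting problem: each coarse volume receives a learned confidence score at every voxel, and the fused volume is the softmax-weighted combination across views. The following PyTorch sketch illustrates only that idea; the module name, the tiny single-scale scoring network, and the tensor layout are simplifications and assumptions, not the actual Pix2Vox++ implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextAwareFusion(nn.Module):
    """Minimal sketch: fuse per-view coarse volumes with learned,
    per-voxel confidence scores (simplified relative to Pix2Vox++)."""

    def __init__(self, hidden: int = 8):
        super().__init__()
        # A tiny 3D CNN that maps each coarse volume to a per-voxel score.
        self.score_net = nn.Sequential(
            nn.Conv3d(1, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, coarse_volumes: torch.Tensor) -> torch.Tensor:
        # coarse_volumes: (batch, n_views, D, H, W), occupancies in [0, 1]
        b, v, d, h, w = coarse_volumes.shape
        vols = coarse_volumes.reshape(b * v, 1, d, h, w)
        scores = self.score_net(vols).reshape(b, v, d, h, w)
        # Softmax over the view axis: each voxel decides how much to trust
        # each coarse reconstruction.
        weights = F.softmax(scores, dim=1)
        fused = (weights * coarse_volumes).sum(dim=1)   # (batch, D, H, W)
        return fused


if __name__ == "__main__":
    fusion = ContextAwareFusion()
    coarse = torch.rand(2, 3, 32, 32, 32)   # 3 views of 32^3 coarse volumes
    print(fusion(coarse).shape)             # torch.Size([2, 32, 32, 32])
```

As the abstract notes, a refiner (a further 3D network) would then take the fused volume and correct the wrongly recovered parts to produce the final output.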
Related papers
- MV2Cyl: Reconstructing 3D Extrusion Cylinders from Multi-View Images [13.255044855902408]
We present MV2Cyl, a novel method for reconstructing 3D extrusion cylinders from 2D multi-view images.
It achieves the best accuracy in 2D sketch and extrude parameter estimation, yielding the optimal reconstruction result.
arXiv Detail & Related papers (2024-06-16T08:54:38Z)
- IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation [96.32684334038278]
In this paper, we explore the design space of text-to-3D models.
We significantly improve multi-view generation by considering video instead of image generators.
Our new method, IM-3D, reduces the number of evaluations of the 2D generator network by 10-100x.
arXiv Detail & Related papers (2024-02-13T18:59:51Z)
- Free3D: Consistent Novel View Synthesis without 3D Representation [63.931920010054064]
Free3D is a simple, accurate method for monocular open-set novel view synthesis (NVS).
Compared to other works that took a similar approach, we obtain significant improvements without resorting to an explicit 3D representation.
arXiv Detail & Related papers (2023-12-07T18:59:18Z)
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model [86.37536249046943]
DMV3D is a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion.
Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering.
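A triplane representation stores features on three axis-aligned 2D planes; the feature for any 3D point is obtained by projecting the point onto each plane, sampling bilinearly, and aggregating. Below is a generic sketch of that lookup, not the DMV3D code: the plane resolution, channel count, and summation-based aggregation are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def sample_triplane(planes: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """Generic triplane lookup (illustrative, not the DMV3D implementation).

    planes: (3, C, R, R) feature planes for the XY, XZ, and YZ planes.
    points: (N, 3) query points with coordinates in [-1, 1].
    Returns (N, C) features, aggregated by summing the three plane samples.
    """
    # 2D coordinates of each point on the three planes.
    coords = torch.stack(
        [points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]]], dim=0
    )                                                   # (3, N, 2)
    grid = coords.unsqueeze(1)                          # (3, 1, N, 2) for grid_sample
    feats = F.grid_sample(planes, grid, align_corners=True)  # (3, C, 1, N)
    return feats.squeeze(2).sum(dim=0).t()              # (N, C)


if __name__ == "__main__":
    planes = torch.randn(3, 32, 64, 64)       # three 64x64 planes, 32 channels
    pts = torch.rand(1024, 3) * 2 - 1         # random points in [-1, 1]^3
    print(sample_triplane(planes, pts).shape)  # torch.Size([1024, 32])
```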
arXiv Detail & Related papers (2023-11-15T18:58:41Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
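Conditioning an MLP on a learned 3D representation and volume rendering it amounts to querying density and color at samples along each ray and alpha-compositing them into a pixel. The sketch below shows only that compositing step; the ray sampling and the conditioned MLP are replaced by random stand-in tensors, so the names and numbers are purely illustrative.

```python
import torch


def composite_ray(densities: torch.Tensor, colors: torch.Tensor,
                  deltas: torch.Tensor) -> torch.Tensor:
    """Alpha-composite per-sample MLP outputs along a batch of rays.

    densities: (R, S) non-negative densities from the MLP.
    colors:    (R, S, 3) per-sample RGB from the MLP.
    deltas:    (R, S) distances between consecutive samples.
    Returns (R, 3) rendered pixel colors.
    """
    alpha = 1.0 - torch.exp(-densities * deltas)           # opacity per sample
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.roll(trans, shifts=1, dims=-1)
    trans[:, 0] = 1.0
    weights = alpha * trans                                 # (R, S)
    return (weights.unsqueeze(-1) * colors).sum(dim=-2)     # (R, 3)


if __name__ == "__main__":
    R, S = 4, 64                      # 4 rays, 64 samples each (illustrative)
    sigma = torch.rand(R, S)          # stand-ins for MLP outputs
    rgb = torch.rand(R, S, 3)
    dt = torch.full((R, S), 0.01)
    print(composite_ray(sigma, rgb, dt).shape)   # torch.Size([4, 3])
```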
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- CoReNet: Coherent 3D scene reconstruction from a single RGB image [43.74240268086773]
We build on advances in deep learning to reconstruct the shape of a single object given only one RGB image as input.
We propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that enables building translation equivariant models; and (3) a reconstruction loss tailored to capture overall object geometry.
We reconstruct all objects jointly in one pass, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space.
arXiv Detail & Related papers (2020-04-27T17:53:07Z)
- Atlas: End-to-End 3D Scene Reconstruction from Posed Images [13.154808583020229]
We present an end-to-end 3D reconstruction method for a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images.
A 2D CNN extracts features from each image independently which are then back-projected and accumulated into a voxel volume.
A 3D CNN refines the accumulated features and predicts the TSDF values.
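The back-projection step summarized above can be pictured as projecting every voxel center into each posed image, bilinearly sampling the 2D CNN features at the projected pixel, and averaging over the views that see the voxel; a 3D CNN then predicts TSDF values from the accumulated grid. The function below is a minimal sketch of that accumulation only, under assumed pinhole intrinsics and world-to-camera extrinsics; it is not the Atlas implementation, and the 2D and 3D CNNs are omitted.

```python
import torch
import torch.nn.functional as F


def backproject(features: torch.Tensor, intrinsics: torch.Tensor,
                extrinsics: torch.Tensor, voxel_xyz: torch.Tensor) -> torch.Tensor:
    """Average per-view 2D features into a voxel grid (illustrative only).

    features:   (V, C, H, W) feature maps from a 2D CNN, one per posed image.
    intrinsics: (V, 3, 3) pinhole intrinsics at the feature-map resolution.
    extrinsics: (V, 4, 4) world-to-camera transforms.
    voxel_xyz:  (N, 3) world coordinates of the voxel centers.
    Returns (N, C) features averaged over the views that see each voxel.
    """
    V, C, H, W = features.shape
    N = voxel_xyz.shape[0]
    homo = torch.cat([voxel_xyz, torch.ones(N, 1)], dim=1)       # (N, 4)
    accum = torch.zeros(N, C)
    count = torch.zeros(N, 1)
    for v in range(V):
        cam = (extrinsics[v] @ homo.t())[:3]                     # (3, N) camera coords
        pix = intrinsics[v] @ cam                                # (3, N)
        z = pix[2].clamp(min=1e-6)
        u, y = pix[0] / z, pix[1] / z                            # pixel coordinates
        # Normalize to [-1, 1] for grid_sample; keep voxels in front of the camera.
        grid = torch.stack([2 * u / (W - 1) - 1, 2 * y / (H - 1) - 1], dim=-1)
        valid = (cam[2] > 0) & (grid.abs() <= 1).all(dim=-1)
        sampled = F.grid_sample(features[v:v + 1], grid.view(1, 1, N, 2),
                                align_corners=True)              # (1, C, 1, N)
        accum += sampled.view(C, N).t() * valid.unsqueeze(-1)
        count += valid.unsqueeze(-1).float()
    return accum / count.clamp(min=1.0)
```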
arXiv Detail & Related papers (2020-03-23T17:59:15Z)
- Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction on ShapeNet, and obtain significantly more accurate 3D human reconstructions.
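A continuous implicit output means the network is a function that can be queried at arbitrary 3D points, returning an occupancy value conditioned on features extracted from the (possibly sparse or incomplete) input, so the output is not tied to a fixed voxel resolution. The decoder below is a generic sketch in that spirit rather than the IF-Nets architecture; the layer sizes and the simple concatenation-based conditioning are assumptions.

```python
import torch
import torch.nn as nn


class OccupancyDecoder(nn.Module):
    """Generic implicit occupancy decoder (illustrative, not IF-Nets itself)."""

    def __init__(self, feat_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, points: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) continuous query locations; feats: (N, feat_dim)
        # features gathered from the encoded input at those locations.
        return torch.sigmoid(self.mlp(torch.cat([points, feats], dim=-1)))


if __name__ == "__main__":
    dec = OccupancyDecoder()
    pts = torch.rand(4096, 3) * 2 - 1     # query at any resolution you like
    feats = torch.randn(4096, 128)        # stand-in for encoder features
    print(dec(pts, feats).shape)          # torch.Size([4096, 1])
```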
arXiv Detail & Related papers (2020-03-03T11:14:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.