LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction
- URL: http://arxiv.org/abs/2106.12102v1
- Date: Wed, 23 Jun 2021 00:15:08 GMT
- Title: LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction
- Authors: Farid Yagubbayli, Alessio Tonioni, Federico Tombari
- Abstract summary: Most modern deep learning-based multi-view 3D reconstruction techniques use RNNs or fusion modules to combine information from multiple images after encoding them.
We propose LegoFormer, a transformer-based model that unifies object reconstruction under a single framework and parametrizes the reconstructed occupancy grid by its decomposition factors.
- Score: 45.16128577837725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most modern deep learning-based multi-view 3D reconstruction techniques use
RNNs or fusion modules to combine information from multiple images after
encoding them. These two separate steps have loose connections and do not
consider all available information while encoding each view. We propose
LegoFormer, a transformer-based model that unifies object reconstruction under
a single framework and parametrizes the reconstructed occupancy grid by its
decomposition factors. This reformulation allows the prediction of an object as
a set of independent structures then aggregated to obtain the final
reconstruction. Experiments conducted on ShapeNet display the competitive
performance of our network with respect to the state-of-the-art methods. We
also demonstrate how the use of self-attention leads to increased
interpretability of the model output.
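The idea of parametrizing an occupancy grid by decomposition factors and aggregating independent structures can be illustrated with a small NumPy sketch. The abstract does not specify the factorization, so this assumes each block is a rank-1 tensor (the outer product of three per-axis vectors) and that blocks are aggregated by summation. The function name `aggregate_blocks`, the grid size, and the hand-written factors are hypothetical stand-ins for what a network would predict.

```python
import numpy as np

def aggregate_blocks(x, y, z):
    """Sum K rank-1 blocks into a single occupancy grid.

    x, y, z: arrays of shape (K, N), one factor vector per block and axis.
    Assumption: block k is the outer product x[k] (x) y[k] (x) z[k].
    """
    # einsum builds every outer product and sums over the block index k.
    grid = np.einsum("ki,kj,kl->ijl", x, y, z)
    # Clip so overlapping blocks still yield values in [0, 1].
    return np.clip(grid, 0.0, 1.0)

# Toy example: two axis-aligned boxes on an 8x8x8 grid.
N = 8
x = np.zeros((2, N))
y = np.zeros((2, N))
z = np.zeros((2, N))
x[0, 0:4] = 1; y[0, 0:4] = 1; z[0, 0:2] = 1   # block 0: a flat 4x4x2 slab
x[1, 1:3] = 1; y[1, 1:3] = 1; z[1, 2:6] = 1   # block 1: a 2x2x4 column on top

occupancy = aggregate_blocks(x, y, z) > 0.5    # threshold to a binary grid
print(occupancy.shape, int(occupancy.sum()))
```

Under this assumption, each block needs only 3N numbers instead of N^3, and every block can be inspected on its own before aggregation.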
Related papers
- Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image.
We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space.
A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z)
- Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View [5.222115919729418]
Single-view 3D reconstruction is currently approached from two dominant perspectives.
We propose a hybrid method following a divide-and-conquer strategy.
We first process the scene holistically, extracting depth and semantic information.
We then leverage a single-shot object-level method for the detailed reconstruction of individual components.
arXiv Detail & Related papers (2024-04-04T12:58:46Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single and Multi-view 3D Reconstruction [23.21446438011893]
VPFusion attains high-quality reconstruction using both a 3D feature volume, to capture 3D-structure-aware context, and pixel-aligned features.
Existing approaches use RNN, feature pooling, or attention computed independently in each view for multi-view fusion.
We show improved multi-view feature fusion by establishing transformer-based pairwise view association.
arXiv Detail & Related papers (2022-03-14T23:30:58Z)
- VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion.
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T02:18:11Z)
- Multi-view 3D Reconstruction with Transformer [34.756336770583154]
We reformulate the multi-view 3D reconstruction as a sequence-to-sequence prediction problem.
We propose a new framework named 3D Volume Transformer (VolT) for such a task.
Our method achieves a new state-of-the-art accuracy in multi-view reconstruction with fewer parameters.
arXiv Detail & Related papers (2021-03-24T03:14:49Z)
- Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images [56.652027072552606]
We propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++.
By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image.
A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes to obtain a fused 3D volume.
arXiv Detail & Related papers (2020-06-22T13:48:09Z)
- Convolutional Occupancy Networks [88.48287716452002]
We propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes.
By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space.
We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
arXiv Detail & Related papers (2020-03-10T10:17:07Z)
- STD-Net: Structure-preserving and Topology-adaptive Deformation Network for 3D Reconstruction from a Single Image [27.885717341244014]
3D reconstruction from a single-view image is a long-standing problem in computer vision.
In this paper, we propose a novel method called STD-Net to reconstruct the 3D models utilizing the mesh representation.
Experimental results on the images from ShapeNet show that our proposed STD-Net has better performance than other state-of-the-art methods on reconstructing 3D objects.
arXiv Detail & Related papers (2020-03-07T11:02:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.