R3D-SWIN: Use Shifted Window Attention for Single-View 3D Reconstruction
- URL: http://arxiv.org/abs/2312.02725v3
- Date: Wed, 6 Mar 2024 12:48:33 GMT
- Title: R3D-SWIN: Use Shifted Window Attention for Single-View 3D Reconstruction
- Authors: Chenhuan Li, Meihua Xiao, Zehuan Li, Fangping Chen, Shanshan Qiao,
Dingli Wang, Mengxi Gao, Siyi Zhang
- Abstract summary: We propose a voxel 3D reconstruction network based on shifted window attention.
Experimental results on ShapeNet verify that our method achieves SOTA accuracy in single-view reconstruction.
- Score: 0.565395466029518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, vision transformers have performed well in various computer vision
tasks, including voxel 3D reconstruction. However, the windows of the vision
transformer are fixed to a single scale and have no connections between them,
which limits the accuracy of voxel 3D reconstruction. Therefore, we
propose a voxel 3D reconstruction network based on shifted window attention. To
the best of our knowledge, this is the first work to apply shifted window
attention to voxel 3D reconstruction. Experimental results on ShapeNet verify
that our method achieves SOTA accuracy in single-view reconstruction.
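The core operation named in the abstract, window attention with a cyclic shift between successive blocks, can be illustrated with the short Swin-style sketch below (PyTorch is assumed). It is a minimal, hedged example: the class name, window size, head count, and the omitted cross-boundary attention mask are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ShiftedWindowAttention(nn.Module):
    """Swin-style window attention with an optional half-window cyclic shift."""

    def __init__(self, dim, window_size=7, num_heads=4, shift=True):
        super().__init__()
        self.window_size = window_size
        # A cyclic shift of half a window lets neighbouring windows exchange
        # information across successive blocks (attention mask omitted here).
        self.shift = window_size // 2 if shift else 0
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, H, W, C); H and W divisible by window_size
        B, H, W, C = x.shape
        ws = self.window_size
        if self.shift:
            x = torch.roll(x, shifts=(-self.shift, -self.shift), dims=(1, 2))
        # Partition the feature map into non-overlapping ws x ws windows.
        windows = (x.view(B, H // ws, ws, W // ws, ws, C)
                    .permute(0, 1, 3, 2, 4, 5)
                    .reshape(-1, ws * ws, C))
        # Self-attention is computed independently inside each window.
        out, _ = self.attn(windows, windows, windows)
        # Undo the window partition and the cyclic shift.
        out = (out.reshape(B, H // ws, W // ws, ws, ws, C)
                  .permute(0, 1, 3, 2, 4, 5)
                  .reshape(B, H, W, C))
        if self.shift:
            out = torch.roll(out, shifts=(self.shift, self.shift), dims=(1, 2))
        return out


# Usage: alternate non-shifted and shifted blocks over image features, then
# decode the aggregated tokens into a voxel occupancy grid (decoder not shown).
feats = torch.randn(2, 56, 56, 96)
block = ShiftedWindowAttention(dim=96, window_size=7, num_heads=4, shift=True)
print(block(feats).shape)  # torch.Size([2, 56, 56, 96])
```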
Related papers
- FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction [13.157400338544177]
Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry is feasible using deep neural networks.
We propose three effective solutions for improving the fidelity of inference-based 3D reconstructions.
Our method, FineRecon, produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics.
arXiv Detail & Related papers (2023-04-04T02:50:29Z)
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices [78.20154723650333]
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation.
We introduce a novel multi-view RGBD dataset captured using a mobile device.
We obtain precise 3D ground-truth shapes without relying on high-end 3D scanners.
arXiv Detail & Related papers (2023-03-03T14:02:50Z)
- Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction [84.94140661523956]
We propose a tri-perspective view (TPV) representation which accompanies the bird's-eye view (BEV) with two additional perpendicular planes.
We model each point in the 3D space by summing its projected features on the three planes.
Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy for all voxels.
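The sentence above describes a concrete recipe: a 3D point's feature is the sum of its bilinearly sampled features on three perpendicular planes. A minimal PyTorch sketch of that idea follows; the plane names, coordinate pairing, and sampling settings are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def tpv_point_features(tpv_hw, tpv_dh, tpv_wd, points):
    """tpv_*: (B, C, R, R) feature planes; points: (B, N, 3) in [-1, 1]."""
    x, y, z = points[..., 0], points[..., 1], points[..., 2]

    def sample(plane, u, v):
        # grid_sample expects a (B, N, 1, 2) grid of normalised (u, v) coords.
        grid = torch.stack([u, v], dim=-1).unsqueeze(2)
        out = F.grid_sample(plane, grid, align_corners=False)  # (B, C, N, 1)
        return out.squeeze(-1).transpose(1, 2)                 # (B, N, C)

    # Project the point onto each plane and sum the three sampled features.
    return sample(tpv_hw, x, y) + sample(tpv_dh, z, x) + sample(tpv_wd, y, z)


# Usage with random planes and query points:
B, C, R = 1, 64, 32
planes = [torch.randn(B, C, R, R) for _ in range(3)]
pts = torch.rand(B, 1000, 3) * 2 - 1
print(tpv_point_features(*planes, pts).shape)  # torch.Size([1, 1000, 64])
```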
arXiv Detail & Related papers (2023-02-15T17:58:10Z)
- 3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes [0.0]
We propose a novel framework that leverages the intermediate latent spaces of Vision Transformer (ViT) and a joint image-text representational model, CLIP, for fast and efficient Single View Reconstruction (SVR).
We use the ShapeNetV2 dataset and perform extensive experiments with comparisons to SOTA methods to demonstrate our method's effectiveness.
arXiv Detail & Related papers (2022-12-05T11:45:26Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer? [111.11502241431286]
Vision Transformers (ViTs) have proven to be effective in solving 2D image understanding tasks.
ViTs for 2D and 3D tasks have so far adopted vastly different architecture designs that are hardly transferable.
This paper demonstrates the appealing promise of understanding the 3D visual world using a standard 2D ViT architecture.
arXiv Detail & Related papers (2022-09-15T03:34:58Z)
- Monocular 3D Object Reconstruction with GAN Inversion [122.96094885939146]
MeshInversion is a novel framework to improve the reconstruction of textured 3D meshes.
It exploits the generative prior of a 3D GAN pre-trained for 3D textured mesh synthesis.
Our framework obtains faithful 3D reconstructions with consistent geometry and texture across both observed and unobserved parts.
arXiv Detail & Related papers (2022-07-20T17:47:22Z)
- Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image [22.037472446683765]
We learn a regular grid of 3D voxel features from the input image, aligned with the 3D scene space via a 3D feature lifting operator.
Based on the 3D voxel features, our novel CenterNet-3D detection head formulates the 3D detection as keypoint detection in the 3D space.
We devise an efficient coarse-to-fine reconstruction module, including coarse-level voxelization and a novel local PCA-SDF shape representation.
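A rough PyTorch sketch of the two mechanisms mentioned above follows: lifting image features onto a voxel grid by projecting voxel centres into the image, and a CenterNet-style head that treats object centres as peaks in a 3D heatmap. The projection scheme, tensor shapes, and layer sizes are illustrative assumptions rather than the paper's exact operators.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def lift_to_voxels(img_feats, voxel_centers, K):
    """img_feats: (B, C, H, W); voxel_centers: (N, 3) camera-frame points with
    positive depth; K: (3, 3) intrinsics. Returns (B, C, N) features per voxel."""
    uvw = voxel_centers @ K.T                         # project into the image
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)      # pixel coordinates
    H, W = img_feats.shape[-2:]
    # Normalise pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, -1, 1, 2).expand(img_feats.size(0), -1, -1, -1)
    return F.grid_sample(img_feats, grid, align_corners=True).squeeze(-1)


class CenterHead3D(nn.Module):
    """Scores every voxel; local maxima of the heatmap are object centres."""

    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(nn.Conv3d(channels, channels, 3, padding=1),
                                 nn.ReLU(),
                                 nn.Conv3d(channels, 1, 1))

    def forward(self, voxel_feats):                   # (B, C, X, Y, Z)
        return torch.sigmoid(self.net(voxel_feats))   # (B, 1, X, Y, Z) heatmap


# Usage: reshape the lifted (B, C, N) features to (B, C, X, Y, Z) before the head.
```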
arXiv Detail & Related papers (2021-11-04T18:30:37Z)
- 3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers [12.238921770499912]
3D-RETR is able to perform end-to-end 3D REconstruction with TRansformers.
3D-RETR first uses a pretrained Transformer to extract visual features from 2D input images.
A Transformer decoder then maps the visual features to voxel features, and a CNN decoder takes these voxel features as input to obtain the reconstructed objects.
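A hedged PyTorch sketch of that kind of pipeline is given below: a Transformer encoder turns image patches into visual tokens, a Transformer decoder with learned voxel queries produces voxel features, and a 3D CNN decoder upsamples them into an occupancy volume. The paper builds on a pretrained Transformer; this untrained stand-in and all of its sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class TinyVoxelReconstructor(nn.Module):
    def __init__(self, dim=256, grid=4):
        super().__init__()
        self.grid = grid
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        enc = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        dec = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=4)
        self.decoder = nn.TransformerDecoder(dec, num_layers=4)
        # Learned queries, one per cell of a coarse voxel grid.
        self.voxel_queries = nn.Parameter(torch.randn(grid ** 3, dim))
        self.cnn_decoder = nn.Sequential(              # 4^3 -> 32^3 occupancy
            nn.ConvTranspose3d(dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, img):                            # img: (B, 3, 224, 224)
        tokens = self.patch_embed(img).flatten(2).transpose(1, 2)  # (B, 196, dim)
        memory = self.encoder(tokens)                  # visual features
        queries = self.voxel_queries.unsqueeze(0).expand(img.size(0), -1, -1)
        vox_tokens = self.decoder(queries, memory)     # (B, grid^3, dim) voxel feats
        vox = vox_tokens.transpose(1, 2).reshape(
            img.size(0), -1, self.grid, self.grid, self.grid)
        return self.cnn_decoder(vox).squeeze(1)        # (B, 32, 32, 32)


print(TinyVoxelReconstructor()(torch.randn(1, 3, 224, 224)).shape)
```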
arXiv Detail & Related papers (2021-10-17T16:19:15Z)
- Black-Box Test-Time Shape REFINEment for Single View 3D Reconstruction [57.805334118057665]
We propose REFINE, a postprocessing mesh refinement step that can be easily integrated into the pipeline of any black-box method in the literature.
At test time, REFINE optimizes a network per mesh instance to encourage consistency between the mesh and the given object view.
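A minimal PyTorch sketch of that style of per-instance, test-time refinement follows. `render_silhouette` is a hypothetical stand-in for a differentiable renderer, and the offset network, loss weights, and step count are assumptions, not REFINE's actual design.

```python
import torch
import torch.nn as nn


def refine_mesh(verts, faces, target_mask, render_silhouette, steps=200):
    """verts: (V, 3) float, faces: (F, 3) long, target_mask: (H, W) in [0, 1].
    `render_silhouette(verts, faces) -> (H, W)` must be differentiable
    (hypothetical stand-in for a soft rasteriser)."""
    offset_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))
    opt = torch.optim.Adam(offset_net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        offsets = offset_net(verts)                      # per-vertex offsets
        sil = render_silhouette(verts + offsets, faces)  # predicted silhouette
        # Encourage consistency with the given view; keep offsets small.
        loss = ((sil - target_mask) ** 2).mean() + 0.1 * offsets.pow(2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return verts + offset_net(verts)
```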
arXiv Detail & Related papers (2021-08-23T03:28:47Z)
- D-OccNet: Detailed 3D Reconstruction Using Cross-Domain Learning [0.0]
We extend the work on Occupancy Networks by exploiting cross-domain learning of image and point cloud domains.
Our network, the Double Occupancy Network (D-OccNet), outperforms Occupancy Networks in terms of visual quality and details captured in the 3D reconstruction.
arXiv Detail & Related papers (2021-04-28T16:00:54Z)