GARNet: Global-Aware Multi-View 3D Reconstruction Network and the
Cost-Performance Tradeoff
- URL: http://arxiv.org/abs/2211.02299v1
- Date: Fri, 4 Nov 2022 07:45:19 GMT
- Title: GARNet: Global-Aware Multi-View 3D Reconstruction Network and the
Cost-Performance Tradeoff
- Authors: Zhenwei Zhu, Liying Yang, Xuxin Lin, Chaohao Jiang, Ning Li, Lin Yang,
Yanyan Liang
- Abstract summary: We propose a global-aware attention-based fusion approach that builds the correlation between each branch and the global to provide a comprehensive foundation for weights inference.
In order to enhance the ability of the network, we introduce a novel loss function to supervise the shape overall.
Experiments on ShapeNet verify that our method outperforms existing SOTA methods.
- Score: 10.8606881536924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning technology has made great progress in multi-view 3D
reconstruction tasks. At present, most mainstream solutions establish the
mapping between views and shape of an object by assembling the networks of 2D
encoder and 3D decoder as the basic structure while they adopt different
approaches to obtain aggregation of features from several views. Among them,
the methods using attention-based fusion perform better and more stable than
the others, however, they still have an obvious shortcoming -- the strong
independence of each view during predicting the weights for merging leads to a
lack of adaption of the global state. In this paper, we propose a global-aware
attention-based fusion approach that builds the correlation between each branch
and the global to provide a comprehensive foundation for weights inference. In
order to enhance the ability of the network, we introduce a novel loss function
to supervise the shape overall and propose a dynamic two-stage training
strategy that can effectively adapt to all reconstructors with attention-based
fusion. Experiments on ShapeNet verify that our method outperforms existing
SOTA methods while the amount of parameters is far less than the same type of
algorithm, Pix2Vox++. Furthermore, we propose a view-reduction method based on
maximizing diversity and discuss the cost-performance tradeoff of our model to
achieve a better performance when facing heavy input amount and limited
computational cost.
Related papers
- Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation [8.443350618722564]
This paper proposes an improved Unet model combined with an attention mechanism.
It introduces channel attention and spatial attention modules, enhances the model's ability to focus on important features.
The improved model performs well in terms of mIoU and pixel accuracy (PA), reaching 76.5% and 95.3% respectively.
arXiv Detail & Related papers (2025-02-06T06:51:23Z) - GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z) - Semi-supervised Single-view 3D Reconstruction via Multi Shape Prior Fusion Strategy and Self-Attention [0.0]
Semi-supervised learning strategies offer an innovative approach to reduce the dependence on labeled data.
We created an innovative framework for 3D reconstruction that distinctively introduces a multi shape prior fusion strategy.
Our framework demonstrated a 3.3% performance improvement over the baseline.
arXiv Detail & Related papers (2024-11-23T02:46:16Z) - Double-Shot 3D Shape Measurement with a Dual-Branch Network for Structured Light Projection Profilometry [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
Our method can reduce fringe order ambiguity while producing high-accuracy results on self-made datasets.
arXiv Detail & Related papers (2024-07-19T10:49:26Z) - 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object
Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images [71.91424164693422]
We introduce an explicit point-based human reconstruction framework called HaP.
Our approach is featured by fully-explicit point cloud estimation, manipulation, generation, and refinement in the 3D geometric space.
Our results may indicate a paradigm rollback to the fully-explicit and geometry-centric algorithm design.
arXiv Detail & Related papers (2023-11-06T05:52:29Z) - Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening [13.593522290577512]
Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image.
Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency.
arXiv Detail & Related papers (2023-04-28T03:34:36Z) - Multi-agent Reinforcement Learning with Graph Q-Networks for Antenna
Tuning [60.94661435297309]
The scale of mobile networks makes it challenging to optimize antenna parameters using manual intervention or hand-engineered strategies.
We propose a new multi-agent reinforcement learning algorithm to optimize mobile network configurations globally.
We empirically demonstrate the performance of the algorithm on an antenna tilt tuning problem and a joint tilt and power control problem in a simulated environment.
arXiv Detail & Related papers (2023-01-20T17:06:34Z) - AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network [8.127449025802436]
We present a novel recurrent multi-view stereo network based on long short-term memory (LSTM) with adaptive aggregation, namely AA-RMVSNet.
We firstly introduce an intra-view aggregation module to adaptively extract image features by using context-aware convolution and multi-scale aggregation.
We propose an inter-view cost volume aggregation module for adaptive pixel-wise view aggregation, which is able to preserve better-matched pairs among all views.
arXiv Detail & Related papers (2021-08-09T06:10:48Z) - PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object
Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.