GARNet: Global-Aware Multi-View 3D Reconstruction Network and the
Cost-Performance Tradeoff
- URL: http://arxiv.org/abs/2211.02299v1
- Date: Fri, 4 Nov 2022 07:45:19 GMT
- Title: GARNet: Global-Aware Multi-View 3D Reconstruction Network and the
Cost-Performance Tradeoff
- Authors: Zhenwei Zhu, Liying Yang, Xuxin Lin, Chaohao Jiang, Ning Li, Lin Yang,
Yanyan Liang
- Abstract summary: We propose a global-aware attention-based fusion approach that builds the correlation between each branch and the global state to provide a comprehensive foundation for weight inference.
To enhance the network's capability, we introduce a novel loss function that supervises the overall shape.
Experiments on ShapeNet verify that our method outperforms existing SOTA methods.
- Score: 10.8606881536924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning technology has made great progress in multi-view 3D
reconstruction tasks. At present, most mainstream solutions map views to the
shape of an object using a 2D encoder and a 3D decoder as the basic structure,
and they differ in how they aggregate features from several views. Among them,
the methods using attention-based fusion perform better and more stably than
the others. However, they still have an obvious shortcoming: the weight of
each view is predicted almost independently of the other views, so the merging
weights cannot adapt to the global state. In this paper, we propose a global-aware
attention-based fusion approach that builds the correlation between each branch
and the global state to provide a comprehensive foundation for weight inference.
To enhance the network's capability, we introduce a novel loss function
that supervises the overall shape, and we propose a dynamic two-stage training
strategy that can effectively adapt to all reconstructors with attention-based
fusion. Experiments on ShapeNet verify that our method outperforms existing
SOTA methods while using far fewer parameters than Pix2Vox++, an algorithm of
the same type. Furthermore, we propose a view-reduction method based on
maximizing diversity and discuss the cost-performance tradeoff of our model to
achieve better performance when the number of input views is large and
computational resources are limited.
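The abstract describes two mechanisms only at a high level: fusion weights inferred from each view's correlation with a global representation, and view reduction by maximizing diversity. The sketch below illustrates both ideas under loudly stated assumptions, not as GARNet's actual implementation. Per-view features are plain vectors, the "global" feature is the mean over views, and each view's score is a scaled dot product with that mean (GARNet itself learns its scoring layers on 3D features). Diversity is maximized with greedy farthest-point selection, which is a stand-in and may differ from the paper's criterion. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors (used as the 'global' feature)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_aware_fusion(features: Sequence[Vector]) -> Vector:
    """Fuse per-view features with weights that depend on the global state.

    Hypothetical simplification: each view is scored by its scaled dot product
    with the mean of all views, instead of a learned scoring network. Unlike
    per-view independent scoring, every score changes when any view changes.
    """
    g = mean_vector(features)
    scale = math.sqrt(len(g))
    scores = [sum(a * b for a, b in zip(f, g)) / scale for f in features]
    weights = softmax(scores)
    # Weighted sum of the per-view features, one output dimension at a time.
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(len(g))]


def select_diverse_views(features: Sequence[Vector], k: int) -> List[int]:
    """Pick k view indices by greedy farthest-point selection (assumed criterion)."""
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of views")

    def dist(a: Vector, b: Vector) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [0]  # start from the first view (an arbitrary choice)
    # Distance from every view to its nearest already-selected view.
    nearest = [dist(f, features[0]) for f in features]
    while len(selected) < k:
        candidates = [i for i in range(len(features)) if i not in selected]
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, dist(f, features[nxt])) for d, f in zip(nearest, features)]
    return selected
```

For example, `select_diverse_views([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]], 2)` keeps views 0 and 3 and drops the near-duplicate view 1, which is the kind of redundancy a diversity-based reduction is meant to remove before fusion.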
Related papers
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
(2024-09-04T13:05:00Z)
- Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
(2024-07-19T10:49:26Z)
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
(2024-03-12T11:48:49Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
(2023-12-26T12:16:03Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
(2023-12-24T08:42:37Z)
- Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images [78.56114271538061]
We introduce an explicit point-based human reconstruction framework called HaP.
Our approach is featured by fully-explicit point cloud estimation, manipulation, generation, and refinement in the 3D geometric space.
Our results may indicate a paradigm rollback to the fully-explicit and geometry-centric algorithm design.
(2023-11-06T05:52:29Z)
- Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening [13.593522290577512]
Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image.
Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency.
(2023-04-28T03:34:36Z)
- Multi-agent Reinforcement Learning with Graph Q-Networks for Antenna Tuning [60.94661435297309]
The scale of mobile networks makes it challenging to optimize antenna parameters using manual intervention or hand-engineered strategies.
We propose a new multi-agent reinforcement learning algorithm to optimize mobile network configurations globally.
We empirically demonstrate the performance of the algorithm on an antenna tilt tuning problem and a joint tilt and power control problem in a simulated environment.
(2023-01-20T17:06:34Z)
- AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network [8.127449025802436]
We present a novel recurrent multi-view stereo network based on long short-term memory (LSTM) with adaptive aggregation, namely AA-RMVSNet.
We first introduce an intra-view aggregation module to adaptively extract image features by using context-aware convolution and multi-scale aggregation.
We propose an inter-view cost volume aggregation module for adaptive pixel-wise view aggregation, which is able to preserve better-matched pairs among all views.
(2021-08-09T06:10:48Z)
- PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
(2020-12-18T18:06:43Z)