Related papers: GARNet: Global-Aware Multi-View 3D Reconstruction Network and the Cost-Performance Tradeoff

GARNet: Global-Aware Multi-View 3D Reconstruction Network and the Cost-Performance Tradeoff

URL: http://arxiv.org/abs/2211.02299v1
Date: Fri, 4 Nov 2022 07:45:19 GMT
Title: GARNet: Global-Aware Multi-View 3D Reconstruction Network and the Cost-Performance Tradeoff
Authors: Zhenwei Zhu, Liying Yang, Xuxin Lin, Chaohao Jiang, Ning Li, Lin Yang, Yanyan Liang
Abstract summary: We propose a global-aware attention-based fusion approach that builds the correlation between each branch and the global to provide a comprehensive foundation for weights inference. In order to enhance the ability of the network, we introduce a novel loss function to supervise the shape overall. Experiments on ShapeNet verify that our method outperforms existing SOTA methods.
Score: 10.8606881536924
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning technology has made great progress in multi-view 3D reconstruction tasks. At present, most mainstream solutions establish the mapping between views and shape of an object by assembling the networks of 2D encoder and 3D decoder as the basic structure while they adopt different approaches to obtain aggregation of features from several views. Among them, the methods using attention-based fusion perform better and more stable than the others, however, they still have an obvious shortcoming -- the strong independence of each view during predicting the weights for merging leads to a lack of adaption of the global state. In this paper, we propose a global-aware attention-based fusion approach that builds the correlation between each branch and the global to provide a comprehensive foundation for weights inference. In order to enhance the ability of the network, we introduce a novel loss function to supervise the shape overall and propose a dynamic two-stage training strategy that can effectively adapt to all reconstructors with attention-based fusion. Experiments on ShapeNet verify that our method outperforms existing SOTA methods while the amount of parameters is far less than the same type of algorithm, Pix2Vox++. Furthermore, we propose a view-reduction method based on maximizing diversity and discuss the cost-performance tradeoff of our model to achieve a better performance when facing heavy input amount and limited computational cost.

Related papers

CORE-ReID: Comprehensive Optimization and Refinement through Ensemble fusion in Domain Adaptation for person re-identification [0.0]
This study introduces a novel framework, "Comprehensive Optimization and Refinement through Ensemble Fusion in Domain Adaptation for Person Re-identification"<n>The framework utilizes CycleGAN to generate diverse data that harmonizes differences in image characteristics from different camera sources in the pre-training stage.<n>In the fine-tuning stage, based on a pair of teacher-student networks, the framework integrates multi-view features for multi-level clustering to derive diverse pseudo labels.
arXiv Detail & Related papers (2025-08-05T04:25:03Z)
DSFormer: A Dual-Scale Cross-Learning Transformer for Visual Place Recognition [16.386674597850778]
We propose a novel framework that integrates Dual-Scale-Former (DSFormer), a Transformer-based cross-learning module, with an innovative block clustering strategy.<n>Our approach achieves state-of-the-art performance across most benchmark datasets.
arXiv Detail & Related papers (2025-07-24T14:29:30Z)
C2D-ISR: Optimizing Attention-based Image Super-resolution from Continuous to Discrete Scales [6.700548615812325]
We propose a novel framework, textbfC2D-ISR, for optimizing attention-based image super-resolution models. Our approach is based on a two-stage training methodology and a hierarchical encoding mechanism. In addition, we generalize the hierarchical encoding mechanism with existing attention-based network structures.
arXiv Detail & Related papers (2025-03-17T21:52:18Z)
From Dataset to Real-world: General 3D Object Detection via Generalized Cross-domain Few-shot Learning [13.282416396765392]
We introduce the first generalized cross-domain few-shot (GCFS) task in 3D object detection. Our solution integrates multi-modal fusion and contrastive-enhanced prototype learning within one framework. To effectively capture domain-specific representations for each class from limited target data, we propose a contrastive-enhanced prototype learning.
arXiv Detail & Related papers (2025-03-08T17:05:21Z)
Optimized Unet with Attention Mechanism for Multi-Scale Semantic Segmentation [8.443350618722564]
This paper proposes an improved Unet model combined with an attention mechanism. It introduces channel attention and spatial attention modules, enhances the model's ability to focus on important features. The improved model performs well in terms of mIoU and pixel accuracy (PA), reaching 76.5% and 95.3% respectively.
arXiv Detail & Related papers (2025-02-06T06:51:23Z)
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data. We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models. GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z)
Semi-supervised Single-view 3D Reconstruction via Multi Shape Prior Fusion Strategy and Self-Attention [0.0]
Semi-supervised learning strategies offer an innovative approach to reduce the dependence on labeled data. We created an innovative framework for 3D reconstruction that distinctively introduces a multi shape prior fusion strategy. Our framework demonstrated a 3.3% performance improvement over the baseline.
arXiv Detail & Related papers (2024-11-23T02:46:16Z)
Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches. In this paper, we propose a model-based deep unfolded method for satellite image fusion. Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z)
Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities. Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images. We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z)
Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework - Adrial Modality Modulation Network (AMMNet) AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition. Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results. We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics. We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations. Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view. Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images [78.56114271538061]
We introduce an explicit point-based human reconstruction framework called HaP. Our approach is featured by fully-explicit point cloud estimation, manipulation, generation, and refinement in the 3D geometric space. Our results may indicate a paradigm rollback to the fully-explicit and geometry-centric algorithm design.
arXiv Detail & Related papers (2023-11-06T05:52:29Z)
Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening [13.593522290577512]
Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency.
arXiv Detail & Related papers (2023-04-28T03:34:36Z)
Multi-agent Reinforcement Learning with Graph Q-Networks for Antenna Tuning [60.94661435297309]
The scale of mobile networks makes it challenging to optimize antenna parameters using manual intervention or hand-engineered strategies. We propose a new multi-agent reinforcement learning algorithm to optimize mobile network configurations globally. We empirically demonstrate the performance of the algorithm on an antenna tilt tuning problem and a joint tilt and power control problem in a simulated environment.
arXiv Detail & Related papers (2023-01-20T17:06:34Z)
AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network [8.127449025802436]
We present a novel recurrent multi-view stereo network based on long short-term memory (LSTM) with adaptive aggregation, namely AA-RMVSNet. We firstly introduce an intra-view aggregation module to adaptively extract image features by using context-aware convolution and multi-scale aggregation. We propose an inter-view cost volume aggregation module for adaptive pixel-wise view aggregation, which is able to preserve better-matched pairs among all views.
arXiv Detail & Related papers (2021-08-09T06:10:48Z)
PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving. Current approaches suffer from sparse and partial point clouds of distant and occluded objects. In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.