A Novel Patch Convolutional Neural Network for View-based 3D Model
Retrieval
- URL: http://arxiv.org/abs/2109.12299v1
- Date: Sat, 25 Sep 2021 07:18:23 GMT
- Title: A Novel Patch Convolutional Neural Network for View-based 3D Model
Retrieval
- Authors: Zan Gao, Yuxiang Shao, Weili Guan, Meng Liu, Zhiyong Cheng, Shengyong
Chen
- Abstract summary: We propose a novel patch convolutional neural network (PCNN) for view-based 3D model retrieval.
Our proposed PCNN can outperform state-of-the-art approaches, with mAP values of 93.67% and 96.23% on ModelNet40 and ModelNet10, respectively.
- Score: 36.12906920608775
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, many view-based 3D model retrieval methods have been proposed and
have achieved state-of-the-art performance. Most of these methods focus on
extracting more discriminative view-level features and effectively aggregating
the multi-view images of a 3D model, but the latent relationship among these
multi-view images is not fully explored. Thus, we tackle this problem from the
perspective of exploiting the relationships between patch features to capture
long-range associations among multi-view images. To capture associations among
views, in this work, we propose a novel patch convolutional neural network
(PCNN) for view-based 3D model retrieval. Specifically, we first employ a CNN
to extract patch features of each view image separately. Secondly, a novel
neural network module named PatchConv is designed to exploit intrinsic
relationships between neighboring patches in the feature space to capture
long-range associations among multi-view images. Then, an adaptive weighted
view layer is further embedded into PCNN to automatically assign a weight to
each view according to the similarity between each view feature and the
view-pooling feature. Finally, a discrimination loss function is employed to
extract the discriminative 3D model feature, which consists of softmax loss
values generated by the fusion classifier and the specific classifier. Extensive
experimental results on two public 3D model retrieval benchmarks, namely
ModelNet40 and ModelNet10, demonstrate that our proposed PCNN can outperform
state-of-the-art approaches, with mAP values of 93.67% and 96.23%,
respectively.
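The adaptive weighted view layer described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the use of cosine similarity, a softmax over views, and mean pooling for the view-pooling feature are all assumptions, as are the function name and feature dimensions.

```python
import numpy as np

def adaptive_weighted_view_fusion(view_features: np.ndarray) -> np.ndarray:
    """Fuse per-view features of shape (V, D) into one model-level feature (D,).

    Hypothetical sketch: each view is weighted by the similarity between its
    feature and the view-pooling (here: average) feature, as the abstract
    describes, then the views are combined by a weighted sum.
    """
    pooled = view_features.mean(axis=0)  # view-pooling feature, shape (D,)
    # Cosine similarity between each view feature and the pooled feature.
    sims = (view_features @ pooled) / (
        np.linalg.norm(view_features, axis=1) * np.linalg.norm(pooled) + 1e-8
    )
    weights = np.exp(sims) / np.exp(sims).sum()  # softmax over the V views
    return weights @ view_features               # weighted fusion, shape (D,)

# Example: 12 rendered views of one 3D model, each a 256-D feature vector.
views = np.random.default_rng(0).normal(size=(12, 256))
fused = adaptive_weighted_view_fusion(views)
print(fused.shape)  # (256,)
```

In this sketch, views whose features lie close to the pooled feature receive larger weights, which matches the abstract's description of assigning each view a weight "according to the similarity between each view feature and the view-pooling feature."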
Related papers
- Beyond First Impressions: Integrating Joint Multi-modal Cues for
Comprehensive 3D Representation [72.94143731623117]
Existing methods simply align 3D representations with single-view 2D images and coarse-grained parent category text.
Insufficient synergy neglects the idea that a robust 3D representation should align with the joint vision-language space.
We propose a multi-view joint modality modeling approach, termed JM3D, to obtain a unified representation for point cloud, text, and image.
arXiv Detail & Related papers (2023-08-06T01:11:40Z) - Generative Multiplane Neural Radiance for 3D-Aware Image Generation [102.15322193381617]
We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.
Our GMNR model generates 3D-aware images of 1024×1024 pixels at 17.6 FPS on a single V100 GPU.
arXiv Detail & Related papers (2023-04-03T17:41:20Z) - MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z) - Vision Transformer for NeRF-Based View Synthesis from a Single Input
Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z) - Neural Volumetric Object Selection [126.04480613166194]
We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF).
Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views.
arXiv Detail & Related papers (2022-05-30T08:55:20Z) - VPFusion: Joint 3D Volume and Pixel-Aligned Feature Fusion for Single
and Multi-view 3D Reconstruction [23.21446438011893]
VPFusion attains high-quality reconstruction using both a 3D feature volume, to capture 3D-structure-aware context, and pixel-aligned image features.
Existing approaches use RNN, feature pooling, or attention computed independently in each view for multi-view fusion.
We show improved multi-view feature fusion by establishing transformer-based pairwise view association.
arXiv Detail & Related papers (2022-03-14T23:30:58Z) - Implicit Neural Deformation for Multi-View Face Reconstruction [43.88676778013593]
We present a new method for 3D face reconstruction from multi-view RGB images.
Unlike previous methods which are built upon 3D morphable models, our method leverages an implicit representation to encode rich geometric features.
Our experimental results on several benchmark datasets demonstrate that our approach outperforms alternative baselines and achieves superior face reconstruction results compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-12-05T07:02:53Z) - Direct Multi-view Multi-person 3D Pose Estimation [138.48139701871213]
We present Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images.
MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks.
We show experimentally that our MvP model outperforms the state-of-the-art methods on several benchmarks while being much more efficient.
arXiv Detail & Related papers (2021-11-07T13:09:20Z) - Spatio-Temporal Self-Attention Network for Video Saliency Prediction [13.873682190242365]
3D convolutional neural networks have achieved promising results for video tasks in computer vision.
We propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction.
arXiv Detail & Related papers (2021-08-24T12:52:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.