MANet: Multimodal Attention Network based Point-View fusion for 3D Shape Recognition
- URL: http://arxiv.org/abs/2002.12573v1
- Date: Fri, 28 Feb 2020 07:00:14 GMT
- Title: MANet: Multimodal Attention Network based Point-View fusion for 3D Shape Recognition
- Authors: Yaxin Zhao, Jichao Jiao and Tangkun Zhang
- Abstract summary: This paper proposes a fusion network based on a multimodal attention mechanism for 3D shape recognition.
Considering the limitations of multi-view data, we introduce a soft attention scheme that uses the global point-cloud features to filter the multi-view features.
More specifically, we obtain enhanced multi-view features by mining the contribution of each view image to overall shape recognition.
- Score: 0.5371337604556311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D shape recognition has attracted increasing attention as a task
in 3D vision research. The proliferation of 3D data has encouraged a variety
of deep learning methods, and many models now operate on point-cloud data or
multi-view data alone. However, in the era of big data, integrating these two
modalities into a unified 3D shape descriptor promises to improve recognition
accuracy. This paper therefore proposes a fusion network based on a multimodal
attention mechanism for 3D shape recognition. Considering the limitations of
multi-view data, we introduce a soft attention scheme that uses the global
point-cloud features to filter the multi-view features, enabling effective
fusion of the two feature types. More specifically, we obtain enhanced
multi-view features by mining the contribution of each view image to overall
shape recognition, and then fuse the point-cloud features with the enhanced
multi-view features to obtain a more discriminative 3D shape descriptor.
Experiments on the ModelNet40 dataset verify the effectiveness of our method.
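To make the proposed fusion concrete, below is a minimal PyTorch sketch of a soft-attention point-view fusion of the kind the abstract describes: the global point-cloud feature scores each view feature, the views are pooled with the resulting soft weights, and the two modalities are fused into one descriptor. The dimensions, the gating MLP, and the concatenation-based fusion head are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of soft-attention point-view fusion; module names,
# dimensions, and the gating form are assumptions for illustration.
import torch
import torch.nn as nn

class SoftAttentionFusion(nn.Module):
    def __init__(self, d_point=1024, d_view=1024, d_out=1024):
        super().__init__()
        # Scores each view feature conditioned on the global point-cloud feature.
        self.score = nn.Sequential(
            nn.Linear(d_point + d_view, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )
        self.fuse = nn.Linear(d_point + d_view, d_out)

    def forward(self, point_feat, view_feats):
        # point_feat: (B, d_point) global point-cloud descriptor
        # view_feats: (B, V, d_view) per-view image descriptors
        V = view_feats.size(1)
        p = point_feat.unsqueeze(1).expand(-1, V, -1)        # (B, V, d_point)
        logits = self.score(torch.cat([p, view_feats], -1))  # (B, V, 1)
        weights = torch.softmax(logits, dim=1)               # soft attention over views
        enhanced = (weights * view_feats).sum(dim=1)         # (B, d_view)
        # Fuse the point-cloud and attention-enhanced view features.
        return self.fuse(torch.cat([point_feat, enhanced], dim=-1))

if __name__ == "__main__":
    fusion = SoftAttentionFusion()
    desc = fusion(torch.randn(2, 1024), torch.randn(2, 12, 1024))
    print(desc.shape)  # torch.Size([2, 1024])
```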
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- Deep Models for Multi-View 3D Object Recognition: A Review [16.500711021549947]
Multi-view 3D representations for object recognition have thus far demonstrated the most promising results for achieving state-of-the-art performance.
This review paper comprehensively covers recent progress in multi-view 3D object recognition methods for 3D classification and retrieval tasks.
arXiv Detail & Related papers (2024-04-23T16:54:31Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework to fuse information from RGB images and LiDAR point clouds at points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection.
We conducted extensive experiments on the nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
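As an illustration of fusing at points of interest, the sketch below projects 3D points into an image feature map and bilinearly samples per-point image features. The camera model (pinhole intrinsics, camera-frame points) and the sampling scheme are generic assumptions for illustration, not PoIFusion's exact procedure.

```python
# Hypothetical helper: project points of interest into an image feature map
# and bilinearly sample per-point features (pinhole camera assumed).
import torch
import torch.nn.functional as F

def sample_image_features(points, K, feat_map):
    # points: (N, 3) PoIs in camera coordinates; K: (3, 3) camera intrinsics
    # feat_map: (C, H, W) image feature map
    uvw = points @ K.T                             # project to pixel space
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)  # perspective divide
    _, H, W = feat_map.shape
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    sampled = F.grid_sample(feat_map[None], grid[None, :, None, :],
                            align_corners=True)    # (1, C, N, 1)
    return sampled[0, :, :, 0].T                   # (N, C) per-point features
```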
- MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding [4.220064723125481]
Multi-view 2D information can provide superior self-supervised signals for 3D objects.
MM-Point is driven by intra-modal and inter-modal similarity objectives.
It achieves a peak accuracy of 92.4% on the synthetic dataset ModelNet40, and a top accuracy of 87.8% on the real-world dataset ScanObjectNN.
arXiv Detail & Related papers (2024-02-15T15:10:17Z)
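The inter-modal similarity objective mentioned in the MM-Point summary above can be sketched as an InfoNCE-style contrastive loss between point-cloud and view embeddings; the temperature value and the use of in-batch negatives here are assumptions, not the paper's exact recipe.

```python
# Sketch of an InfoNCE-style inter-modal objective: each shape's point-cloud
# embedding should match its own (pooled) view embedding, with other shapes
# in the batch acting as negatives. tau=0.07 is an assumed hyperparameter.
import torch
import torch.nn.functional as F

def inter_modal_nce(point_emb, view_emb, tau=0.07):
    # point_emb, view_emb: (B, D), L2-normalized below
    p = F.normalize(point_emb, dim=-1)
    v = F.normalize(view_emb, dim=-1)
    logits = p @ v.T / tau                              # (B, B) similarities
    targets = torch.arange(p.size(0), device=p.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```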
- SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval [8.74845857766369]
Multi-modality 3D object retrieval is rarely developed and analyzed on large-scale datasets.
We propose self-and-cross attention based aggregation of point cloud and multi-view images (SCA-PVNet) for 3D object retrieval.
arXiv Detail & Related papers (2023-07-20T05:46:32Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhaustive and complicated network modification.
arXiv Detail & Related papers (2022-07-07T07:23:20Z)
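A hedged sketch of the pair-wise alignment idea in the PointMCD summary above: a frozen multi-view image encoder (teacher) guides a point-cloud encoder (student) by aligning their descriptors. Pooling the teacher's views and the cosine-alignment loss are assumptions, not PointMCD's exact objective.

```python
# Sketch of pair-wise cross-modal distillation: align the student's point
# descriptor with the (pooled, frozen) teacher view descriptors.
import torch.nn.functional as F

def distillation_loss(teacher_view_desc, student_point_desc):
    # teacher_view_desc: (B, V, D) from a frozen image encoder (teacher)
    # student_point_desc: (B, D) from the point encoder being trained (student)
    t = F.normalize(teacher_view_desc.detach().mean(dim=1), dim=-1)
    s = F.normalize(student_point_desc, dim=-1)
    return (1.0 - (t * s).sum(dim=-1)).mean()  # 0 when perfectly aligned
```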
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)