3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud
Classification and Segmentation
- URL: http://arxiv.org/abs/2209.11255v2
- Date: Wed, 31 May 2023 02:20:58 GMT
- Title: 3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud
Classification and Segmentation
- Authors: Dening Lu, Kyle Gao, Qian Xie, Linlin Xu, Jonathan Li
- Abstract summary: This paper presents a novel point cloud representation learning network, called the 3D Dual Self-attention Global Local (GLocal) Transformer Network (3DGTN).
The proposed framework is evaluated on both classification and segmentation datasets.
- Score: 21.054928631088575
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although the application of Transformers in 3D point cloud processing has
achieved significant progress and success, it is still challenging for existing
3D Transformer methods to efficiently and accurately learn both valuable global
features and valuable local features for improved applications. This paper
presents a novel point cloud representation learning network, called 3D Dual
Self-attention Global Local (GLocal) Transformer Network (3DGTN), for improved
feature learning in both classification and segmentation tasks, with the
following key contributions. First, a GLocal Feature Learning (GFL) block with
the dual self-attention mechanism (i.e., a novel Point-Patch Self-Attention,
called PPSA, and a channel-wise self-attention) is designed to efficiently
learn the GLocal context information. Second, the GFL block is integrated with
a multi-scale Graph Convolution-based Local Feature Aggregation (LFA) block,
leading to a Global-Local (GLocal) information extraction module that can
efficiently capture critical information. Third, a series of GLocal modules are
used to construct a new hierarchical encoder-decoder structure to enable the
learning of "GLocal" information in different scales in a hierarchical manner.
The proposed framework is evaluated on both classification and segmentation
datasets, demonstrating that the proposed method is capable of outperforming
many state-of-the-art methods on both classification and segmentation tasks.
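The dual self-attention idea in the GFL block (point-to-patch attention plus channel-wise attention) can be sketched in isolation. The following is a minimal NumPy illustration, not the paper's implementation: learned query/key/value projections are omitted, and the patch descriptors are assumed to come from some downsampling step.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def point_patch_attention(points, patches):
    # points: (N, C) per-point features; patches: (M, C) patch-level descriptors.
    # Each point attends over the M patch descriptors, gathering global context
    # at O(N*M) cost instead of O(N*N), since M << N.
    scores = points @ patches.T / np.sqrt(points.shape[1])  # (N, M)
    return softmax(scores, axis=-1) @ patches               # (N, C)

def channel_attention(x):
    # x: (N, C). The attention map here is C x C (over channels), complementing
    # the point-wise branch with inter-channel dependencies.
    scores = x.T @ x / np.sqrt(x.shape[0])  # (C, C)
    return x @ softmax(scores, axis=-1)     # (N, C)

rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 32))    # 128 points, 32-dim features
patch = rng.normal(size=(8, 32))    # 8 patch descriptors
out = point_patch_attention(pts, patch) + channel_attention(pts)
print(out.shape)  # (128, 32)
```

The point of the patch-level branch is the cost argument in the comment: attending over a handful of patch descriptors rather than all points keeps global context affordable for large clouds.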
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- PotholeGuard: A Pothole Detection Approach by Point Cloud Semantic Segmentation [0.0]
Research on 3D semantic pothole segmentation often overlooks point cloud sparsity, leading to suboptimal local feature capture and segmentation accuracy.
Our model efficiently identifies hidden features and uses a feedback mechanism to enhance local characteristics.
Our approach offers a promising solution for robust and accurate 3D pothole segmentation, with applications in road maintenance and safety.
arXiv Detail & Related papers (2023-11-05T12:57:05Z)
- PointCMC: Cross-Modal Multi-Scale Correspondences Learning for Point Cloud Understanding [0.875967561330372]
PointCMC is a cross-modal method that models multi-scale correspondences across modalities for self-supervised point cloud representation learning.
PointCMC is composed of: (1) a local-to-local (L2L) module that learns local correspondences through optimized cross-modal local geometric features, (2) a local-to-global (L2G) module that aims to learn the correspondences between local and global features across modalities via local-global discrimination, and (3) a global-to-global (G2G) module, which leverages auxiliary global contrastive loss between the point cloud and image to learn high-level semantic correspondences.
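The "auxiliary global contrastive loss" in the G2G module is typically an InfoNCE-style objective over paired global embeddings. A minimal NumPy sketch under that assumption follows; the actual PointCMC loss and its hyperparameters (here the temperature `tau`) may differ.

```python
import numpy as np

def info_nce(z_pc, z_img, tau=0.07):
    # z_pc, z_img: (B, D) L2-normalised global embeddings of B matched
    # point-cloud/image pairs. Row i of each is a positive pair; every other
    # row is a negative. This is cross-entropy over the similarity matrix
    # with the identity as the label assignment.
    logits = z_pc @ z_img.T / tau                 # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(1)
z = rng.normal(size=(16, 64))
z /= np.linalg.norm(z, axis=1, keepdims=True)
loss_aligned = info_nce(z, z)  # identical embeddings -> near-zero loss
```

Minimising this pulls the point-cloud and image embeddings of the same scene together while pushing apart mismatched pairs, which is the high-level semantic correspondence the G2G module targets.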
arXiv Detail & Related papers (2022-11-22T06:08:43Z)
- LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers [60.51925353387151]
We propose a novel module named Local Context Propagation (LCP) to exploit the message passing between neighboring local regions.
We use the overlap points of adjacent local regions as intermediaries, then re-weight the features of these shared points from different local regions before passing them to the next layers.
The proposed method is applicable to different tasks and outperforms various transformer-based methods in benchmarks including 3D shape classification and dense prediction tasks.
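The overlap-based message passing described above can be sketched as follows; a toy NumPy version in which LCPFormer's learned re-weighting of shared points is replaced by a plain average (my simplification).

```python
import numpy as np

def local_context_propagation(region_feats, region_ids, n_points):
    # region_feats: list of (len(idx), C) arrays, features computed
    # independently inside each local region; region_ids: matching index
    # arrays into the full point set. Points shared by adjacent regions act
    # as intermediaries: their per-region features are blended before being
    # passed to the next layer, so context flows between neighbouring regions.
    out = np.zeros((n_points, region_feats[0].shape[1]))
    counts = np.zeros(n_points)
    for idx, f in zip(region_ids, region_feats):
        out[idx] += f
        counts[idx] += 1
    covered = counts > 0
    out[covered] /= counts[covered, None]  # average overlapping contributions
    return out

# Two adjacent regions sharing point 2:
fA = np.ones((3, 2))        # region A's features for points 0, 1, 2
fB = np.full((2, 2), 3.0)   # region B's features for points 2, 3
out = local_context_propagation(
    [fA, fB], [np.array([0, 1, 2]), np.array([2, 3])], 4)
print(out[2])  # point 2 blends both regions: (1 + 3) / 2 = [2. 2.]
```

Only the shared points mix information from both regions; interior points keep their own region's features, which is what makes the overlap points the "intermediaries" of the message passing.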
arXiv Detail & Related papers (2022-10-23T15:43:01Z)
- LACV-Net: Semantic Segmentation of Large-Scale Point Cloud Scene via Local Adaptive and Comprehensive VLAD [13.907586081922345]
We propose an end-to-end deep neural network called LACV-Net for large-scale point cloud semantic segmentation.
The proposed network contains three main components: 1) a local adaptive feature augmentation module (LAFA) to adaptively learn the similarity of centroids and neighboring points to augment the local context; 2) a comprehensive VLAD module that fuses local features with multi-layer, multi-scale, and multi-resolution to represent a comprehensive global description vector; and 3) an aggregation loss function to effectively optimize the segmentation boundaries by constraining the adaptive weight from the LAFA module.
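The VLAD aggregation at the heart of component 2 can be illustrated in isolation. Below is a minimal single-scale NumPy sketch with fixed centroids; LACV-Net's "comprehensive" version fuses multi-layer, multi-scale, and multi-resolution features, which is not shown here.

```python
import numpy as np

def vlad(features, centroids):
    # features: (N, C) local point features; centroids: (K, C) cluster
    # centres (visual words). VLAD sums the residuals of each feature to its
    # nearest centroid, yielding a fixed-size K*C global descriptor
    # regardless of the number of input points N.
    dists = np.linalg.norm(features[:, None] - centroids[None], axis=-1)  # (N, K)
    nearest = dists.argmin(axis=1)
    desc = np.zeros_like(centroids)
    for k in range(len(centroids)):
        desc[k] = (features[nearest == k] - centroids[k]).sum(axis=0)
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)  # global L2 normalisation

feats = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
cents = np.array([[0.0, 0.0], [5.0, 5.0]])
g = vlad(feats, cents)  # residual (1, 1) to centroid 0, zero to centroid 1
```

The fixed output size is what makes VLAD attractive as a global description vector for arbitrarily large scenes.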
arXiv Detail & Related papers (2022-10-12T02:11:00Z)
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z)
- L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation [67.26984058377435]
We present L2G, a simple online local-to-global knowledge transfer framework for high-quality object attention mining.
Our framework guides the global network to learn the captured rich object-detail knowledge from a global view.
Experiments show that our method attains 72.1% and 44.2% mIoU on the validation sets of PASCAL VOC 2012 and MS COCO 2014, respectively.
arXiv Detail & Related papers (2022-04-07T04:31:32Z)
- 3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [23.0009969537045]
This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification.
Our method achieves state-of-the-art classification performance, in terms of both accuracy and efficiency.
arXiv Detail & Related papers (2022-03-02T02:42:14Z)
- PIG-Net: Inception based Deep Learning Architecture for 3D Point Cloud Segmentation [0.9137554315375922]
We propose an Inception-based deep network architecture called PIG-Net that effectively characterizes the local and global geometric details of point clouds.
We perform an exhaustive experimental analysis of the PIG-Net architecture on two state-of-the-art datasets.
arXiv Detail & Related papers (2021-01-28T13:27:55Z)
- DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization [56.15308829924527]
We propose a Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points.
To detect 3D keypoints, we predict the discriminativeness of the local descriptors in an unsupervised manner.
Experiments on various benchmarks demonstrate that our method achieves competitive results for both global point cloud retrieval and local point cloud registration.
arXiv Detail & Related papers (2020-07-17T20:21:22Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.