APPT : Asymmetric Parallel Point Transformer for 3D Point Cloud
Understanding
- URL: http://arxiv.org/abs/2303.17815v1
- Date: Fri, 31 Mar 2023 06:11:02 GMT
- Title: APPT : Asymmetric Parallel Point Transformer for 3D Point Cloud
Understanding
- Authors: Hengjia Li, Tu Zheng, Zhihao Chi, Zheng Yang, Wenxiao Wang, Boxi Wu,
Binbin Lin, Deng Cai
- Abstract summary: Transformer-based networks have achieved impressive performance in 3D point cloud understanding.
To address the limited effective receptive field of methods that aggregate only local features, we propose Asymmetric Parallel Point Transformer (APPT).
APPT is able to capture features globally throughout the entire network while focusing on local-detailed features.
- Score: 20.87092793669536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based networks have achieved impressive performance in 3D point
cloud understanding. However, most of them concentrate on aggregating local
features, but neglect to directly model global dependencies, which results in a
limited effective receptive field. Besides, how to effectively incorporate
local and global components also remains challenging. To tackle these problems,
we propose Asymmetric Parallel Point Transformer (APPT). Specifically, we
introduce Global Pivot Attention to extract global features and enlarge the
effective receptive field. Moreover, we design the Asymmetric Parallel
structure to effectively integrate local and global information. Combined with
these designs, APPT is able to capture features globally throughout the entire
network while focusing on local-detailed features. Extensive experiments show
that our method outperforms prior work and achieves state-of-the-art results on several
benchmarks for 3D point cloud understanding, such as 3D semantic segmentation
on S3DIS, 3D shape classification on ModelNet40, and 3D part segmentation on
ShapeNet.
Related papers
- Point Cloud Understanding via Attention-Driven Contrastive Learning [64.65145700121442]
Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms.
PointACL is an attention-driven contrastive learning framework designed to address these limitations.
Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions.
arXiv Detail & Related papers (2024-11-22T05:41:00Z)
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers that are well trained on massive image collections.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Full Point Encoding for Local Feature Aggregation in 3D Point Clouds [29.402585297221457]
We propose full point encoding which is applicable to convolution and transformer architectures.
The key idea is to adaptively learn the weights from local and global geometric connections.
We achieve state-of-the-art semantic segmentation results of 76% mIoU on S3DIS 6-fold and 72.2% on S3DIS Area5.
arXiv Detail & Related papers (2023-03-08T09:14:17Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- Bidirectional Feature Globalization for Few-shot Semantic Segmentation of 3D Point Cloud Scenes [1.8374319565577157]
We propose a bidirectional feature globalization (BFG) approach to embed global perception to local point features.
With prototype-to-point globalization (Pr2PoG), the global perception is embedded to local point features based on similarity weights from sparse prototypes to dense point features.
The sparse prototypes of each class embedded with global perception are summarized to a single prototype for few-shot 3D segmentation.
arXiv Detail & Related papers (2022-08-13T15:04:20Z)
- CpT: Convolutional Point Transformer for 3D Point Cloud Processing [10.389972581905]
We present CpT: Convolutional point Transformer - a novel deep learning architecture for dealing with the unstructured nature of 3D point cloud data.
CpT is an improvement over existing attention-based Convolutional Neural Networks as well as previous 3D point cloud processing transformers.
Our model can serve as an effective backbone for various point cloud processing tasks when compared to the existing state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-21T17:45:55Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under the comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
- PIG-Net: Inception based Deep Learning Architecture for 3D Point Cloud Segmentation [0.9137554315375922]
We propose an inception-based deep network architecture called PIG-Net that effectively characterizes the local and global geometric details of point clouds.
We perform an exhaustive experimental analysis of the PIG-Net architecture on two state-of-the-art datasets.
arXiv Detail & Related papers (2021-01-28T13:27:55Z)
- 3D Object Detection with Pointformer [29.935891419574602]
We propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.
A Local Transformer module is employed to model interactions among points in a local region, which learns context-dependent region features at an object level.
A Global Transformer is designed to learn context-aware representations at the scene level.
arXiv Detail & Related papers (2020-12-21T15:12:54Z)
- DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization [56.15308829924527]
We propose a Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points.
To detect 3D keypoints, we predict the discriminativeness of the local descriptors in an unsupervised manner.
Experiments on various benchmarks demonstrate that our method achieves competitive results for both global point cloud retrieval and local point cloud registration.
arXiv Detail & Related papers (2020-07-17T20:21:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.