Transformers in 3D Point Clouds: A Survey
- URL: http://arxiv.org/abs/2205.07417v1
- Date: Mon, 16 May 2022 01:32:18 GMT
- Title: Transformers in 3D Point Clouds: A Survey
- Authors: Dening Lu, Qian Xie, Mingqiang Wei, Linlin Xu, Jonathan Li
- Abstract summary: 3D Transformer models have been proven to have a remarkable ability to model long-range dependencies.
This survey aims to provide a comprehensive overview of 3D Transformers designed for various tasks.
- Score: 27.784721081318935
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In recent years, Transformer models have been proven to have a
remarkable ability to model long-range dependencies. They have achieved
satisfactory results in both Natural Language Processing (NLP) and image
processing. This significant achievement has sparked great interest among
researchers in 3D point cloud processing in applying them to various 3D tasks.
Owing to their inherent permutation invariance and strong global feature
learning ability, 3D Transformers are well suited to point cloud processing and
analysis. They have achieved competitive or even better performance than
state-of-the-art non-Transformer algorithms. This survey aims to provide a
comprehensive overview of 3D Transformers designed for various tasks (e.g.,
point cloud classification, segmentation, object detection, and so on). We
start by introducing the fundamental components of the general Transformer and
briefly describing its application in the 2D and 3D fields. Then, we present
three different taxonomies (i.e., a Transformer implementation-based taxonomy,
a data representation-based taxonomy, and a task-based taxonomy) for method
classification, which allows us to analyze the methods involved from multiple
perspectives. Furthermore, we investigate variants of the 3D self-attention
mechanism designed to improve performance. To demonstrate the superiority of 3D
Transformers, we compare the performance of Transformer-based algorithms on
point cloud classification, segmentation, and object detection. Finally, we
point out three potential future research directions, expecting to provide
useful references for the development of 3D Transformers.
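The permutation invariance the abstract appeals to follows directly from how self-attention treats its input as an unordered set. A minimal NumPy sketch (all names here are illustrative, not taken from any surveyed method) of scaled dot-product self-attention over per-point features, with a check that permuting the input points simply permutes the output rows:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a set of point features.

    x          : (N, d) array, one feature row per point
    wq, wk, wv : (d, d) query/key/value projection matrices
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])            # (N, N) pairwise affinities
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)           # row-wise softmax
    return attn @ v                                   # (N, d) refined features

rng = np.random.default_rng(0)
n, d = 128, 16
x = rng.standard_normal((n, d))                       # toy per-point features
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))

out = self_attention(x, wq, wk, wv)

# Permutation equivariance: shuffling the input points shuffles the output
# rows the same way -- no ordering assumption is baked into the operator,
# which is why attention fits unordered point clouds so naturally.
perm = rng.permutation(n)
assert np.allclose(self_attention(x[perm], wq, wk, wv), out[perm])
```

Composing such a layer with a symmetric pooling (e.g., max over rows) then yields a fully permutation-invariant global descriptor, the property the survey credits for Transformers' fit to point clouds.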
Related papers
- Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z) - Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding [62.502694656615496]
We introduce Progressive Point Patch Embedding and present a new point cloud Transformer model named PViT.
PViT shares the same backbone as the standard Transformer but is shown to be less data-hungry, enabling the Transformer to achieve performance comparable to the state of the art.
We formulate a simple yet effective pipeline dubbed "Pix4Point" that allows harnessing Transformers pretrained in the image domain to enhance downstream point cloud understanding.
arXiv Detail & Related papers (2022-08-25T17:59:29Z) - 3D Vision with Transformers: A Survey [114.86385193388439]
The success of the transformer architecture in natural language processing has triggered attention in the computer vision field.
We present a systematic and thorough review of more than 100 transformer methods for different 3D vision tasks.
We discuss transformer design in 3D vision, which allows it to process data with various 3D representations.
arXiv Detail & Related papers (2022-08-08T17:59:11Z) - On the Robustness of 3D Object Detectors [9.467525852900007]
3D scenes exhibit a lot of variations and are prone to sensor inaccuracies as well as information loss during pre-processing.
This work aims to analyze and benchmark popular point-based 3D object detectors against several data corruptions.
arXiv Detail & Related papers (2022-07-20T21:47:15Z) - 3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [23.0009969537045]
This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification.
Our method achieves state-of-the-art classification performance, in terms of both accuracy and efficiency.
arXiv Detail & Related papers (2022-03-02T02:42:14Z) - An End-to-End Transformer Model for 3D Object Detection [39.86969344736215]
3DETR is an end-to-end Transformer based object detection model for 3D point clouds.
We show 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%.
arXiv Detail & Related papers (2021-09-16T17:57:37Z) - Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations [61.870882736758624]
We propose a novel self-supervised paradigm to learn Multi-View Transformation Equivariant Representations (MV-TER).
Specifically, we perform a 3D transformation on a 3D object, and obtain multiple views before and after the transformation via projection.
Then, we self-train a representation to capture the intrinsic 3D object representation by decoding 3D transformation parameters from the fused feature representations of multiple views before and after the transformation.
arXiv Detail & Related papers (2021-03-01T06:24:17Z) - Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling of long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z) - SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks [71.55002934935473]
We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds and graphs, which is equivariant under continuous 3D roto-translations.
We evaluate our model on a toy N-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input.
arXiv Detail & Related papers (2020-06-18T13:23:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.