ConDaFormer: Disassembled Transformer with Local Structure Enhancement
for 3D Point Cloud Understanding
- URL: http://arxiv.org/abs/2312.11112v1
- Date: Mon, 18 Dec 2023 11:19:45 GMT
- Title: ConDaFormer: Disassembled Transformer with Local Structure Enhancement
for 3D Point Cloud Understanding
- Authors: Lunhao Duan, Shanshan Zhao, Nan Xue, Mingming Gong, Gui-Song Xia,
Dacheng Tao
- Abstract summary: Transformers have recently been explored for 3D point cloud understanding.
Point clouds often contain over 0.1 million points, which makes global self-attention infeasible.
In this paper, we develop a new transformer block, named ConDaFormer.
- Score: 105.98609765389895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have recently been explored for 3D point cloud
understanding, with impressive progress. However, point clouds often contain a
large number of points, over 0.1 million, which makes global self-attention
infeasible. Most methods therefore apply the transformer within a local region,
e.g., a spherical or cubic window. Such a window still contains a large number
of query-key pairs, which incurs high computational costs. In addition,
previous methods usually learn the query, key, and value with a linear
projection, without modeling the local 3D geometric structure. In this paper,
we attempt to reduce the cost and model the local geometric prior by developing
a new transformer block, named ConDaFormer. Technically, ConDaFormer
disassembles the cubic window into three orthogonal 2D planes, so that fewer
points are involved when modeling attention over a similar range. This
disassembling operation enlarges the range of attention without increasing the
computational complexity, but it ignores some context. As a remedy, we develop
a local structure enhancement strategy that introduces a depth-wise convolution
before and after the attention; this scheme also captures local geometric
information. Thanks to these designs, ConDaFormer captures both long-range
contextual information and local priors. Its effectiveness is demonstrated by
experimental results on several 3D point cloud understanding benchmarks. Code
is available at https://github.com/LHDuan/ConDaFormer .
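To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation (see the repository above for that). The intuition behind the cost claim: a cubic window of side k gives each query roughly k^3 keys, while three orthogonal k x k planes give roughly 3k^2, so the window side can grow under a similar attention budget. The sketch assumes dense voxel features and full-plane attention instead of sparse points and local windows, composes the depth-wise convolutions as residual branches, and shares one attention module across planes; the class name PlaneDisassembledAttention and all hyper-parameters are illustrative.

# Minimal, illustrative sketch of the two ideas described in the abstract:
# (1) disassembling attention over a 3D neighborhood into three orthogonal
# 2D plane families, and (2) depth-wise convolution before and after the
# attention as local structure enhancement. This is NOT the authors'
# implementation: it uses a dense voxel grid and full-plane attention
# instead of sparse points and local windows.
import torch
import torch.nn as nn


class PlaneDisassembledAttention(nn.Module):
    """Attention restricted to axis-aligned 2D slices of a voxel grid."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Depth-wise 3D convolutions (groups == channels) standing in for
        # the local structure enhancement before and after attention.
        self.pre_conv = nn.Conv3d(channels, channels, 3, padding=1, groups=channels)
        self.post_conv = nn.Conv3d(channels, channels, 3, padding=1, groups=channels)

    def _plane_attention(self, x: torch.Tensor, fixed_dim: int) -> torch.Tensor:
        # x: (B, C, X, Y, Z). Tokens sharing the coordinate along `fixed_dim`
        # attend to each other, i.e. attention runs inside every 2D slice
        # orthogonal to that axis.
        b, c = x.shape[:2]
        x = x.movedim(fixed_dim, 2)                   # (B, C, D_fixed, H, W)
        d, h, w = x.shape[2:]
        tokens = x.permute(0, 2, 3, 4, 1).reshape(b * d, h * w, c)
        out, _ = self.attn(tokens, tokens, tokens)
        out = out.reshape(b, d, h, w, c).permute(0, 4, 1, 2, 3)
        return out.movedim(2, fixed_dim)              # back to (B, C, X, Y, Z)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.pre_conv(x)                      # local prior before attention
        # Sum attention over the three orthogonal plane families
        # (fix X -> yz slices, fix Y -> xz slices, fix Z -> xy slices).
        x = x + sum(self._plane_attention(x, dim) for dim in (2, 3, 4))
        x = x + self.post_conv(x)                     # local prior after attention
        return x


if __name__ == "__main__":
    block = PlaneDisassembledAttention(channels=32)
    voxels = torch.randn(1, 32, 8, 8, 8)              # toy dense voxel features
    print(block(voxels).shape)                        # torch.Size([1, 32, 8, 8, 8])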
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from 2D transformers that are well trained on massive images.
Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art performance in unsupervised domain adaptation (UDA) for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Monocular Scene Reconstruction with 3D SDF Transformers [17.565474518578178]
We propose an SDF transformer network, which replaces the role of the 3D CNN for better 3D feature aggregation.
Experiments on multiple datasets show that this 3D transformer network generates a more accurate and complete reconstruction.
arXiv Detail & Related papers (2023-01-31T09:54:20Z)
- SEFormer: Structure Embedding Transformer for 3D Object Detection [22.88983416605276]
Structure-Embedding transFormer (SEFormer) not only preserves local structure as a traditional Transformer does, but also has the ability to encode it.
SEFormer achieves 79.02% mAP, which is 1.2% higher than existing works.
arXiv Detail & Related papers (2022-09-05T03:38:12Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
In this work, we adopt transformers and incorporate them into a hierarchical framework for shape classification as well as part and scene segmentation.
We also compute efficient and dynamic global cross-attention by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- KAPLAN: A 3D Point Descriptor for Shape Completion [80.15764700137383]
KAPLAN is a 3D point descriptor that aggregates local shape information via a series of 2D convolutions.
In each of several local 2D planes with different orientations, point properties like normals or point-to-plane distances are aggregated into a 2D grid and abstracted into a feature representation with an efficient 2D convolutional encoder (a sketch of this plane-projection idea follows the related-papers list below).
Experiments on public datasets show that KAPLAN achieves state-of-the-art performance for 3D shape completion.
arXiv Detail & Related papers (2020-07-31T21:56:08Z)
- Local Implicit Grid Representations for 3D Scenes [24.331110387905962]
We introduce Local Implicit Grid Representations, a new 3D shape representation designed for scalability and generality.
We train an autoencoder to learn an embedding of local crops of 3D shapes at a fixed crop size.
Then, we use the decoder as a component in a shape optimization that solves for a set of latent codes on a regular grid of overlapping crops.
arXiv Detail & Related papers (2020-03-19T18:58:13Z)
- Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction on ShapeNet and obtain significantly more accurate 3D human reconstructions.
arXiv Detail & Related papers (2020-03-03T11:14:29Z)
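As noted in the KAPLAN entry above, here is a minimal sketch of the plane-projection idea summarized there: points in a local neighborhood are projected onto a 2D plane, a per-point property (here the signed point-to-plane distance) is rasterized into a grid, and a small 2D convolutional encoder turns the grid into a feature vector. This is an illustrative reconstruction from the summary, not the authors' descriptor; the grid resolution, the mean aggregation, the encoder architecture, and the helper name rasterize_to_plane are assumptions made for the example.

# Illustrative sketch of the KAPLAN-style plane projection summarized above.
# Not the authors' code: grid size, mean aggregation, and the encoder are
# arbitrary; only the overall pattern (project -> rasterize -> 2D CNN) follows
# the summary.
import torch
import torch.nn as nn


def rasterize_to_plane(points: torch.Tensor, grid: int = 16) -> torch.Tensor:
    """Project (N, 3) points onto the local xy-plane and bin their z values.

    Returns a (1, grid, grid) image holding the mean signed point-to-plane
    distance in every occupied cell (zero where no point falls).
    """
    xy, z = points[:, :2], points[:, 2]
    lo, hi = xy.min(0).values, xy.max(0).values
    cells = ((xy - lo) / (hi - lo + 1e-8) * (grid - 1)).long()
    idx = cells[:, 0] * grid + cells[:, 1]
    sums = torch.zeros(grid * grid).index_add_(0, idx, z)
    counts = torch.zeros(grid * grid).index_add_(0, idx, torch.ones_like(z))
    return (sums / counts.clamp(min=1)).view(1, grid, grid)


encoder = nn.Sequential(                       # tiny 2D convolutional encoder
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),     # -> (B, 32) descriptor
)

neighborhood = torch.randn(256, 3)             # toy local point patch
descriptor = encoder(rasterize_to_plane(neighborhood).unsqueeze(0))
print(descriptor.shape)                        # torch.Size([1, 32])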
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.