SEFormer: Structure Embedding Transformer for 3D Object Detection
- URL: http://arxiv.org/abs/2209.01745v1
- Date: Mon, 5 Sep 2022 03:38:12 GMT
- Title: SEFormer: Structure Embedding Transformer for 3D Object Detection
- Authors: Xiaoyu Feng, Heming Du, Yueqi Duan, Yongpan Liu, Hehe Fan
- Abstract summary: The Structure-Embedding transFormer (SEFormer) preserves local structure like a traditional Transformer while also being able to encode it.
SEFormer achieves 79.02% mAP, which is 1.2% higher than existing works.
- Score: 22.88983416605276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effectively preserving and encoding structure features from objects in
irregular and sparse LiDAR points is a key challenge to 3D object detection on
point cloud. Recently, Transformer has demonstrated promising performance on
many 2D and even 3D vision tasks. Compared with fixed, rigid convolution
kernels, the self-attention mechanism in Transformers can adaptively exclude
unrelated or noisy points, and is therefore well suited to preserving the local
spatial structure of irregular LiDAR point clouds. However, the Transformer only
performs an attention-weighted sum over point features, and all points share the
same value transformation. Such an isotropic operation cannot capture the
direction- and distance-oriented local structure that is important for 3D object
detection. In this work, we propose a Structure-Embedding transFormer
(SEFormer), which not only preserves local structure as a traditional
Transformer does, but can also encode it. Unlike the self-attention mechanism in
a traditional Transformer, SEFormer learns a different feature transformation
for each value point based on its relative direction and distance to the query
point. We then propose a SEFormer-based network for high-performance 3D object
detection.
Extensive experiments show that the proposed architecture can achieve SOTA
results on Waymo Open Dataset, the largest 3D detection benchmark for
autonomous driving. Specifically, SEFormer achieves 79.02% mAP, which is 1.2%
higher than existing works. We will release the code.
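The abstract's key idea is that the value transformation in attention should depend on where a neighbor sits relative to the query point. A minimal NumPy sketch of that idea follows; this is an illustration, not the paper's implementation. The function names, the octant-plus-near/far binning scheme, and all parameters are assumptions chosen to show how per-bin value matrices make the aggregation anisotropic.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def structure_bin(rel, dist_threshold=1.0):
    # Quantize the relative offset into one of 16 bins:
    # 8 direction octants (sign of each coordinate) x 2 distance ranges.
    octant = (rel[0] > 0) * 1 + (rel[1] > 0) * 2 + (rel[2] > 0) * 4
    far = int(np.linalg.norm(rel) > dist_threshold)
    return int(octant) * 2 + far

def se_attention(q_feat, q_pos, feats, pos, W_value):
    # q_feat: (d,) query feature, q_pos: (3,) query position
    # feats: (n, d) neighbor features, pos: (n, 3) neighbor positions
    # W_value: (16, d, d) -- one value projection per direction/distance bin,
    # instead of the single shared value transform of vanilla attention.
    scores = feats @ q_feat / np.sqrt(q_feat.shape[0])
    attn = softmax(scores)
    out = np.zeros_like(q_feat)
    for a, f, p in zip(attn, feats, pos):
        b = structure_bin(p - q_pos)
        out += a * (W_value[b] @ f)  # bin-specific value transformation
    return out
```

Because each bin has its own projection matrix, mirroring the neighbors around the query changes the output even when features and attention weights are unchanged, which is exactly the anisotropy a single shared value transform cannot provide.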
Related papers
- TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training [21.56675189346088]
We introduce Transformation-Invariant Local (TraIL) features and the associated TraIL-Det architecture.
TraIL features exhibit rigid transformation invariance and effectively adapt to variations in point density.
They utilize the inherent isotropic radiation of LiDAR to enhance local representation.
Our method outperforms contemporary self-supervised 3D object detection approaches in terms of mAP on KITTI.
arXiv Detail & Related papers (2024-08-25T17:59:17Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well trained on massive image data.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - ConDaFormer: Disassembled Transformer with Local Structure Enhancement
for 3D Point Cloud Understanding [105.98609765389895]
Transformers have been recently explored for 3D point cloud understanding.
The large number of points, often over 0.1 million, makes global self-attention infeasible for point cloud data.
In this paper, we develop a new transformer block, named ConDaFormer.
arXiv Detail & Related papers (2023-12-18T11:19:45Z) - Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z) - Transformation-Equivariant 3D Object Detection for Autonomous Driving [44.17100476968737]
Transformation-Equivariant 3D Detector (TED) is an efficient way to detect 3D objects in autonomous driving.
TED ranks 1st among all submissions on KITTI 3D car detection leaderboard.
arXiv Detail & Related papers (2022-11-22T02:51:56Z) - SWFormer: Sparse Window Transformer for 3D Object Detection in Point
Clouds [44.635939022626744]
3D object detection in point clouds is a core component for modern robotics and autonomous driving systems.
Key challenge in 3D object detection comes from the inherent sparse nature of point occupancy within the 3D scene.
We propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection.
arXiv Detail & Related papers (2022-10-13T21:37:53Z) - Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that efficiently perceives global geometric inconsistencies in 3D structures.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z) - Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.