An End-to-End Transformer Model for 3D Object Detection
- URL: http://arxiv.org/abs/2109.08141v1
- Date: Thu, 16 Sep 2021 17:57:37 GMT
- Title: An End-to-End Transformer Model for 3D Object Detection
- Authors: Ishan Misra, Rohit Girdhar, Armand Joulin
- Abstract summary: 3DETR is an end-to-end Transformer based object detection model for 3D point clouds.
We show 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%.
- Score: 39.86969344736215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose 3DETR, an end-to-end Transformer based object detection model for
3D point clouds. Compared to existing detection methods that employ a number of
3D-specific inductive biases, 3DETR requires minimal modifications to the
vanilla Transformer block. Specifically, we find that a standard Transformer
with non-parametric queries and Fourier positional embeddings is competitive
with specialized architectures that employ libraries of 3D-specific operators
with hand-tuned hyperparameters. Nevertheless, 3DETR is conceptually simple and
easy to implement, enabling further improvements by incorporating 3D domain
knowledge. Through extensive experiments, we show 3DETR outperforms the
well-established and highly optimized VoteNet baselines on the challenging
ScanNetV2 dataset by 9.5%. Furthermore, we show 3DETR is applicable to 3D tasks
beyond detection, and can serve as a building block for future research.
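The abstract's two key ingredients, Fourier positional embeddings and non-parametric queries drawn from the point cloud itself, can be sketched briefly. The frequency schedule, normalization, and use of farthest point sampling below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def fourier_positional_embedding(xyz, num_freqs=8, scale=1.0):
    """Map 3D coordinates to sin/cos Fourier features.

    xyz: (N, 3) array of point coordinates.
    Returns an (N, 3 * 2 * num_freqs) embedding.
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi * scale  # (F,)
    angles = xyz[:, :, None] * freqs[None, None, :]        # (N, 3, F)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(xyz.shape[0], -1)

def farthest_point_sample(xyz, num_queries):
    """Greedy farthest-point sampling: one common way to pick
    'non-parametric' query locations directly from the input points,
    rather than learning a fixed set of query vectors."""
    n = xyz.shape[0]
    chosen = [0]
    min_dist = np.full(n, np.inf)
    for _ in range(num_queries - 1):
        d = np.linalg.norm(xyz - xyz[chosen[-1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        chosen.append(int(np.argmax(min_dist)))
    return xyz[chosen]
```

In this reading, the sampled query points are embedded with the same Fourier map and fed to a standard Transformer decoder, with no 3D-specific operators required.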
Related papers
- EVT: Efficient View Transformation for Multi-Modal 3D Object Detection [2.9848894641223302]
We propose a novel 3D object detector via efficient view transformation (EVT).
EVT uses Adaptive Sampling and Adaptive Projection (ASAP) to generate 3D sampling points and adaptive kernels.
It is designed to effectively utilize the obtained multi-modal BEV features within the transformer decoder.
arXiv Detail & Related papers (2024-11-16T06:11:10Z) - CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer [42.68740105997167]
We introduce two frameworks for 3D object detection with minimal hand-crafted design.
Firstly, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for point features within each proposal.
Secondly, we present an enhanced network called CT3D++, which incorporates geometric and semantic fusion-based embedding to extract more valuable and comprehensive proposal-aware information.
arXiv Detail & Related papers (2024-06-12T12:40:28Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z) - SRCN3D: Sparse R-CNN 3D for Compact Convolutional Multi-View 3D Object Detection and Tracking [12.285423418301683]
This paper proposes Sparse R-CNN 3D (SRCN3D), a novel two-stage fully-sparse detector that incorporates sparse queries, sparse attention with box-wise sampling, and sparse prediction.
Experiments on nuScenes dataset demonstrate that SRCN3D achieves competitive performance in both 3D object detection and multi-object tracking tasks.
arXiv Detail & Related papers (2022-06-29T07:58:39Z) - Transformers in 3D Point Clouds: A Survey [27.784721081318935]
3D Transformer models have proven remarkably effective at modeling long-range dependencies.
This survey aims to provide a comprehensive overview of 3D Transformers designed for various tasks.
arXiv Detail & Related papers (2022-05-16T01:32:18Z) - Improving 3D Object Detection with Channel-wise Transformer [58.668922561622466]
We propose a two-stage 3D object detection framework (CT3D) with minimal hand-crafted design.
CT3D simultaneously performs proposal-aware embedding and channel-wise context aggregation.
It achieves an AP of 81.77% in the moderate car category on the KITTI test 3D detection benchmark.
arXiv Detail & Related papers (2021-08-23T02:03:40Z) - ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection [78.71826145162092]
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds.
Our ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2021-03-09T10:51:24Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, effective samples are relatively rare in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, changing only one 3D parameter in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
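The one-parameter-per-step refinement idea can be illustrated with a toy greedy loop. The paper learns the step policy with reinforcement learning; the error-based axis selection and fixed step bound below are purely illustrative stand-ins:

```python
import numpy as np

def axial_refine(box, target, steps=50, delta=0.1):
    """Refine a 3D box prediction toward a target, adjusting exactly
    one parameter per step (the axis with the largest residual error),
    with each move bounded by `delta`."""
    box = np.asarray(box, dtype=float).copy()
    target = np.asarray(target, dtype=float)
    for _ in range(steps):
        errs = target - box
        i = int(np.argmax(np.abs(errs)))       # pick the worst axis
        box[i] += np.clip(errs[i], -delta, delta)  # bounded single-axis move
    return box
```

In the RL setting, the ground-truth residual is of course unavailable at test time; the policy must choose the axis and step direction from image features alone, receiving a reward only after several steps.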
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.