PETR: Position Embedding Transformation for Multi-View 3D Object
Detection
- URL: http://arxiv.org/abs/2203.05625v1
- Date: Thu, 10 Mar 2022 20:33:28 GMT
- Title: PETR: Position Embedding Transformation for Multi-View 3D Object
Detection
- Authors: Yingfei Liu, Tiancai Wang, Xiangyu Zhang, Jian Sun
- Abstract summary: PETR encodes the position information of 3D coordinates into image features, producing the 3D position-aware features.
PETR achieves state-of-the-art performance on the standard nuScenes dataset and ranks 1st on the benchmark.
- Score: 80.93664973321168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we develop position embedding transformation (PETR) for
multi-view 3D object detection. PETR encodes the position information of 3D
coordinates into image features, producing the 3D position-aware features.
Object queries can perceive the 3D position-aware features and perform end-to-end
object detection. PETR achieves state-of-the-art performance (50.4% NDS and
44.1% mAP) on the standard nuScenes dataset and ranks 1st on the benchmark.
It can serve as a simple yet strong baseline for future research.
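The core idea stated above, encoding the 3D position of each image location into the features so that object queries can attend to 3D position-aware features, can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the camera intrinsics, sampled frustum depths, MLP widths, and the random-weight MLP standing in for the learned embedding network are all placeholder assumptions.

```python
import numpy as np

def frustum_points(H, W, depths, K_inv):
    """Lift pixel centers to 3D camera-frame points at several sampled depths."""
    us, vs = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)      # (H, W)
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3)  # (H*W, 3)
    rays = pix @ K_inv.T                                              # (H*W, 3)
    pts = rays[:, None, :] * np.asarray(depths)[None, :, None]        # (H*W, D, 3)
    return pts.reshape(pts.shape[0], -1)                              # (H*W, D*3)

def position_aware_features(feat, coords_3d, hidden, rng):
    """Encode 3D coordinates with a small MLP and add them to image features.

    A random-weight two-layer ReLU MLP stands in for the learned embedding
    network; `hidden` is a hypothetical hidden width.
    """
    C = feat.shape[1]
    w1 = rng.standard_normal((coords_3d.shape[1], hidden)) * 0.02
    w2 = rng.standard_normal((hidden, C)) * 0.02
    pos_embed = np.maximum(coords_3d @ w1, 0.0) @ w2                  # (H*W, C)
    return feat + pos_embed   # 3D position-aware features, same shape as feat

# Toy usage: one camera view with a 4x6 feature map and 64 channels.
rng = np.random.default_rng(0)
H, W, C = 4, 6, 64
K = np.array([[100.0, 0.0, W / 2], [0.0, 100.0, H / 2], [0.0, 0.0, 1.0]])
coords = frustum_points(H, W, depths=[1.0, 2.0, 4.0], K_inv=np.linalg.inv(K))
feat = rng.standard_normal((H * W, C))
out = position_aware_features(feat, coords, hidden=128, rng=rng)
```

A transformer decoder's object queries would then cross-attend to `out` instead of the raw image features, which is what lets detection be formulated end-to-end without explicit 2D-to-3D post-processing.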
Related papers
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection [102.0744303467713]
We propose a new multi-view 3D object detector named OPEN.
Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding.
OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
arXiv Detail & Related papers (2024-07-15T14:29:15Z)
- Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection [16.677107631803327]
PARQ is a multi-view 3D object detector with transformer and pixel-aligned recurrent queries.
It can leverage additional input views without retraining, and can adapt inference compute by changing the number of recurrent iterations.
arXiv Detail & Related papers (2023-10-02T17:58:51Z)
- Transformer-based stereo-aware 3D object detection from binocular images [82.85433941479216]
We explore the model design of Transformers in binocular 3D object detection.
To achieve this goal, we present TS3D, a Stereo-aware 3D object detector.
Our proposed TS3D achieves a 41.29% Moderate Car detection average precision on the KITTI test set and takes 88 ms to detect objects from each binocular image pair.
arXiv Detail & Related papers (2023-04-24T08:29:45Z)
- Viewpoint Equivariance for Multi-View 3D Object Detection [35.4090127133834]
State-of-the-art methods focus on reasoning and decoding object bounding boxes from multi-view camera input.
We introduce VEDet, a novel 3D object detection framework that exploits 3D multi-view geometry.
arXiv Detail & Related papers (2023-03-25T19:56:41Z)
- CAPE: Camera View Position Embedding for Multi-View 3D Object Detection [100.02565745233247]
Current query-based methods rely on global 3D position embeddings to learn the geometric correspondence between images and 3D space.
We propose a novel method based on CAmera view Position Embedding, called CAPE.
CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on nuScenes dataset.
arXiv Detail & Related papers (2023-03-17T18:59:54Z)
- 3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers [35.14784758217257]
We introduce 3D point positional encoding, 3DPPE, to the 3D detection Transformer decoder.
Despite the approximation, 3DPPE achieves 46.0 mAP and 51.4 NDS on the competitive nuScenes dataset.
arXiv Detail & Related papers (2022-11-27T03:36:32Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images [105.29493158036105]
PETRv2 is a unified framework for 3D perception from multi-view images.
We extend the 3D position embedding in PETR for temporal modeling.
PETRv2 achieves state-of-the-art performance on 3D object detection and BEV segmentation.
arXiv Detail & Related papers (2022-06-02T19:13:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.