3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers
- URL: http://arxiv.org/abs/2211.14710v3
- Date: Fri, 28 Jul 2023 02:31:31 GMT
- Title: 3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers
- Authors: Changyong Shu, Jiajun Deng, Fisher Yu, and Yifan Liu
- Abstract summary: We introduce 3D point positional encoding, 3DPPE, to the 3D detection Transformer decoder.
Despite the approximation, 3DPPE achieves 46.0 mAP and 51.4 NDS on the competitive nuScenes dataset.
- Score: 35.14784758217257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based methods have swept the benchmarks for 2D and 3D detection on images. Because tokenization before the attention mechanism drops the spatial information, positional encoding becomes critical for those methods. Recent works found that encodings based on samples of the 3D viewing rays can significantly improve the quality of multi-camera 3D object detection. We hypothesize that 3D point locations can provide more information than rays. Therefore, we introduce 3D point positional encoding, 3DPPE, to the 3D detection Transformer decoder. Although 3D measurements are not available at the inference time of monocular 3D object detection, 3DPPE uses predicted depth to approximate the real point positions. Our hybrid-depth module combines direct and categorical depth to estimate the refined depth of each pixel. Despite the approximation, 3DPPE achieves 46.0 mAP and 51.4 NDS on the competitive nuScenes dataset, significantly outperforming encodings based on ray samples. We make the code available at https://github.com/drilistbox/3DPPE.
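To make the idea concrete, here is a minimal sketch of point positional encoding as described in the abstract: each pixel is back-projected to a 3D point with its predicted depth, and the point is mapped to an embedding by a sinusoidal encoding followed by an MLP. All names, shapes, and the equal weighting in hybrid_depth are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

def sine_encode(coords, num_feats=128, temperature=10000.0):
    # coords: (N, 3) points normalized to [0, 1]; returns (N, 3 * num_feats).
    dim_t = torch.arange(num_feats, dtype=torch.float32, device=coords.device)
    dim_t = temperature ** (2 * torch.div(dim_t, 2, rounding_mode="floor") / num_feats)
    pos = coords[..., None] * 2 * math.pi / dim_t               # (N, 3, num_feats)
    pos = torch.stack((pos[..., 0::2].sin(), pos[..., 1::2].cos()), dim=-1)
    return pos.flatten(-3)

def hybrid_depth(direct, bin_logits, bin_centers):
    # Fuse a directly regressed depth with the expectation of a categorical
    # depth distribution; the equal weighting here is an assumption.
    categorical = (bin_logits.softmax(-1) * bin_centers).sum(-1)
    return 0.5 * (direct + categorical)

class PointPositionalEncoding(nn.Module):
    """Turn (pixel, predicted depth) pairs into 3D point embeddings."""

    def __init__(self, embed_dim=256, num_feats=128):
        super().__init__()
        self.num_feats = num_feats
        self.mlp = nn.Sequential(
            nn.Linear(3 * num_feats, embed_dim),
            nn.ReLU(inplace=True),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, uv, depth, intrinsics_inv, cam2world, pc_range):
        # uv: (N, 2) pixel coordinates; depth: (N,) predicted depth in meters.
        ones = torch.ones_like(depth)[:, None]
        rays = (intrinsics_inv @ torch.cat([uv, ones], dim=-1).T).T  # unit-plane rays
        pts_cam = rays * depth[:, None]                              # back-projection
        pts = pts_cam @ cam2world[:3, :3].T + cam2world[:3, 3]       # to world frame
        lo, hi = pc_range[:3], pc_range[3:]                          # scene bounds
        pts = ((pts - lo) / (hi - lo)).clamp(0.0, 1.0)               # normalize
        return self.mlp(sine_encode(pts, self.num_feats))            # (N, embed_dim)
```

With 128 features per coordinate, sine_encode produces a 3 x 128 = 384-dimensional feature that the MLP projects to the decoder's embedding width.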
Related papers
- MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps [51.44887282336391]
The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection.
Previous methods rely on NeRF for geometry reasoning.
We propose MVSDet, which utilizes plane sweeps for geometry-aware 3D object detection.
arXiv Detail & Related papers (2024-10-28T21:58:41Z)
- PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection [13.60524473223155]
This paper introduces PointAD, a novel approach that transfers the strong generalization capabilities of CLIP for recognizing 3D anomalies on unseen objects.
PointAD renders 3D anomalies into multiple 2D renderings and projects them back into 3D space.
Our model can directly integrate RGB information, further enhancing the understanding of 3D anomalies in a plug-and-play manner.
arXiv Detail & Related papers (2024-10-01T01:40:22Z)
- V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection [73.37781484123536]
We introduce a highly performant 3D object detector for point clouds using the DETR framework.
We introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method that encodes each point relative to the 3D boxes predicted by the decoder queries.
We show exceptional results on the challenging ScanNetV2 benchmark.
arXiv Detail & Related papers (2023-08-08T17:14:14Z)
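A rough illustration of the vertex-relative idea: each point is encoded by its offsets to the eight corners of a query's predicted box. The function name, box representation, and MLP below are assumptions for illustration, not V-DETR's actual code.

```python
import torch
import torch.nn as nn

def vertex_relative_pe(points, box_corners, mlp):
    # points: (N, 3); box_corners: (8, 3) corners of one predicted 3D box.
    # Encode each point by its offsets to the eight box vertices.
    rel = points[:, None, :] - box_corners[None, :, :]    # (N, 8, 3)
    return mlp(rel.reshape(points.shape[0], -1))          # (N, 24) -> (N, C)

mlp = nn.Sequential(nn.Linear(24, 256), nn.ReLU(), nn.Linear(256, 256))
pe = vertex_relative_pe(torch.rand(100, 3), torch.rand(8, 3), mlp)
```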
- Viewpoint Equivariance for Multi-View 3D Object Detection [35.4090127133834]
State-of-the-art methods focus on reasoning and decoding object bounding boxes from multi-view camera input.
We introduce VEDet, a novel 3D object detection framework that exploits 3D multi-view geometry.
arXiv Detail & Related papers (2023-03-25T19:56:41Z)
- CAPE: Camera View Position Embedding for Multi-View 3D Object Detection [100.02565745233247]
Current query-based methods rely on global 3D position embeddings to learn the geometric correspondence between images and 3D space.
We propose a novel method based on CAmera view Position Embedding, called CAPE.
CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset.
arXiv Detail & Related papers (2023-03-17T18:59:54Z)
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection [80.93664973321168]
PETR encodes the position information of 3D coordinates into image features, producing the 3D position-aware features.
PETR achieves state-of-the-art performance on the standard nuScenes dataset and ranks 1st place on the benchmark.
arXiv Detail & Related papers (2022-03-10T20:33:28Z)
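For contrast with 3DPPE's single predicted point per pixel, here is a minimal sketch of the ray-sample encoding family that PETR belongs to: each pixel's viewing ray is discretized into several depth samples, and the back-projected samples are folded into one embedding. Names and shapes are assumptions, not PETR's actual API.

```python
import torch
import torch.nn as nn

def frustum_position_embedding(feat_hw, depths, img2world, pc_range, embed_dim=256):
    # For every feature-map pixel, sample D depths along its viewing ray,
    # back-project the samples to world space, and squash the D x 3
    # coordinates into one embedding that is added to the 2D image features.
    H, W = feat_hw
    D = depths.numel()
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    uv1 = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()     # (H, W, 3)
    rays = uv1[:, :, None, :] * depths[None, None, :, None]           # (H, W, D, 3)
    homo = torch.cat([rays, torch.ones_like(rays[..., :1])], dim=-1)  # homogeneous
    world = (img2world @ homo.reshape(-1, 4).T).T[:, :3].reshape(H, W, D, 3)
    lo, hi = pc_range[:3], pc_range[3:]
    world = ((world - lo) / (hi - lo)).clamp(0.0, 1.0)                # normalize
    mlp = nn.Sequential(nn.Linear(D * 3, embed_dim), nn.ReLU(),
                        nn.Linear(embed_dim, embed_dim))
    return mlp(world.reshape(H, W, D * 3))                            # (H, W, embed_dim)
```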
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [43.02373021724797]
We introduce a framework for multi-camera 3D object detection.
Our method manipulates predictions directly in 3D space.
We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
arXiv Detail & Related papers (2021-10-13T17:59:35Z)
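A hedged sketch of the 3D-to-2D query mechanism, with illustrative names: each object query keeps a 3D reference point that is projected into each camera to gather image features.

```python
import torch
import torch.nn.functional as F

def sample_feature_for_query(ref_point, feat, world2img, img_hw):
    # ref_point: (3,) query reference point in world coordinates.
    # feat: (C, H, W) image feature map; world2img: (3, 4) projection matrix.
    homo = torch.cat([ref_point, ref_point.new_ones(1)])      # homogeneous (4,)
    uvw = world2img @ homo                                    # project to image
    uv = uvw[:2] / uvw[2].clamp(min=1e-5)                     # perspective divide
    H, W = img_hw
    grid = torch.stack([2 * uv[0] / (W - 1) - 1,              # normalize to [-1, 1]
                        2 * uv[1] / (H - 1) - 1]).view(1, 1, 1, 2)
    return F.grid_sample(feat[None], grid, align_corners=True).view(-1)  # (C,)
```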
- End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection [62.34374949726333]
Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
arXiv Detail & Related papers (2020-04-07T02:18:38Z)
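The change of representation at the heart of PL is ordinary pinhole back-projection of a depth map into a point cloud; a minimal sketch with illustrative variable names:

```python
import numpy as np

def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    # Back-project a dense depth map (H, W) into an (H*W, 3) point cloud:
    # z = depth[v, u], x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    v, u = np.indices(depth.shape)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```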
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.