Polar Parametrization for Vision-based Surround-View 3D Detection
- URL: http://arxiv.org/abs/2206.10965v1
- Date: Wed, 22 Jun 2022 10:26:12 GMT
- Title: Polar Parametrization for Vision-based Surround-View 3D Detection
- Authors: Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Chang Huang,
Wenyu Liu
- Abstract summary: Polar Parametrization for 3D detection reformulates position parametrization, velocity decomposition, perception range, label assignment, and the loss function in the polar coordinate system.
Based on Polar Parametrization, we propose a surround-view 3D DEtection TRansformer, named PolarDETR.
- Score: 35.2870826850481
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 3D detection based on a surround-view camera system is a critical technique in
autonomous driving. In this work, we present Polar Parametrization for 3D detection,
which reformulates position parametrization, velocity decomposition, perception
range, label assignment, and the loss function in the polar coordinate system. Polar
Parametrization establishes explicit associations between image patterns and
prediction targets, exploiting the view symmetry of surround-view cameras as an
inductive bias to ease optimization and boost performance. Based on Polar
Parametrization, we propose a surround-view 3D DEtection TRansformer, named
PolarDETR. PolarDETR achieves a promising performance-speed trade-off across
different backbone configurations. In addition, PolarDETR ranked 1st on the
nuScenes benchmark leaderboard in both 3D detection and 3D tracking
at submission time (Mar. 4th, 2022). Code will be released at
\url{https://github.com/hustvl/PolarDETR}.
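To make the reformulation concrete, the short sketch below (illustrative Python; `to_polar` and the exact target definitions are assumptions, not the released PolarDETR code) converts a Cartesian box center and BEV velocity into the radius, azimuth, and radial/tangential velocity components that the abstract describes:

```python
import numpy as np

def to_polar(center_xyz, velocity_xy):
    """Re-parametrize a box center and BEV velocity in polar coordinates.

    A minimal sketch of the Polar Parametrization idea; the exact
    formulation in PolarDETR may differ, and all names here are
    illustrative rather than taken from the released code.
    """
    x, y, z = center_xyz
    r = np.hypot(x, y)          # radial distance in the BEV plane
    azimuth = np.arctan2(y, x)  # azimuth angle of the box center

    # Unit vectors along the radial and tangential directions.
    e_r = np.array([np.cos(azimuth), np.sin(azimuth)])
    e_t = np.array([-np.sin(azimuth), np.cos(azimuth)])

    # Decompose the BEV velocity into radial and tangential components,
    # expressing the regression targets in the same view-centric frame.
    v = np.asarray(velocity_xy, dtype=float)
    v_radial = float(v @ e_r)
    v_tangential = float(v @ e_t)

    return r, azimuth, z, v_radial, v_tangential
```

Because every surround-view camera sees the scene through the same (radius, azimuth) geometry, targets expressed this way look statistically alike across views, which is one way to read the view-symmetry inductive bias mentioned above.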
Related papers
- PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View [5.0458717114406975]
We propose to employ the polar BEV representation in place of the Cartesian BEV representation.
Experiments on nuScenes show that PolarBEVDet achieves superior performance; a polar-grid sketch follows this entry.
arXiv Detail & Related papers (2024-08-29T01:42:38Z)
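Where a Cartesian BEV grid bins features by (x, y) cell, a polar BEV grid bins them by (radius, azimuth). A minimal sketch, assuming hypothetical bin counts and range (none of this is PolarBEVDet's actual configuration):

```python
import numpy as np

def polar_bev_bin(points_xy, num_r=64, num_a=128, max_r=60.0):
    """Assign BEV points to (radius, azimuth) bins instead of (x, y) cells.

    Illustrative sketch only; bin counts and perception range are
    assumed values, not PolarBEVDet's settings.
    """
    x, y = points_xy[:, 0], points_xy[:, 1]
    r = np.hypot(x, y)
    a = np.arctan2(y, x)  # azimuth in [-pi, pi]

    # Truncate to integer bin indices; clip radius to the grid extent.
    r_idx = np.clip((r / max_r * num_r).astype(int), 0, num_r - 1)
    a_idx = ((a + np.pi) / (2 * np.pi) * num_a).astype(int) % num_a
    return r_idx, a_idx
```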
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- CAPE: Camera View Position Embedding for Multi-View 3D Object Detection [100.02565745233247]
Current query-based methods rely on global 3D position embeddings to learn the geometric correspondence between images and 3D space.
We propose a novel method based on CAmera view Position Embedding, called CAPE.
CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset.
arXiv Detail & Related papers (2023-03-17T18:59:54Z)
- OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection [51.153003057515754]
OPA-3D is a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network.
It jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes.
It outperforms state-of-the-art methods on the main Car category.
arXiv Detail & Related papers (2022-11-02T14:19:13Z)
- PolarFormer: Multi-camera 3D Object Detection with Polar Transformers [93.49713023975727]
3D object detection in autonomous driving aims to reason about "what" and "where" the objects of interest are in a 3D world.
Existing methods often adopt the canonical Cartesian coordinate system with perpendicular axes.
We propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird's-eye-view (BEV) taking as input only multi-camera 2D images.
arXiv Detail & Related papers (2022-06-30T16:32:48Z)
- Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving [3.8073142980733]
We propose jointly training 3D detection and 3D tracking from only monocular videos in an end-to-end manner.
Time3D achieves 21.4% AMOTA, 13.6% AMOTP on the nuScenes 3D tracking benchmark, surpassing all published competitors.
arXiv Detail & Related papers (2022-05-30T06:41:10Z)
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [43.02373021724797]
We introduce a framework for multi-camera 3D object detection.
Our method manipulates predictions directly in 3D space.
We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark; a projection sketch follows this entry.
arXiv Detail & Related papers (2021-10-13T17:59:35Z)
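DETR3D's 3D-to-2D queries keep a 3D reference point per object query and project it into each camera image to gather features. A minimal sketch, assuming standard pinhole conventions (a 4x4 camera-to-world extrinsic and 3x3 intrinsics; names are illustrative, not from the DETR3D codebase):

```python
import numpy as np

def project_reference_point(p_world, cam_to_world, intrinsics):
    """Project a 3D query reference point into one camera view.

    Sketch of the 3D-to-2D query idea; assumes z is the camera's
    forward axis and that cam_to_world is an invertible 4x4 matrix.
    """
    world_to_cam = np.linalg.inv(cam_to_world)      # 4x4 extrinsics
    p_cam = world_to_cam @ np.append(p_world, 1.0)  # homogeneous point
    if p_cam[2] <= 0:
        return None                                 # behind the camera
    uv = intrinsics @ (p_cam[:3] / p_cam[2])        # pinhole projection
    return uv[:2]                                   # pixel coordinates
```

Features sampled at the returned pixel location (when it lands inside the image) then refine the query, which matches the summary's point that predictions are manipulated directly in 3D space.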
- Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in the Wild [49.672487902268706]
We present a framework that jointly performs camera temporal alignment and 3D point triangulation.
We reconstruct 3D motion trajectories of human bodies in events captured by multiple uncalibrated and unsynchronized video cameras.
arXiv Detail & Related papers (2020-07-24T23:50:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.