PolarFormer: Multi-camera 3D Object Detection with Polar Transformers
- URL: http://arxiv.org/abs/2206.15398v2
- Date: Fri, 1 Jul 2022 09:27:56 GMT
- Title: PolarFormer: Multi-camera 3D Object Detection with Polar Transformers
- Authors: Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming
Hu, Yu-Gang Jiang
- Abstract summary: 3D object detection in autonomous driving aims to reason about "what" and "where" the objects of interest are in a 3D world.
Existing methods often adopt the canonical Cartesian coordinate system with perpendicular axes.
We propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird's-eye-view (BEV), taking only multi-camera 2D images as input.
- Score: 93.49713023975727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D object detection in autonomous driving aims to reason about
"what" and "where" the objects of interest are in a 3D world. Following the
conventional wisdom of prior 2D object detection, existing methods often adopt
the canonical Cartesian coordinate system with perpendicular axes. However, we
conjecture that this does not fit the nature of the ego car's perspective,
since each onboard camera perceives the world as a wedge, intrinsic to the
imaging geometry, with a radial (non-perpendicular) axis. Hence, in this paper
we advocate exploiting the Polar coordinate system and propose a new Polar
Transformer (PolarFormer) for more accurate 3D object detection in the
bird's-eye-view (BEV), taking only multi-camera 2D images as input.
Specifically, we design a cross-attention based Polar detection head that
places no restriction on the shape of the input structure, so as to handle
irregular Polar grids. To tackle the unconstrained object scale variation
along the Polar distance dimension, we further introduce a multi-scale Polar
representation learning strategy. As a result, our model can make the best
use of the rasterized Polar representation by attending to the corresponding
image observations in a sequence-to-sequence fashion, subject to geometric
constraints. Thorough experiments on the nuScenes dataset demonstrate that
PolarFormer significantly outperforms state-of-the-art 3D object detection
alternatives, while also yielding competitive performance on the BEV semantic
segmentation task.
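To make the wedge-shaped geometry concrete, below is a minimal NumPy sketch of binning ego-frame BEV points into a polar grid. This is an illustration only, not the paper's implementation: the `cartesian_to_polar_bins` helper, the bin counts, and the 50 m range are all hypothetical choices.

```python
import numpy as np

def cartesian_to_polar_bins(points_xy, num_r_bins=64, num_az_bins=128,
                            max_range=50.0):
    """Assign Cartesian BEV points (x, y) in the ego frame to a polar grid.

    Returns integer (radial, azimuth) bin indices per point. All grid
    parameters are illustrative, not the paper's settings.
    """
    x, y = points_xy[:, 0], points_xy[:, 1]
    r = np.hypot(x, y)            # radial distance from the ego car
    theta = np.arctan2(y, x)      # azimuth in [-pi, pi]
    r_idx = np.clip((r / max_range * num_r_bins).astype(int),
                    0, num_r_bins - 1)
    az_idx = np.clip(((theta + np.pi) / (2 * np.pi) * num_az_bins).astype(int),
                     0, num_az_bins - 1)
    return r_idx, az_idx

pts = np.array([[5.0, 0.0], [30.0, 30.0], [-10.0, 2.0]])
print(cartesian_to_polar_bins(pts))

# The footprint of a polar cell grows linearly with radius: cells near the
# ego car are small, distant ones large. This is the object-scale variation
# along the distance dimension that the multi-scale Polar representation
# in the abstract is designed to handle.
for r in (5.0, 50.0):
    print(r, r * 2 * np.pi / 128)  # arc length of one azimuth bin at radius r
```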
Related papers
- PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View [5.0458717114406975]
We propose to employ the polar BEV representation to substitute the Cartesian BEV representation.
Experiments on nuScenes show that PolarBEVDet achieves superior performance.
arXiv Detail & Related papers (2024-08-29T01:42:38Z)
- MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings [29.050983641961658]
We introduce MonoGAE, a novel framework for roadside monocular 3D object detection with ground-aware embeddings.
Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras.
arXiv Detail & Related papers (2023-09-30T14:52:26Z)
- Neural Voting Field for Camera-Space 3D Hand Pose Estimation [106.34750803910714]
We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.
We propose a novel unified 3D dense regression scheme to estimate camera-space 3D hand pose via dense 3D point-wise voting in camera frustum.
arXiv Detail & Related papers (2023-05-07T16:51:34Z)
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on the SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
- Neural Correspondence Field for Object Pose Estimation [67.96767010122633]
We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image.
Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum.
arXiv Detail & Related papers (2022-07-30T01:48:23Z)
- Polar Parametrization for Vision-based Surround-View 3D Detection [35.2870826850481]
Polar Parametrization for 3D detection reformulates position parametrization, velocity decomposition, perception range, label assignment, and the loss function in polar coordinates.
Based on Polar Parametrization, we propose surround-view 3D DEtection TRansformer, named PolarDETR.
arXiv Detail & Related papers (2022-06-22T10:26:12Z)
- Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions is devised in the monocular 3D object detection network (the underlying pinhole relation is sketched after this list).
Our method remarkably improves the detection performance of the state-of-the-art monocular method by 2.80% on the moderate test setting, without using extra data.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation [4.202461384355329]
We propose MonoRUn, a novel 3D object detection framework that learns dense correspondences and geometry in a self-supervised manner.
Our proposed approach outperforms current state-of-the-art methods on the KITTI benchmark.
arXiv Detail & Related papers (2021-03-23T15:03:08Z)
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
arXiv Detail & Related papers (2020-07-20T02:11:18Z)
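As a companion to the geometry-guided depth entry above, here is a minimal sketch of the standard pinhole relation that projective-modeling approaches build on: an object of real height H at depth z spans roughly h = f*H/z pixels, so depth can be recovered as z = f*H/h. This is the generic geometric prior, not that paper's exact formulation, and all numbers below are made up.

```python
def depth_from_heights(focal_px: float, height_3d_m: float,
                       height_2d_px: float) -> float:
    """Pinhole model: pixel height h = f * H / z, hence z = f * H / h."""
    return focal_px * height_3d_m / height_2d_px

# A 1.5 m tall car spanning 45 px under a 720 px focal length sits ~24 m away.
print(depth_from_heights(720.0, 1.5, 45.0))  # 24.0
```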