Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver
- URL: http://arxiv.org/abs/2304.01289v1
- Date: Mon, 3 Apr 2023 18:24:46 GMT
- Title: Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver
- Authors: Xianpeng Liu, Ce Zheng, Kelvin Cheng, Nan Xue, Guo-Jun Qi, Tianfu Wu
- Abstract summary: The main challenge of monocular 3D object detection is the accurate localization of the 3D center.
We propose a stage-wise approach that combines the information flow from 2D-to-3D and 3D-to-2D.
Our method, named MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detector.
- Score: 45.16079927526731
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The main challenge of monocular 3D object detection is the accurate
localization of the 3D center. Motivated by a new and strong observation that this
challenge can be remedied by a 3D-space local-grid search scheme in an ideal
case, we propose a stage-wise approach, which combines the information flow
from 2D-to-3D (3D bounding box proposal generation with a single 2D image) and
3D-to-2D (proposal verification by denoising with 3D-to-2D contexts) in a
top-down manner. Specifically, we first obtain initial proposals from
off-the-shelf backbone monocular 3D detectors. Then, we generate a 3D anchor
space by local-grid sampling from the initial proposals. Finally, we perform 3D
bounding box denoising at the 3D-to-2D proposal verification stage. To
effectively learn discriminative features for denoising highly overlapped
proposals, this paper presents a method of using the Perceiver I/O model to
fuse the 3D-to-2D geometric information and the 2D appearance information. With
the encoded latent representation of a proposal, the verification head is
implemented with a self-attention module. Our method, named MonoXiver, is
generic and can be easily adapted to any backbone monocular 3D detector.
Experimental results on the well-established KITTI dataset and the challenging
large-scale Waymo dataset show that MonoXiver consistently achieves improvement
with limited computation overhead.
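The local-grid sampling step and the "ideal case" observation from the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid step sizes, the grid radius, the example coordinates, and the function names `local_grid_anchors` and `oracle_select` are all hypothetical. The learned Perceiver I/O verification stage is replaced here by an oracle that picks the candidate nearest the ground-truth center, which only shows why a local grid search could reduce center error if verification were perfect.

```python
from itertools import product
from math import dist


def local_grid_anchors(center, step=(0.5, 0.25, 0.5), radius=(2, 1, 2)):
    """Return candidate 3D centers on a regular grid around `center`.

    `step` (meters per grid cell) and `radius` (cells per side, per axis)
    are assumed values for illustration, not taken from the paper.
    """
    cx, cy, cz = center
    offsets = [range(-r, r + 1) for r in radius]
    return [
        (cx + i * step[0], cy + j * step[1], cz + k * step[2])
        for i, j, k in product(*offsets)
    ]


def oracle_select(anchors, gt_center):
    """Ideal-case stand-in for verification: nearest anchor to the ground truth."""
    return min(anchors, key=lambda a: dist(a, gt_center))


# Made-up numbers: an initial proposal center and a ground-truth center.
proposal = (1.0, 1.5, 20.0)
gt = (1.4, 1.6, 21.1)

anchors = local_grid_anchors(proposal)
best = oracle_select(anchors, gt)

print(len(anchors), "anchors; selected", best)
print("center error before:", round(dist(proposal, gt), 3))
print("center error after: ", round(dist(best, gt), 3))
```

The grid always contains the original proposal (zero offset), so the oracle can never do worse than the backbone detector. In practice the selection has to be learned from image evidence, which is the role the abstract assigns to the 3D-to-2D verification stage.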
Related papers
- MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection [31.58403386994297]
We propose MonoNeRD, a novel detection framework that can infer dense 3D geometry and occupancy.
Specifically, we model scenes with Signed Distance Functions (SDF), facilitating the production of dense 3D representations.
To the best of our knowledge, this work is the first to introduce volume rendering for monocular 3D detection (M3D) and demonstrates the potential of implicit reconstruction for image-based 3D perception.
arXiv Detail & Related papers (2023-08-18T09:39:52Z)
- Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We overcome this by converting the voxel-based sparse 3D feature volumes into the sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- MonoGRNet: A General Framework for Monocular 3D Object Detection [23.59839921644492]
We propose MonoGRNet for the amodal 3D object detection from a monocular image via geometric reasoning.
MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks including 2D object detection, instance-level depth estimation, projected 3D center estimation and local corner regression.
Experiments are conducted on KITTI, Cityscapes and MS COCO datasets.
arXiv Detail & Related papers (2021-04-18T10:07:52Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
arXiv Detail & Related papers (2020-07-20T02:11:18Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)