M3DSSD: Monocular 3D Single Stage Object Detector
- URL: http://arxiv.org/abs/2103.13164v1
- Date: Wed, 24 Mar 2021 13:09:11 GMT
- Title: M3DSSD: Monocular 3D Single Stage Object Detector
- Authors: Shujie Luo, Hang Dai, Ling Shao, Yong Ding
- Abstract summary: We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset.
- Score: 82.25793227026443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a Monocular 3D Single Stage object Detector
(M3DSSD) with feature alignment and asymmetric non-local attention. Current
anchor-based monocular 3D object detection methods suffer from feature
mismatching. To overcome this, we propose a two-step feature alignment
approach. In the first step, the shape alignment is performed to enable the
receptive field of the feature map to focus on the pre-defined anchors with
high confidence scores. In the second step, the center alignment is used to
align the features at 2D/3D centers. Further, it is often difficult to learn
global information and capture long-range relationships, which are important
for the depth prediction of objects. Therefore, we propose a novel asymmetric
non-local attention block with multi-scale sampling to extract depth-wise
features. The proposed M3DSSD achieves significantly better performance than
existing monocular 3D object detection methods on the KITTI dataset, in both 3D
object detection and bird's eye view tasks.
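As a rough illustration of the attention idea described in the abstract, the sketch below shows one way an asymmetric non-local attention step with multi-scale sampling could look. It is not the authors' implementation. It assumes features are flattened into a 1D list of positions, omits learned query/key/value projections, and uses average pooling at a few hypothetical scales to build the sampled key/value set. The names `avg_pool`, `asymmetric_attention`, and the `scales` parameter are illustrative only.

```python
import math


def avg_pool(feats, n_bins):
    """Average-pool N feature vectors into n_bins vectors (1D adaptive pooling)."""
    n = len(feats)
    dim = len(feats[0])
    pooled = []
    for b in range(n_bins):
        start = (b * n) // n_bins                   # floor(b * n / n_bins)
        end = ((b + 1) * n + n_bins - 1) // n_bins  # ceil((b + 1) * n / n_bins)
        chunk = feats[start:end]
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return pooled


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_attention(feats, scales=(1, 2, 4)):
    """Attend from every position to a small multi-scale sampled set.

    Queries are all N positions; keys and values are S = sum(scales) pooled
    samples, so the attention map is N x S rather than N x N. The scale
    values are an assumption made for this sketch.
    """
    samples = []
    for s in scales:
        samples.extend(avg_pool(feats, s))
    dim = len(feats[0])
    out = []
    for q in feats:
        # Scaled dot-product similarity between the query and each sample.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in samples]
        weights = softmax(scores)
        attended = [
            sum(weights[j] * samples[j][d] for j in range(len(samples)))
            for d in range(dim)
        ]
        # Residual connection: add the aggregated context to the input feature.
        out.append([q[d] + attended[d] for d in range(dim)])
    return out


# Usage: 8 positions with 2-dimensional features.
features = [[float(i), float(i % 3)] for i in range(8)]
context = asymmetric_attention(features)
```

The asymmetry is that the number of keys and values (7 here) is much smaller than the number of queries, which keeps the cost of capturing long-range relationships low compared with a full non-local block.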
Related papers
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We overcome this by converting the voxel-based sparse 3D feature volumes into the sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- MonoGRNet: A General Framework for Monocular 3D Object Detection [23.59839921644492]
We propose MonoGRNet for the amodal 3D object detection from a monocular image via geometric reasoning.
MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks including 2D object detection, instance-level depth estimation, projected 3D center estimation and local corner regression.
Experiments are conducted on KITTI, Cityscapes and MS COCO datasets.
arXiv Detail & Related papers (2021-04-18T10:07:52Z)
- IAFA: Instance-aware Feature Aggregation for 3D Object Detection from a Single Image [37.83574424518901]
3D object detection from a single image is an important task in Autonomous Driving.
We propose an instance-aware approach to aggregate useful information for improving the accuracy of 3D object detection.
arXiv Detail & Related papers (2021-03-05T05:47:52Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
arXiv Detail & Related papers (2020-07-20T02:11:18Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)