Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor
Distance Voting
- URL: http://arxiv.org/abs/2107.02493v1
- Date: Tue, 6 Jul 2021 09:18:33 GMT
- Title: Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor
Distance Voting
- Authors: Xiaomeng Chu, Jiajun Deng, Yao Li, Zhenxun Yuan, Yanyong Zhang,
Jianmin Ji and Yu Zhang
- Abstract summary: We present a novel neighbor-voting method that incorporates neighbor predictions to ameliorate object detection from severely deformed pseudo-LiDAR point clouds.
Our results on the bird's eye view detection outperform the state-of-the-art performance by a large margin, especially for the "hard" level detection.
- Score: 12.611269919468999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As cameras are increasingly deployed in new application domains such as
autonomous driving, performing 3D object detection on monocular images becomes
an important task for visual scene understanding. Recent advances on monocular
3D object detection mainly rely on the "pseudo-LiDAR" generation, which
performs monocular depth estimation and lifts the 2D pixels to pseudo 3D
points. However, depth estimation from monocular images, due to its poor
accuracy, leads to inevitable position shift of pseudo-LiDAR points within the
object. Therefore, the predicted bounding boxes may suffer from inaccurate
location and deformed shape. In this paper, we present a novel neighbor-voting
method that incorporates neighbor predictions to ameliorate object detection
from severely deformed pseudo-LiDAR point clouds. Specifically, each feature
point around the object forms its own prediction, and then the "consensus"
is achieved through voting. In this way, we can effectively combine the
neighbors' predictions with local prediction and achieve more accurate 3D
detection. To further enlarge the difference between the foreground region of
interest (ROI) pseudo-LiDAR points and the background points, we also encode
the ROI prediction scores of 2D foreground pixels into the corresponding
pseudo-LiDAR points. We conduct extensive experiments on the KITTI benchmark to
validate the merits of our proposed method. Our results on the bird's eye view
detection outperform the state-of-the-art performance by a large margin,
especially for the "hard" level detection.
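The pseudo-LiDAR lifting step mentioned in the abstract can be illustrated with standard pinhole back-projection. The following is a minimal sketch, not the authors' implementation: the function name `lift_to_pseudo_lidar`, the intrinsics parameters `fx`, `fy`, `cx`, `cy`, and the optional `roi_score` channel are all assumptions made for illustration. Appending a per-pixel foreground score as a fourth point channel is one plausible reading of "encode the ROI prediction scores ... into the corresponding pseudo-LiDAR points".

```python
import numpy as np


def lift_to_pseudo_lidar(depth, fx, fy, cx, cy, roi_score=None):
    """Back-project an (H, W) depth map to camera-frame 3D points.

    Hypothetical helper: returns an (H*W, 3) array of (x, y, z) points,
    or (H*W, 4) when a per-pixel roi_score map of shape (H, W) is given.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # Pinhole camera model: invert u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    channels = [x, y, z]
    if roi_score is not None:
        # Assumed encoding: carry the 2D foreground score as an extra channel.
        channels.append(roi_score)
    return np.stack(channels, axis=-1).reshape(h * w, len(channels))


# Toy usage with a constant depth map and made-up intrinsics.
depth = np.full((2, 3), 10.0)
score = np.linspace(0.0, 1.0, 6).reshape(2, 3)
points = lift_to_pseudo_lidar(depth, fx=5.0, fy=5.0, cx=1.0, cy=0.5)
points_with_score = lift_to_pseudo_lidar(
    depth, fx=5.0, fy=5.0, cx=1.0, cy=0.5, roi_score=score
)
```

Because every 3D coordinate is scaled by the estimated depth, any depth error moves the point along its viewing ray, which is the position shift the abstract describes.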
Related papers
- Predict to Detect: Prediction-guided 3D Object Detection using
Sequential Images [15.51093009875854]
We propose a novel 3D object detection model, P2D (Predict to Detect), that integrates a prediction scheme into a detection framework.
P2D predicts object information in the current frame using solely past frames to learn temporal motion features.
We then introduce a novel temporal feature aggregation method that attentively exploits Bird's-Eye-View (BEV) features based on predicted object information.
arXiv Detail & Related papers (2023-06-14T14:22:56Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [43.02373021724797]
We introduce a framework for multi-camera 3D object detection.
Our method manipulates predictions directly in 3D space.
We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
arXiv Detail & Related papers (2021-10-13T17:59:35Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find that the "localization error" is the vital factor in restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- Categorical Depth Distribution Network for Monocular 3D Object Detection [7.0405916639906785]
A key challenge in monocular 3D detection is accurately predicting object depth.
Many methods attempt to directly estimate depth to assist in 3D detection, but show limited performance as a result of depth inaccuracy.
We propose Categorical Depth Distribution Network (CaDDN) to project rich contextual feature information to the appropriate depth interval in 3D space.
We validate our approach on the KITTI 3D object detection benchmark, where we rank 1st among published monocular methods.
arXiv Detail & Related papers (2021-03-01T16:08:29Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Object-Aware Centroid Voting for Monocular 3D Object Detection [30.59728753059457]
We propose an end-to-end trainable monocular 3D object detector without learning the dense depth.
A novel object-aware voting approach is introduced, which considers both the region-wise appearance attention and the geometric projection distribution.
With the late fusion and the predicted 3D orientation and dimension, the 3D bounding boxes of objects can be detected from a single RGB image.
arXiv Detail & Related papers (2020-07-20T02:11:18Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint
Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.