Are we Missing Confidence in Pseudo-LiDAR Methods for Monocular 3D
Object Detection?
- URL: http://arxiv.org/abs/2012.05796v2
- Date: Thu, 13 May 2021 11:46:41 GMT
- Title: Are we Missing Confidence in Pseudo-LiDAR Methods for Monocular 3D
Object Detection?
- Authors: Andrea Simonelli, Samuel Rota Bulò, Lorenzo Porzi, Peter
Kontschieder, Elisa Ricci
- Abstract summary: We show experimentally that validation results published by PL-based methods are substantially biased.
We propose a novel deep architecture that includes a 3D confidence prediction module.
We show that 3D confidence estimation techniques derived from RGB-only 3D detection approaches can be successfully integrated into our framework.
- Score: 44.74595167179931
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pseudo-LiDAR-based methods for monocular 3D object detection have received
considerable attention in the community due to the performance gains exhibited
on the KITTI3D benchmark, in particular on the commonly reported validation
split. This generated a distorted impression about the superiority of
Pseudo-LiDAR-based (PL-based) approaches over methods working with RGB images
only. Our first contribution consists in rectifying this view by pointing out
and showing experimentally that the validation results published by PL-based
methods are substantially biased. The source of the bias resides in an overlap
between the KITTI3D object detection validation set and the training/validation
sets used to train depth predictors feeding PL-based methods. Surprisingly, the
bias remains even after geographically removing the overlap. This leaves the
test set as the only reliable set for comparison, where published PL-based
methods do not excel. Our second contribution brings PL-based methods back up
in the ranking with the design of a novel deep architecture which introduces a
3D confidence prediction module. We show that 3D confidence estimation
techniques derived from RGB-only 3D detection approaches can be successfully
integrated into our framework and, more importantly, that improved performance
can be obtained with a newly designed 3D confidence measure, leading to
state-of-the-art performance on the KITTI3D benchmark.
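
The overlap described in the abstract can be illustrated with a minimal sketch. It assumes each detection-validation image can be mapped to the recording sequence it came from, and that the sequences used to train the depth predictor are known. The function name, the mapping, and the identifiers below are hypothetical and are not taken from the paper or from any KITTI tooling.

```python
from typing import Dict, Iterable, Set


def overlap_fraction(
    val_image_to_sequence: Dict[str, str],
    depth_train_sequences: Iterable[str],
) -> float:
    """Return the fraction of detection-validation images whose source
    sequence was also used to train the depth predictor."""
    depth_sequences: Set[str] = set(depth_train_sequences)
    if not val_image_to_sequence:
        return 0.0
    leaked = sum(
        1 for seq in val_image_to_sequence.values() if seq in depth_sequences
    )
    return leaked / len(val_image_to_sequence)


# Toy, made-up identifiers (assumption: not real KITTI data).
val_map = {
    "img_000": "drive_A",
    "img_001": "drive_A",
    "img_002": "drive_B",
    "img_003": "drive_C",
}
depth_train = ["drive_A", "drive_C", "drive_D"]

fraction = overlap_fraction(val_map, depth_train)
print(f"{fraction:.2f}")
```

A non-zero fraction flags shared sequences between the two splits. Since the abstract reports that the bias persists even after the overlap is removed geographically, a zero fraction from a check like this should not be read as evidence that validation results are unbiased.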
Related papers
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
- Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments.
It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions.
The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
arXiv Detail & Related papers (2023-09-26T08:03:10Z)
- Diffusion-based 3D Object Detection with Random Boxes [58.43022365393569]
Existing anchor-based 3D detection methods rely on empirical anchor settings, which makes the algorithms inelegant.
Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets.
In the inference stage, the model progressively refines a set of random boxes to the prediction results.
arXiv Detail & Related papers (2023-09-05T08:49:53Z)
- OriCon3D: Effective 3D Object Detection using Orientation and Confidence [0.0]
We propose an advanced methodology for the detection of 3D objects from a single image.
We use a deep convolutional neural network-based 3D object weighted orientation regression paradigm.
Our approach significantly improves the accuracy of 3D object pose determination, surpassing baseline methodologies.
arXiv Detail & Related papers (2023-04-27T19:52:47Z)
- DATa: Domain Adaptation-Aided Deep Table Detection Using Visual-Lexical Representations [2.542864854772221]
We present a novel Domain Adaptation-aided deep Table detection method called DATa.
It guarantees satisfactory performance in a specific target domain where few trusted labels are available.
Experiments show that DATa substantially outperforms competing methods that only utilize visual representations in the target domain.
arXiv Detail & Related papers (2022-11-12T12:14:16Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors.
We use the resulting data to train a 3D detector with the same architecture as the baseline model.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
- Advancing Self-supervised Monocular Depth Learning with Sparse LiDAR [22.202192422883122]
We propose a novel two-stage network to advance self-supervised monocular dense depth learning.
Our model fuses monocular image features and sparse LiDAR features to predict initial depth maps.
Our model outperforms the state-of-the-art sparse-LiDAR-based method (Pseudo-LiDAR++) by more than 68% on the downstream task of monocular 3D object detection.
arXiv Detail & Related papers (2021-09-20T15:28:36Z)
- Lite-FPN for Keypoint-based Monocular 3D Object Detection [18.03406686769539]
Keypoint-based monocular 3D object detection has made tremendous progress and achieved a good speed-accuracy trade-off.
We propose a lightweight feature pyramid network called Lite-FPN to achieve multi-scale feature fusion.
Our proposed method achieves significantly higher accuracy and frame rate at the same time.
arXiv Detail & Related papers (2021-05-01T14:44:31Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.