Time-to-Label: Temporal Consistency for Self-Supervised Monocular 3D
Object Detection
- URL: http://arxiv.org/abs/2203.02193v1
- Date: Fri, 4 Mar 2022 08:55:49 GMT
- Title: Time-to-Label: Temporal Consistency for Self-Supervised Monocular 3D
Object Detection
- Authors: Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari,
Francesca Odone
- Abstract summary: We argue that temporal consistency at the level of object poses provides an important supervision signal.
Specifically, we propose a self-supervised loss which uses this consistency, in addition to render-and-compare losses.
We finetune a synthetically trained monocular 3D object detection model using the pseudo-labels that we generated on real data.
- Score: 46.077668660248534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection continues to attract attention due to the cost
benefits and wider availability of RGB cameras. Despite the recent advances and
the ability to acquire data at scale, annotation cost and complexity still
limit the size of 3D object detection datasets in supervised settings.
Self-supervised methods, on the other hand, aim at training deep networks
relying on pretext tasks or various consistency constraints. Moreover, other 3D
perception tasks (such as depth estimation) have shown the benefits of temporal
priors as a self-supervision signal. In this work, we argue that temporal
consistency at the level of object poses provides an important supervision
signal, given the strong prior on physical motion. Specifically, we propose a
self-supervised loss which uses this consistency, in addition to
render-and-compare losses, to refine noisy pose predictions and derive
high-quality pseudo labels. To assess the effectiveness of the proposed method,
we finetune a synthetically trained monocular 3D object detection model using
the pseudo-labels that we generated on real data. Evaluation on the standard
KITTI3D benchmark demonstrates that our method reaches competitive performance
compared to other monocular self-supervised and supervised methods.
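
To make the temporal-consistency idea concrete, below is a minimal sketch of such a loss, assuming a constant-velocity motion prior over per-frame pose estimates of a single tracked instance. All names and tensor shapes are illustrative, not the paper's implementation, and the full method combines a term of this kind with render-and-compare losses.

    import torch

    def temporal_consistency_loss(translations: torch.Tensor,
                                  yaws: torch.Tensor) -> torch.Tensor:
        """translations: (T, 3) object centers over T frames of one track;
        yaws: (T,) heading angles over the same frames."""
        # Under a constant-velocity prior, discrete accelerations
        # (second-order differences of the trajectory) should vanish.
        accel = translations[2:] - 2.0 * translations[1:-1] + translations[:-2]
        # The heading should likewise change at a near-constant rate.
        yaw_rate = yaws[1:] - yaws[:-1]
        yaw_accel = yaw_rate[1:] - yaw_rate[:-1]
        return accel.norm(dim=-1).mean() + yaw_accel.abs().mean()

Poses refined by minimizing a term of this kind can then be written out as pseudo-labels for fine-tuning the synthetically trained detector.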
Related papers
- Robustness-Aware 3D Object Detection in Autonomous Driving: A Review and Outlook [19.539295469044813]
This study emphasizes the importance of robustness, alongside accuracy and latency, in evaluating perception systems under practical scenarios.
Our work presents an extensive survey of camera-only, LiDAR-only, and multi-modal 3D object detection algorithms, thoroughly evaluating their trade-offs among accuracy, latency, and robustness.
Among these, multi-modal 3D detection approaches exhibit superior robustness, and a novel taxonomy is introduced to reorganize the literature for enhanced clarity.
arXiv Detail & Related papers (2024-01-12T12:35:45Z) - View-to-Label: Multi-View Consistency for Self-Supervised 3D Object
Detection [46.077668660248534]
We propose a novel approach to self-supervise 3D object detection purely from RGB sequences.
Our experiments on KITTI 3D dataset demonstrate performance on par with state-of-the-art self-supervised methods.
arXiv Detail & Related papers (2023-05-29T09:30:39Z) - 3D Object Detection with a Self-supervised Lidar Scene Flow Backbone [10.341296683155973]
We propose using a self-supervised training strategy to learn a general point cloud backbone model for downstream 3D vision tasks.
Our main contribution leverages learned flow and motion representations and combines a self-supervised backbone with a 3D detection head.
Experiments on KITTI and nuScenes benchmarks show that the proposed self-supervised pre-training increases 3D detection performance significantly.
arXiv Detail & Related papers (2022-05-02T07:53:29Z)
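
As a rough illustration of the recipe this summary describes (a flow-pretrained backbone feeding a detection head), consider the hypothetical sketch below; the module names are placeholders, not the paper's API.

    import torch

    class FlowPretrainedDetector(torch.nn.Module):
        """Attach a randomly initialized 3D detection head to a point-cloud
        backbone whose weights come from self-supervised scene-flow training."""
        def __init__(self, backbone: torch.nn.Module, head: torch.nn.Module):
            super().__init__()
            self.backbone = backbone  # initialized from flow pre-training
            self.head = head          # trained from scratch on detection

        def forward(self, points: torch.Tensor) -> torch.Tensor:
            features = self.backbone(points)  # motion-aware point features
            return self.head(features)        # 3D boxes / class logits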
arXiv Detail & Related papers (2022-05-02T07:53:29Z) - Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed the Homography Loss, is proposed to this end; it exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z) - Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study of the problem and observes that monocular 3D detection can be reduced to an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
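
The reduction to instance depth rests on simple back-projection: given the camera intrinsics, a 2D box center, and an estimated depth, the 3D object center follows in closed form. A minimal sketch (variable names are illustrative):

    import torch

    def backproject_center(K: torch.Tensor, u: float, v: float,
                           z: float) -> torch.Tensor:
        """Recover the 3D center in camera coordinates from pixel (u, v)
        and depth z, with K the (3, 3) intrinsic matrix."""
        pixel = torch.tensor([u, v, 1.0])
        return z * torch.linalg.inv(K) @ pixel  # camera ray scaled by depth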
arXiv Detail & Related papers (2021-07-29T16:30:33Z) - 3D Spatial Recognition without Spatially Labeled 3D [127.6254240158249]
We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition.
We show that WyPR can detect and segment objects in point cloud data without access to any spatial labels at training time.
arXiv Detail & Related papers (2021-05-13T17:58:07Z) - Detecting Invisible People [58.49425715635312]
We re-purpose tracking benchmarks and propose new metrics for the task of detecting invisible objects.
We demonstrate that current detection and tracking systems perform dramatically worse on this task.
We also build dynamic models that explicitly reason in 3D, making use of observations produced by state-of-the-art monocular depth estimation networks.
arXiv Detail & Related papers (2020-12-15T16:54:45Z) - SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance the network's generalization on unlabeled and unseen data.
SESS achieves competitive performance compared to the state-of-the-art fully supervised method while using only 50% of the labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.