View-to-Label: Multi-View Consistency for Self-Supervised 3D Object
Detection
- URL: http://arxiv.org/abs/2305.17972v1
- Date: Mon, 29 May 2023 09:30:39 GMT
- Title: View-to-Label: Multi-View Consistency for Self-Supervised 3D Object
Detection
- Authors: Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari,
Francesca Odone
- Abstract summary: We propose a novel approach to self-supervise 3D object detection purely from RGB sequences.
Our experiments on the KITTI 3D dataset demonstrate performance on par with state-of-the-art self-supervised methods.
- Score: 46.077668660248534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For autonomous vehicles, driving safely is highly dependent on the capability
to correctly perceive the environment in 3D space, hence the task of 3D object
detection represents a fundamental aspect of perception. While 3D sensors
deliver accurate metric perception, monocular approaches enjoy cost and
availability advantages that are valuable in a wide range of applications.
Unfortunately, training monocular methods requires a vast amount of annotated
data. Interestingly, self-supervised approaches have recently been successfully
applied to ease the training process and unlock access to widely available
unlabelled data. While related research leverages different priors including
LIDAR scans and stereo images, such priors again limit usability. Therefore, in
this work, we propose a novel approach to self-supervise 3D object detection
purely from RGB sequences, leveraging multi-view constraints and weak
labels. Our experiments on the KITTI 3D dataset demonstrate performance on par
with state-of-the-art self-supervised methods that use LIDAR scans or stereo images.
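To make the core idea concrete, here is a minimal sketch of one form of multi-view consistency such an approach can exploit: a predicted 3D object center should project onto weak 2D evidence (e.g., boxes from an off-the-shelf 2D detector) consistently across two frames related by the camera's ego-motion. All function names, inputs, and the loss form are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def multi_view_consistency_loss(center_3d, K, T_rel, c2d_t, c2d_t1):
    """Penalize disagreement between a predicted 3D object center and
    weak 2D evidence in two consecutive views (illustrative sketch).

    center_3d: (3,) predicted object center in the camera frame at time t
    K:         (3, 3) camera intrinsics
    T_rel:     (4, 4) relative camera pose from frame t to frame t+1
    c2d_t:     (2,) weak 2D box center in frame t (e.g., 2D detector output)
    c2d_t1:    (2,) weak 2D box center in frame t+1
    """
    def project(p):  # pinhole projection to pixel coordinates
        uvw = K @ p
        return uvw[:2] / uvw[2]

    # Carry the center into frame t+1 via the (known or estimated) ego-motion.
    p_t1 = (T_rel @ torch.cat([center_3d, center_3d.new_ones(1)]))[:3]
    # The same 3D hypothesis must agree with the 2D evidence in both frames.
    return torch.norm(project(center_3d) - c2d_t) + torch.norm(project(p_t1) - c2d_t1)
```

A detector whose 3D hypotheses satisfy this constraint across many frames is geometrically consistent without any 3D ground truth, which is the appeal of RGB-only self-supervision.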
Related papers
- Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data [57.53523870705433]
We propose a novel open-vocabulary monocular 3D object detection framework, dubbed OVM3D-Det.
OVM3D-Det does not require high-precision LiDAR or other 3D sensor data, either as input or for generating 3D bounding boxes.
It employs open-vocabulary 2D models and pseudo-LiDAR to automatically label 3D objects in RGB images, fostering the learning of open-vocabulary monocular 3D detectors.
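Pseudo-LiDAR here refers to the standard trick of back-projecting a per-pixel depth estimate into a 3D point cloud that downstream components can treat like a LiDAR scan. A minimal sketch of that lifting step (generic, not OVM3D-Det's exact pipeline):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """Back-project an estimated depth map into a pseudo-LiDAR point cloud.

    depth: (H, W) estimated metric depth for an RGB image
    K:     (3, 3) camera intrinsics
    Returns an (H*W, 3) array of 3D points in the camera frame.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]  # X = (u - cx) * Z / fx
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]  # Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=1)
```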
arXiv Detail & Related papers (2024-11-23T21:37:21Z)
- Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection [52.66283064389691]
State-of-the-art 3D object detectors are often trained on massive labeled datasets.
Recent works demonstrate that self-supervised pre-training with unlabeled data can improve detection accuracy with limited labels.
We propose a shelf-supervised approach for generating zero-shot 3D bounding boxes from paired RGB and LiDAR data.
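One plausible recipe for such zero-shot boxes is to lift each 2D detection into a camera frustum, gather the LiDAR points that fall inside it, and fit a box to them. The sketch below is an illustrative assumption of that idea, not the paper's exact method:

```python
import numpy as np

def frustum_box_from_2d(points, K, box2d):
    """Fit a crude axis-aligned 3D box to the LiDAR points whose image
    projections fall inside a 2D detection (illustrative sketch).

    points: (N, 3) LiDAR points already transformed into the camera frame
    K:      (3, 3) camera intrinsics
    box2d:  (x1, y1, x2, y2) 2D detection in pixel coordinates
    """
    uvw = points @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    x1, y1, x2, y2 = box2d
    in_frustum = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
                 (uv[:, 1] >= y1) & (uv[:, 1] <= y2) & (points[:, 2] > 0)
    frustum = points[in_frustum]
    lo, hi = frustum.min(axis=0), frustum.max(axis=0)
    return (lo + hi) / 2, hi - lo  # box center and extent
```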
arXiv Detail & Related papers (2024-06-14T15:21:57Z)
- SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection [19.75965521357068]
We propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection) to improve the accuracy of 3D object detection.
Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP).
This indicates that combining 3D object detection with 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems.
arXiv Detail & Related papers (2023-08-26T07:38:21Z)
- TripletTrack: 3D Object Tracking using Triplet Embeddings and LSTM [0.0]
3D object tracking is a critical task in autonomous driving systems.
In this paper we investigate the use of triplet embeddings in combination with motion representations for 3D object tracking.
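The appearance half of such a tracker is typically trained with a triplet margin loss, pulling embeddings of the same object across frames together and pushing different objects apart; a minimal sketch with illustrative inputs:

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss over (N, D) embedding batches: 'anchor' and
    'positive' are crops of the same object in different frames,
    'negative' is a different object (margin value is illustrative)."""
    d_pos = F.pairwise_distance(anchor, positive)  # same identity: small
    d_neg = F.pairwise_distance(anchor, negative)  # different identity: large
    return F.relu(d_pos - d_neg + margin).mean()
```

At tracking time, these appearance distances can be fused with a motion term (e.g., from an LSTM over past box states) to score candidate associations.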
arXiv Detail & Related papers (2022-10-28T15:23:50Z)
- An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection [72.30883544352918]
We investigate whether pseudo-labels can provide effective supervision for the baseline models under varying settings.
We achieve 20.23 AP on the moderate level of the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP.
We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting.
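A typical pseudo-labeling loop trains a baseline on the labeled split, runs it on unlabeled images, and keeps only its confident predictions as supervision. A minimal sketch of the selection step; the detection format and threshold are assumptions, not the paper's settings:

```python
def select_pseudo_labels(detections, score_thresh=0.7):
    """Keep only confident predictions to use as pseudo ground truth.

    detections: list of dicts with 'box3d' and 'score' keys (assumed format)
    """
    return [d for d in detections if d["score"] >= score_thresh]
```

The threshold trades label noise against coverage, which is exactly the kind of setting such an empirical study varies.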
arXiv Detail & Related papers (2022-08-15T12:17:46Z)
- Exploring Diversity-based Active Learning for 3D Object Detection in Autonomous Driving [45.405303803618]
We investigate diversity-based active learning (AL) as a potential solution to alleviate the annotation burden.
We propose a novel acquisition function that enforces spatial and temporal diversity in the selected samples.
We demonstrate the effectiveness of the proposed method on the nuScenes dataset and show that it outperforms existing AL strategies significantly.
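A common way to enforce diversity in an acquisition function is greedy k-center (farthest-point) selection over per-sample features; the paper's function additionally encodes spatial and temporal diversity, so the sketch below is a generic illustration only:

```python
import numpy as np

def greedy_diverse_selection(features, k):
    """Greedy k-center selection: repeatedly pick the unlabeled sample
    farthest from everything selected so far.

    features: (N, D) per-sample feature vectors
    Returns the indices of the k selected samples.
    """
    chosen = [0]  # seed with an arbitrary sample
    dists = np.linalg.norm(features - features[0], axis=1)
    for _ in range(k - 1):
        nxt = int(dists.argmax())  # farthest from the current selection
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return chosen
```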
arXiv Detail & Related papers (2022-05-16T14:21:30Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance the generalization of the network on unlabeled and unseen data.
Our SESS achieves performance competitive with the state-of-the-art fully-supervised method using only 50% of the labeled data.
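Self-ensembling frameworks of this kind commonly keep a teacher network as an exponential moving average (EMA) of the student and enforce consistency between the two models' predictions on differently perturbed copies of the same unlabeled scene. A minimal sketch of the EMA update in the standard mean-teacher form (not SESS verbatim):

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Move each teacher parameter toward its student counterpart:
    teacher = decay * teacher + (1 - decay) * student."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1 - decay)
```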
arXiv Detail & Related papers (2019-12-26T08:48:04Z)