Related papers: MonoSOWA: Scalable monocular 3D Object detector Without human Annotations

MonoSOWA: Scalable monocular 3D Object detector Without human Annotations

URL: http://arxiv.org/abs/2501.09481v1
Date: Thu, 16 Jan 2025 11:35:22 GMT
Title: MonoSOWA: Scalable monocular 3D Object detector Without human Annotations
Authors: Jan Skvrna, Lukas Neumann,
Abstract summary: We present the first method to train 3D object detectors for monocular RGB cameras without domain-specific human annotations.<n>Thanks to newly proposed Canonical Object Space, the method can not only exploit data across a variety of datasets and camera setups to train a single 3D detector, but unlike previous work it also works out of the box in previously unseen camera setups.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Detecting the three-dimensional position and orientation of objects using a single RGB camera is a foundational task in computer vision with many important applications. Traditionally, 3D object detection methods are trained in a fully-supervised setup, requiring vast amounts of human annotations, which are laborious, costly, and do not scale well with the ever-increasing amounts of data being captured. In this paper, we present the first method to train 3D object detectors for monocular RGB cameras without domain-specific human annotations, thus making orders of magnitude more data available for training. Thanks to newly proposed Canonical Object Space, the method can not only exploit data across a variety of datasets and camera setups to train a single 3D detector, but unlike previous work it also works out of the box in previously unseen camera setups. All this is crucial for practical applications, where the data and cameras are extremely heterogeneous. The method is evaluated on two standard autonomous driving datasets, where it outperforms previous works, which, unlike our method, still rely on 2D human annotations.

Related papers

Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection [52.66283064389691]
State-of-the-art 3D object detectors are often trained on massive labeled datasets. Recent works demonstrate that self-supervised pre-training with unlabeled data can improve detection accuracy with limited labels. We propose a shelf-supervised approach for generating zero-shot 3D bounding boxes from paired RGB and LiDAR data.
arXiv Detail & Related papers (2024-06-14T15:21:57Z)
MDT3D: Multi-Dataset Training for LiDAR 3D Object Detection Generalization [3.8243923744440926]
3D object detection models trained on a source dataset with a specific point distribution have shown difficulties in generalizing to unseen datasets. We leverage the information available from several annotated source datasets with our Multi-Dataset Training for 3D Object Detection (MDT3D) method. We show how we managed the mix of datasets during training and finally introduce a new cross-dataset augmentation method: cross-dataset object injection.
arXiv Detail & Related papers (2023-08-02T08:20:00Z)
MonoNext: A 3D Monocular Object Detection with ConvNext [69.33657875725747]
This paper introduces a new Multi-Tasking Learning approach called MonoNext for 3D Object Detection. MonoNext employs a straightforward approach based on the ConvNext network and requires only 3D bounding box data. In our experiments with the KITTI dataset, MonoNext achieved high precision and competitive performance comparable with state-of-the-art approaches.
arXiv Detail & Related papers (2023-08-01T15:15:40Z)
View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection [46.077668660248534]
We propose a novel approach to self-supervise 3D object detection purely from RGB sequences alone. Our experiments on KITTI 3D dataset demonstrate performance on par with state-of-the-art self-supervised methods.
arXiv Detail & Related papers (2023-05-29T09:30:39Z)
Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in automatic driving for its easy application. Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase. We propose a new weakly supervised monocular 3D objection detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
Generalized Few-Shot 3D Object Detection of LiDAR Point Cloud for Autonomous Driving [91.39625612027386]
We propose a novel task, called generalized few-shot 3D object detection, where we have a large amount of training data for common (base) objects, but only a few data for rare (novel) classes. Specifically, we analyze in-depth differences between images and point clouds, and then present a practical principle for the few-shot setting in the 3D LiDAR dataset. To solve this task, we propose an incremental fine-tuning method to extend existing 3D detection models to recognize both common and rare objects.
arXiv Detail & Related papers (2023-02-08T07:11:36Z)
MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation [55.96577490779591]
Methods need to have a degree of 'camera independence' in order to benefit from large and heterogeneous training data. We show that more data does not automatically guarantee a better performance, but rather, methods need to have a degree of 'camera independence' in order to benefit from large and heterogeneous training data.
arXiv Detail & Related papers (2021-10-01T14:56:37Z)
3D Annotation Of Arbitrary Objects In The Wild [0.0]
We propose a data annotation pipeline based on SLAM, 3D reconstruction, and 3D-to-2D geometry. The pipeline allows creating 3D and 2D bounding boxes, along with per-pixel annotations of arbitrary objects. Our results showcase almost 90% Intersection-over-Union (IoU) agreement on both semantic segmentation and 2D bounding box detection.
arXiv Detail & Related papers (2021-09-15T09:00:56Z)
MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation. Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving. We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild [31.334715988245748]
We propose a self-supervised approach that learns a single image 3D pose estimator from unlabeled multi-view data. In contrast to most existing methods, we do not require calibrated cameras and can therefore learn from moving cameras. Key to the success are new, unbiased reconstruction objectives that mix information across views and training samples.
arXiv Detail & Related papers (2020-11-30T10:42:27Z)
BirdNet+: End-to-End 3D Object Detection in LiDAR Bird's Eye View [117.44028458220427]
On-board 3D object detection in autonomous vehicles often relies on geometry information captured by LiDAR devices. We present a fully end-to-end 3D object detection framework that can infer oriented 3D boxes solely from BEV images.
arXiv Detail & Related papers (2020-03-09T15:08:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.