3D for Free: Crossmodal Transfer Learning using HD Maps
- URL: http://arxiv.org/abs/2008.10592v1
- Date: Mon, 24 Aug 2020 17:54:51 GMT
- Title: 3D for Free: Crossmodal Transfer Learning using HD Maps
- Authors: Benjamin Wilson, Zsolt Kira, James Hays
- Abstract summary: We leverage the large class-taxonomies of modern 2D datasets and the robustness of state-of-the-art 2D detection methods.
We mine a collection of 1151 unlabeled, multimodal driving logs from an autonomous vehicle.
We show that detector performance increases as we mine more unlabeled data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection is a core perceptual challenge for robotics and
autonomous driving. However, the class-taxonomies in modern autonomous driving
datasets are significantly smaller than many influential 2D detection datasets.
In this work, we address the long-tail problem by leveraging both the large
class-taxonomies of modern 2D datasets and the robustness of state-of-the-art
2D detection methods. We proceed to mine a large, unlabeled dataset of images
and LiDAR, and estimate 3D object bounding cuboids, seeded from an
off-the-shelf 2D instance segmentation model. Critically, we constrain this
ill-posed 2D-to-3D mapping by using high-definition maps and object size
priors. The result of the mining process is 3D cuboids with varying confidence.
This mining process is itself a 3D object detector, although not especially
accurate when evaluated as such. However, when we then train a 3D object detection
model on these cuboids, we find that, consistent with other recent observations
in the deep learning literature, the resulting model is fairly robust to the
noisy supervision that our mining process provides. We mine a collection of
1151 unlabeled, multimodal driving logs from an autonomous vehicle and use the
discovered objects to train a LiDAR-based object detector. We show that
detector performance increases as we mine more unlabeled data. With our full,
unlabeled dataset, our method performs competitively with fully supervised
methods, even exceeding the performance for certain object categories, without
any human 3D annotations.
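The mining step described above can be sketched as follows: given LiDAR points falling inside a 2D instance mask's frustum, estimate a cuboid constrained by the HD-map ground height and a per-class size prior. This is a minimal, hypothetical illustration; the function name, thresholds, and confidence formula are assumptions for exposition, not the authors' implementation.

```python
import math

def mine_cuboid(frustum_points, ground_z, size_prior, tolerance=0.5):
    """frustum_points: list of (x, y, z); size_prior: (l, w, h) in metres.

    Returns ((center, dims), confidence), or (None, 0.0) if rejected.
    """
    # Drop returns at or below the HD-map ground surface (road points).
    obj = [p for p in frustum_points if p[2] > ground_z + 0.1]
    if len(obj) < 10:
        return None, 0.0
    mins = [min(p[i] for p in obj) for i in range(3)]
    maxs = [max(p[i] for p in obj) for i in range(3)]
    dims = [hi - lo for lo, hi in zip(mins, maxs)]
    center = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    # The size prior constrains the otherwise ill-posed 2D-to-3D lifting:
    # reject cuboids whose extent deviates too far from the class prior.
    if any(abs(d - s) > tolerance * s for d, s in zip(dims, size_prior)):
        return None, 0.0
    # Confidence grows with point support, shrinks with prior disagreement.
    err = sum(abs(d - s) for d, s in zip(dims, size_prior))
    confidence = min(1.0, len(obj) / 100) * math.exp(-err)
    return (center, dims), confidence
```

A downstream LiDAR detector would then be trained on the surviving cuboids, with the per-cuboid confidence available for weighting the noisy supervision.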
Related papers
- STONE: A Submodular Optimization Framework for Active 3D Object Detection [20.54906045954377]
A key requirement for training an accurate 3D object detector is the availability of a large amount of LiDAR-based point cloud data.
This paper proposes a unified active 3D object detection framework that greatly reduces the labeling cost of training 3D object detectors.
arXiv Detail & Related papers (2024-10-04T20:45:33Z)
- ALPI: Auto-Labeller with Proxy Injection for 3D Object Detection using 2D Labels Only [5.699475977818167]
3D object detection plays a crucial role in various applications such as autonomous vehicles, robotics and augmented reality.
We propose a weakly supervised 3D annotator that relies solely on 2D bounding box annotations from images, along with size priors.
arXiv Detail & Related papers (2024-07-24T11:58:31Z)
- FocalFormer3D: Focusing on Hard Instance for 3D Object Detection [97.56185033488168]
False negatives (FN) in 3D object detection can lead to potentially dangerous situations in autonomous driving.
In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner.
We instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects.
arXiv Detail & Related papers (2023-08-08T20:06:12Z)
- Tracking Objects with 3D Representation from Videos [57.641129788552675]
With 3D object representation learning from pseudo-3D object labels in monocular videos, we propose a new 2D Multiple Object Tracking (MOT) paradigm, called P3DTrack.
arXiv Detail & Related papers (2023-06-08T17:58:45Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed as Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method outperforms other state-of-the-art approaches by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth is estimated and converted into a pseudo-LiDAR point cloud representation; then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
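The depth-to-pseudo-LiDAR conversion in the two-step pipelines mentioned above is the standard pinhole back-projection, x = (u - cx) * d / fx and y = (v - cy) * d / fy. The sketch below is an illustrative assumption of that intermediate step, not code from any of the listed papers.

```python
def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (2D list of metres) into 3D points
    using the pinhole camera model with intrinsics (fx, fy, cx, cy).
    This is the point cloud a two-step stereo pipeline would hand to a
    3D detector."""
    points = []
    for v, row in enumerate(depth):          # v: pixel row
        for u, d in enumerate(row):          # u: pixel column
            if d <= 0:                       # skip invalid depth
                continue
            x = (u - cx) * d / fx            # right of optical axis
            y = (v - cy) * d / fy            # below optical axis
            points.append((x, y, d))         # depth is the forward axis
    return points
```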
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.