Train in Germany, Test in The USA: Making 3D Object Detectors Generalize
- URL: http://arxiv.org/abs/2005.08139v1
- Date: Sun, 17 May 2020 00:56:18 GMT
- Title: Train in Germany, Test in The USA: Making 3D Object Detectors Generalize
- Authors: Yan Wang, Xiangyu Chen, Yurong You, Li Erran Li, Bharath Hariharan, Mark
Campbell, Kilian Q. Weinberger, Wei-Lun Chao
- Abstract summary: Deep learning has substantially improved 3D object detection accuracy for LiDAR and stereo camera data alike.
Most datasets for autonomous driving are collected within a narrow subset of cities within one country.
In this paper we consider the task of adapting 3D object detectors from one dataset to another.
- Score: 59.455225176042404
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In the domain of autonomous driving, deep learning has substantially improved
the 3D object detection accuracy for LiDAR and stereo camera data alike. While
deep networks are great at generalization, they are also notorious for over-fitting
to all kinds of spurious artifacts, such as brightness, car sizes and models,
that may appear consistently throughout the data. In fact, most datasets for
autonomous driving are collected within a narrow subset of cities within one
country, typically under similar weather conditions. In this paper we consider
the task of adapting 3D object detectors from one dataset to another. We
observe that, approached naively, this is a very challenging task, resulting in
drastic drops in accuracy. We provide extensive experiments to
investigate the true adaptation challenges and arrive at a surprising
conclusion: the primary adaptation hurdle to overcome is the difference in car
sizes across geographic areas. A simple correction based on the average car
size substantially narrows the adaptation gap. Our proposed method is
simple and easily incorporated into most 3D object detection frameworks. It
provides a first baseline for 3D object detection adaptation across countries,
and gives hope that the underlying problem may be more within reach than one
might have believed. Our code is available at
https://github.com/cxy1997/3D_adapt_auto_driving.
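The size correction described above can be sketched in a few lines: shift each predicted box's dimensions by the gap between the source- and target-domain mean car sizes. The mean dimensions and function names below are illustrative assumptions for a minimal sketch, not the authors' actual statistics or code.

```python
# Hypothetical per-domain mean car dimensions (length, width, height) in meters.
SOURCE_MEAN = (3.89, 1.62, 1.53)  # e.g. a German-collected dataset
TARGET_MEAN = (4.77, 1.91, 1.74)  # e.g. a US-collected dataset

def adapt_box_size(box, src_mean=SOURCE_MEAN, tgt_mean=TARGET_MEAN):
    """Shift a predicted box's (l, w, h) toward the target domain's statistics.

    box: dict with keys 'center' (x, y, z) and 'size' (l, w, h).
    Returns a new box whose size is corrected by the per-dimension mean gap.
    """
    delta = tuple(t - s for s, t in zip(src_mean, tgt_mean))
    new_size = tuple(d + dd for d, dd in zip(box["size"], delta))
    return {"center": box["center"], "size": new_size}
```

Because the correction only touches predicted box dimensions, it can be applied as a post-processing step to the output of most existing 3D detectors.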
Related papers
- HeightFormer: A Semantic Alignment Monocular 3D Object Detection Method from Roadside Perspective [11.841338298700421]
We propose a novel 3D object detection framework integrating Spatial Former and Voxel Pooling Former to enhance 2D-to-3D projection based on height estimation.
Experiments were conducted using the Rope3D and DAIR-V2X-I dataset, and the results demonstrated the outperformance of the proposed algorithm in the detection of both vehicles and cyclists.
arXiv Detail & Related papers (2024-10-10T09:37:33Z)
- UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps [2.79552147676281]
We introduce Unsupervised Adversarial Domain Adaptation for 3D Object Detection (UADA3D)
We demonstrate its efficacy in various adaptation scenarios, showing significant improvements in both self-driving car and mobile robot domains.
Our code is open-source and will be available soon.
arXiv Detail & Related papers (2024-03-26T12:08:14Z)
- Unsupervised Adaptation from Repeated Traversals for Autonomous Driving [54.59577283226982]
Self-driving cars must generalize to the end-user's environment to operate reliably.
One potential solution is to leverage unlabeled data collected from the end-users' environments.
There is no reliable signal in the target domain to supervise the adaptation process.
We show that this simple additional assumption is sufficient to obtain a potent signal that allows us to perform iterative self-training of 3D object detectors on the target domain.
arXiv Detail & Related papers (2023-03-27T15:07:55Z)
- Generalized Few-Shot 3D Object Detection of LiDAR Point Cloud for Autonomous Driving [91.39625612027386]
We propose a novel task, called generalized few-shot 3D object detection, where we have a large amount of training data for common (base) objects, but only a few data for rare (novel) classes.
Specifically, we analyze in-depth differences between images and point clouds, and then present a practical principle for the few-shot setting in the 3D LiDAR dataset.
To solve this task, we propose an incremental fine-tuning method to extend existing 3D detection models to recognize both common and rare objects.
arXiv Detail & Related papers (2023-02-08T07:11:36Z)
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
- Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data [19.63193201107591]
7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users.
We develop an approach using a weakly supervised method of fine tuning 3D object detectors for traffic observation cameras.
Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top performing monocular 3D object detectors on autonomous vehicle datasets.
arXiv Detail & Related papers (2021-10-21T08:26:48Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find that localization error is the vital factor restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z)
- Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection over union threshold of 70%, as required by the official KITTI benchmark; outperforming previous state-of-the-art single RGB only methods by a large margin.
arXiv Detail & Related papers (2020-11-19T17:23:30Z)
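The KITTI result above is measured at a 3D intersection-over-union threshold of 70%. As a rough illustration of that metric, the sketch below computes IoU for axis-aligned 3D boxes; note the official KITTI benchmark evaluates rotated (oriented) boxes, so this is a simplification with an illustrative function name.

```python
def iou_3d_axis_aligned(a, b):
    """3D IoU of two axis-aligned boxes.

    a, b: (xmin, ymin, zmin, xmax, ymax, zmax). Returns a float in [0, 1].
    """
    inter = 1.0
    for i in range(3):  # accumulate overlap along x, y, z
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # no overlap on this axis => empty intersection
        inter *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    return inter / (volume(a) + volume(b) - inter)
```

A prediction counts as a true positive under the benchmark's criterion only when this ratio meets the threshold (0.7 for cars), which is why small localization errors in depth can flip an otherwise good detection to a miss.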
This list is automatically generated from the titles and abstracts of the papers in this site.