Towards Long-Tailed 3D Detection
- URL: http://arxiv.org/abs/2211.08691v2
- Date: Fri, 19 May 2023 20:19:23 GMT
- Title: Towards Long-Tailed 3D Detection
- Authors: Neehar Peri, Achal Dave, Deva Ramanan, Shu Kong
- Abstract summary: We study the problem of Long-Tailed 3D Detection (LT3D), which evaluates on all classes, including those in-the-tail.
Our modifications improve accuracy by 5% AP on average for all classes, and dramatically improve AP for rare classes.
- Score: 56.82185415482943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contemporary autonomous vehicle (AV) benchmarks have advanced techniques for
training 3D detectors, particularly on large-scale LiDAR data. Surprisingly,
although semantic class labels naturally follow a long-tailed distribution,
contemporary benchmarks focus on only a few common classes (e.g., pedestrian
and car) and neglect many rare classes in-the-tail (e.g., debris and stroller).
However, AVs must still detect rare classes to ensure safe operation. Moreover,
semantic classes are often organized within a hierarchy, e.g., tail classes
such as child and construction-worker are arguably subclasses of pedestrian.
However, such hierarchical relationships are often ignored, which may lead to
misleading estimates of performance and missed opportunities for algorithmic
innovation. We address these challenges by formally studying the problem of
Long-Tailed 3D Detection (LT3D), which evaluates on all classes, including
those in-the-tail. We evaluate and innovate upon popular 3D detection
codebases, such as CenterPoint and PointPillars, adapting them for LT3D. We
develop hierarchical losses that promote feature sharing across common-vs-rare
classes, as well as improved detection metrics that award partial credit to
"reasonable" mistakes respecting the hierarchy (e.g., mistaking a child for an
adult). Finally, we point out that fine-grained tail class accuracy is
particularly improved via multimodal fusion of RGB images with LiDAR; simply
put, small fine-grained classes are challenging to identify from sparse (LiDAR)
geometry alone, suggesting that multimodal cues are crucial to long-tailed 3D
detection. Our modifications improve accuracy by 5% AP on average for all
classes, and dramatically improve AP for rare classes (e.g., stroller AP
improves from 3.6 to 31.6)! Our code is available at
https://github.com/neeharperi/LT3D
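The abstract's idea of awarding partial credit to "reasonable" mistakes that respect the class hierarchy can be sketched as follows. This is an illustrative sketch, not the paper's released evaluation code: the class names, the two-level hierarchy, and the 0.5 partial-credit value are assumptions for demonstration.

```python
# Toy two-level semantic hierarchy: each fine-grained class maps to a
# coarse parent class (illustrative, not the benchmark's full taxonomy).
PARENT = {
    "adult": "pedestrian",
    "child": "pedestrian",
    "construction-worker": "pedestrian",
    "car": "vehicle",
    "truck": "vehicle",
}

def match_credit(pred_class: str, gt_class: str) -> float:
    """Credit for a detection matched to a ground-truth box.

    1.0 -> exact class match
    0.5 -> "reasonable" mistake: sibling classes sharing a parent
           (e.g., mistaking a child for an adult)
    0.0 -> unrelated classes (e.g., predicting car for a child)
    """
    if pred_class == gt_class:
        return 1.0
    if PARENT.get(pred_class) == PARENT.get(gt_class):
        return 0.5
    return 0.0
```

Under such a metric, a detector that confuses a child with an adult is penalized less than one that misses the pedestrian entirely, which rewards the feature sharing across common and rare classes that the hierarchical losses encourage.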
Related papers
- Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection [52.66283064389691]
State-of-the-art 3D object detectors are often trained on massive labeled datasets.
Recent works demonstrate that self-supervised pre-training with unlabeled data can improve detection accuracy with limited labels.
We propose a shelf-supervised approach for generating zero-shot 3D bounding boxes from paired RGB and LiDAR data.
arXiv Detail & Related papers (2024-06-14T15:21:57Z)
- Long-Tailed 3D Detection via Multi-Modal Fusion [47.03801888003686]
We study the problem of Long-Tailed 3D Detection (LT3D), which evaluates all annotated classes, including those in-the-tail.
We point out that rare-class accuracy is particularly improved via multi-modal late fusion (MMLF) of independently trained uni-modal LiDAR and RGB detectors.
Our proposed MMLF approach significantly improves LT3D performance over prior work, particularly improving rare class performance from 12.8 to 20.0 mAP!
arXiv Detail & Related papers (2023-12-18T07:14:25Z)
- DualTeacher: Bridging Coexistence of Unlabelled Classes for Semi-supervised Incremental Object Detection [53.8061502411777]
In real-world applications, an object detector often encounters object instances from new classes and needs to accommodate them effectively.
Previous work formulated this critical problem as incremental object detection (IOD), which assumes the object instances of new classes to be fully annotated in incremental data.
We consider a more realistic setting named semi-supervised IOD (SSIOD), where the object detector needs to learn new classes incrementally from a few labelled data and massive unlabelled data.
arXiv Detail & Related papers (2023-12-13T10:46:14Z)
- 3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking [15.330384668966806]
State-of-the-art 3D multi-object tracking (MOT) approaches typically rely on non-learned model-based algorithms such as Kalman Filter.
We propose 3DMOTFormer, a learned geometry-based 3D MOT framework building upon the transformer architecture.
Our approach achieves 71.2% and 68.2% AMOTA on the nuScenes validation and test split, respectively.
arXiv Detail & Related papers (2023-08-12T19:19:58Z)
- DC3DCD: unsupervised learning for multiclass 3D point cloud change detection [0.0]
We propose an unsupervised method, called Deep 3D Change Detection (DC3DCD), to detect and categorize multiclass changes at the point level.
Our method builds upon the DeepCluster approach, originally designed for image classification, to handle complex raw 3D point clouds.
arXiv Detail & Related papers (2023-05-09T13:13:53Z)
- Generalized Few-Shot 3D Object Detection of LiDAR Point Cloud for Autonomous Driving [91.39625612027386]
We propose a novel task, called generalized few-shot 3D object detection, where we have a large amount of training data for common (base) objects, but only a few data for rare (novel) classes.
Specifically, we analyze in-depth differences between images and point clouds, and then present a practical principle for the few-shot setting in the 3D LiDAR dataset.
To solve this task, we propose an incremental fine-tuning method to extend existing 3D detection models to recognize both common and rare objects.
arXiv Detail & Related papers (2023-02-08T07:11:36Z)
- Improving the Intra-class Long-tail in 3D Detection via Rare Example Mining [29.699694480757472]
Even the best-performing models make the most naive mistakes on rare examples.
We show that rareness is the key to data-centric improvements for 3D detectors, since rareness results from a lack of data support.
We propose a general and effective method to identify the rareness of objects based on density estimation in the feature space.
arXiv Detail & Related papers (2022-10-15T20:52:07Z)
- Train in Germany, Test in The USA: Making 3D Object Detectors Generalize [59.455225176042404]
Deep learning has substantially improved the 3D object detection accuracy for LiDAR and stereo camera data alike.
Most datasets for autonomous driving are collected within a narrow subset of cities within one country.
In this paper we consider the task of adapting 3D object detectors from one dataset to another.
arXiv Detail & Related papers (2020-05-17T00:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.