LISO: Lidar-only Self-Supervised 3D Object Detection
- URL: http://arxiv.org/abs/2403.07071v1
- Date: Mon, 11 Mar 2024 18:02:52 GMT
- Title: LISO: Lidar-only Self-Supervised 3D Object Detection
- Authors: Stefan Baur, Frank Moosmann, Andreas Geiger
- Abstract summary: We introduce a novel self-supervised method to train SOTA lidar object detection networks.
It works on unlabeled sequences of lidar point clouds only.
It utilizes a SOTA self-supervised lidar scene flow network under the hood to generate, track, and iteratively refine pseudo ground truth.
- Score: 25.420879730860936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection is one of the most important components in any
Self-Driving stack, but current state-of-the-art (SOTA) lidar object detectors
require costly & slow manual annotation of 3D bounding boxes to perform well.
Recently, several methods have emerged to generate pseudo ground truth without
human supervision; however, all of these methods have drawbacks: Some methods
require sensor rigs with full camera coverage and accurate calibration, partly
supplemented by an auxiliary optical flow engine. Others require expensive
high-precision localization to find objects that disappeared over multiple
drives. We introduce a novel self-supervised method, which we call
trajectory-regularized self-training, to train SOTA lidar object detection
networks on unlabeled sequences of lidar point clouds only. It utilizes a SOTA
self-supervised lidar scene flow network under the hood to generate, track, and
iteratively refine pseudo ground truth. We demonstrate the effectiveness of our
approach for multiple SOTA object detection networks across multiple real-world
datasets. Code will be released.
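To make the training loop described above concrete, here is a minimal, hypothetical Python sketch of trajectory-regularized self-training: pseudo boxes are seeded by clustering points that a self-supervised scene flow network marks as moving, the boxes are linked into tracks to discard temporally inconsistent ones, and the detector is retrained on the surviving pseudo ground truth for several rounds. Every function, data structure, and threshold below is an illustrative stub, not the authors' released code.

```python
# Hypothetical sketch of trajectory-regularized self-training (not the authors' code).
# All functions below are toy stubs standing in for the real components.
from dataclasses import dataclass

@dataclass
class Box:
    """Toy 2D box; the real method uses oriented 3D bounding boxes."""
    x: float
    y: float
    score: float = 1.0

def estimate_scene_flow(prev_scan, scan):
    """Stub for the self-supervised lidar scene flow network (per-point motion)."""
    return [(0.5, 0.0) for _ in scan]

def cluster_moving_points(scan, flow, min_speed=0.3):
    """Stub: group points with significant motion into initial pseudo boxes."""
    moving = [p for p, (dx, dy) in zip(scan, flow) if (dx**2 + dy**2) ** 0.5 > min_speed]
    return [Box(x=p[0], y=p[1]) for p in moving]

def track_and_filter(boxes_per_frame, min_track_len=2):
    """Stub: link boxes over time and drop temporally inconsistent ones."""
    kept_frames = sum(1 for boxes in boxes_per_frame if boxes)
    return boxes_per_frame if kept_frames >= min_track_len else [[] for _ in boxes_per_frame]

def train_detector(detector, scans, pseudo_gt):
    """Stub: one round of supervised training on the current pseudo ground truth."""
    return detector

def detect(detector, scan):
    """Stub: run the current detector on one scan."""
    return [Box(x=scan[0][0], y=scan[0][1], score=0.9)]

def self_train(detector, scans, rounds=3):
    # Round 0: seed pseudo ground truth from scene flow alone.
    flows = [estimate_scene_flow(prev, cur) for prev, cur in zip(scans, scans[1:])]
    seeds = [cluster_moving_points(cur, f) for cur, f in zip(scans[1:], flows)]
    pseudo_gt = track_and_filter(seeds)
    for _ in range(rounds):
        detector = train_detector(detector, scans, pseudo_gt)
        # Re-detect, re-track, and thereby iteratively refine the pseudo ground truth.
        detections = [detect(detector, scan) for scan in scans]
        pseudo_gt = track_and_filter(detections)
    return detector

if __name__ == "__main__":
    toy_scans = [[(0.0, 0.0)], [(0.6, 0.0)], [(1.2, 0.0)]]  # three tiny "scans"
    self_train(detector=None, scans=toy_scans)
```

The point the sketch tries to capture is that tracking acts as a regularizer on the pseudo labels between training rounds, so label quality and detector quality can improve together.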
Related papers
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to the state-of-the-art monocular detection methods.
arXiv Detail & Related papers (2024-04-10T03:54:53Z)
- End-to-End 3D Object Detection using LiDAR Point Cloud [0.0]
We present an approach wherein, using a novel encoding of the LiDAR point cloud, we infer the locations of objects of different classes near the autonomous vehicle.
The output is predictions of the location and orientation of objects in the scene in the form of 3D bounding boxes and class labels.
arXiv Detail & Related papers (2023-12-24T00:52:14Z)
- FocalFormer3D: Focusing on Hard Instance for 3D Object Detection [97.56185033488168]
False negatives (FN) in 3D object detection can lead to potentially dangerous situations in autonomous driving.
In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner.
We instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects.
arXiv Detail & Related papers (2023-08-08T20:06:12Z)
- View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection [46.077668660248534]
We propose a novel approach to self-supervise 3D object detection purely from RGB sequences.
Our experiments on the KITTI 3D dataset demonstrate performance on par with state-of-the-art self-supervised methods.
arXiv Detail & Related papers (2023-05-29T09:30:39Z)
- Generalized Few-Shot 3D Object Detection of LiDAR Point Cloud for Autonomous Driving [91.39625612027386]
We propose a novel task, called generalized few-shot 3D object detection, where we have a large amount of training data for common (base) objects but only a few samples for rare (novel) classes.
Specifically, we analyze in-depth differences between images and point clouds, and then present a practical principle for the few-shot setting in the 3D LiDAR dataset.
To solve this task, we propose an incremental fine-tuning method to extend existing 3D detection models to recognize both common and rare objects.
arXiv Detail & Related papers (2023-02-08T07:11:36Z)
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
- CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object Tracking [9.62721286522053]
We propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion.
Our proposed method uses a center-based radar-camera fusion algorithm for object detection and utilizes a greedy algorithm for object association.
We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods in the benchmark.
arXiv Detail & Related papers (2021-07-11T23:56:53Z)
- EagerMOT: 3D Multi-Object Tracking via Sensor Fusion [68.8204255655161]
Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time.
Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal.
We propose EagerMOT, a simple tracking formulation that integrates all available object observations from both sensor modalities to obtain a well-informed interpretation of the scene dynamics.
arXiv Detail & Related papers (2021-04-29T22:30:29Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding [20.878931360708343]
We show that it is possible to perform all perception tasks via a simple and efficient multi-task network.
Our proposed network, LidarMTL, takes a raw LiDAR point cloud as input and predicts six perception outputs for 3D object detection and road understanding (a rough sketch of such a multi-head design follows below).
arXiv Detail & Related papers (2021-03-06T08:00:26Z)
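As a rough illustration of the shared-backbone, multi-head idea in the last entry (one LiDAR encoder feeding several perception outputs), here is a hedged PyTorch sketch; the layer sizes and the six head names are invented for illustration and are not LidarMTL's actual architecture.

```python
# Hedged sketch of a shared-backbone, multi-head LiDAR network
# (invented sizes and task names, not LidarMTL's actual architecture).
import torch
import torch.nn as nn

class MultiTaskLidarNet(nn.Module):
    def __init__(self, in_channels: int = 4, feat: int = 64):
        super().__init__()
        # Toy point-wise encoder standing in for a real voxel or point backbone.
        self.backbone = nn.Sequential(
            nn.Linear(in_channels, feat), nn.ReLU(),
            nn.Linear(feat, feat), nn.ReLU(),
        )
        # One lightweight head per task; the six task names are illustrative only.
        self.heads = nn.ModuleDict({
            "box_regression": nn.Linear(feat, 7),      # x, y, z, l, w, h, yaw
            "box_classification": nn.Linear(feat, 3),  # e.g. car / pedestrian / cyclist
            "drivable_area": nn.Linear(feat, 1),
            "ground_height": nn.Linear(feat, 1),
            "lane_marking": nn.Linear(feat, 1),
            "road_type": nn.Linear(feat, 4),
        })

    def forward(self, points: torch.Tensor) -> dict:
        feats = self.backbone(points)  # (N, feat) per-point features
        return {name: head(feats) for name, head in self.heads.items()}

# Usage on a dummy scan of 1000 points with (x, y, z, intensity) features.
outputs = MultiTaskLidarNet()(torch.rand(1000, 4))
print({name: tuple(t.shape) for name, t in outputs.items()})
```

Sharing one backbone amortizes feature extraction across all tasks, which is what makes a single multi-task network simple and efficient compared to running separate models.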