Multi-modal Streaming 3D Object Detection
- URL: http://arxiv.org/abs/2209.04966v1
- Date: Mon, 12 Sep 2022 00:30:52 GMT
- Title: Multi-modal Streaming 3D Object Detection
- Authors: Mazen Abdelfattah, Kaiwen Yuan, Z. Jane Wang, and Rabab Ward
- Abstract summary: We propose an innovative camera-LiDAR streaming 3D object detection framework.
It uses camera images instead of past LiDAR slices to provide an up-to-date, dense, and wide context for streaming perception.
Our method is shown to be robust to missing camera images, narrow LiDAR slices, and small camera-LiDAR miscalibration.
- Score: 20.01800869678355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern autonomous vehicles rely heavily on mechanical LiDARs for perception.
Current perception methods generally require 360° point clouds, collected
sequentially as the LiDAR scans the azimuth and acquires consecutive
wedge-shaped slices. The acquisition latency of a full scan (~100 ms) may lead
to outdated perception, which is detrimental to safe operation. Recent streaming
perception works have proposed directly processing LiDAR slices and compensating
for the narrow field of view (FOV) of a slice by reusing features from preceding
slices. These works, however, are all based on a single modality and rely on
past information that may itself be outdated. Meanwhile, images from
high-frequency cameras can support streaming models as they provide a larger
FOV than a LiDAR slice. However, this FOV mismatch complicates sensor fusion. To
address this research gap, we propose an innovative camera-LiDAR streaming 3D
object detection framework that uses camera images instead of past LiDAR slices
to provide an up-to-date, dense, and wide context for streaming perception. The
proposed method outperforms prior streaming models on the challenging NuScenes
benchmark. It also outperforms powerful full-scan detectors while being much
faster. Our method is shown to be robust to missing camera images, narrow LiDAR
slices, and small camera-LiDAR miscalibration.
Related papers
- Better Monocular 3D Detectors with LiDAR from the Past [64.6759926054061]
Camera-based 3D detectors often underperform their LiDAR-based counterparts due to inherent depth ambiguities in images.
In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data.
We show consistent and significant performance gain across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
arXiv Detail & Related papers (2024-04-08T01:38:43Z) - Robust 3D Object Detection from LiDAR-Radar Point Clouds via Cross-Modal Feature Augmentation [7.364627166256136]
This paper presents a novel framework for robust 3D object detection from point clouds via cross-modal hallucination.
We introduce multiple alignments on both spatial and feature levels to achieve simultaneous backbone refinement and hallucination generation.
Experiments on the View-of-Delft dataset show that our proposed method outperforms the state-of-the-art (SOTA) methods for both radar and LiDAR object detection.
arXiv Detail & Related papers (2023-09-29T15:46:59Z) - CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception [20.824179713013734]
We propose Camera Radar Net (CRN), a novel camera-radar fusion framework.
CRN generates a semantically rich and spatially accurate bird's-eye-view (BEV) feature map for various tasks.
In its real-time setting, CRN runs at 20 FPS while achieving performance comparable to LiDAR detectors on nuScenes.
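The general mechanics of BEV-level fusion that entries like this one build on can be sketched minimally: per-modality feature maps rasterized onto a shared BEV grid are concatenated and mixed by a convolution. This is a generic illustration, not CRN's actual architecture; the channel counts and grid size are made up.

```python
# Generic BEV-fusion sketch (not CRN's architecture): concatenate camera and
# radar BEV feature maps on a shared grid and mix them with a 1x1 convolution.
import torch
import torch.nn as nn

class SimpleBEVFusion(nn.Module):
    def __init__(self, cam_ch: int, radar_ch: int, out_ch: int):
        super().__init__()
        self.mix = nn.Conv2d(cam_ch + radar_ch, out_ch, kernel_size=1)

    def forward(self, cam_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        # Both inputs are (batch, channels, H, W) on the same BEV grid.
        return self.mix(torch.cat([cam_bev, radar_bev], dim=1))

fusion = SimpleBEVFusion(64, 32, 128)
fused = fusion(torch.randn(1, 64, 200, 200), torch.randn(1, 32, 200, 200))
print(fused.shape)                                          # (1, 128, 200, 200)
```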
arXiv Detail & Related papers (2023-04-03T00:47:37Z) - BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework [20.842800465250775]
Current methods rely on point clouds from the LiDAR sensor as queries to leverage features from the image space.
We propose a surprisingly simple yet novel fusion framework, dubbed BEVFusion, whose camera stream does not depend on the input of LiDAR data.
We empirically show that our framework surpasses the state-of-the-art methods under the normal training settings.
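The point-as-query pattern that BEVFusion moves away from can be illustrated with a small sketch: project LiDAR points into the image plane using calibration and sample per-point image features. The intrinsics K and extrinsics T are assumed known from calibration; this illustrates the general pattern, not BEVFusion's code.

```python
# Hypothetical sketch of "LiDAR points as queries into image space": project
# points with known calibration and gather per-point image features.
import numpy as np

def sample_image_features(points, feat_map, K, T):
    """points: (N, 3) in the LiDAR frame; feat_map: (H, W, C) image features;
    K: (3, 3) camera intrinsics; T: (4, 4) LiDAR-to-camera extrinsics."""
    homo = np.hstack([points, np.ones((len(points), 1))])   # (N, 4) homogeneous
    cam = (T @ homo.T).T[:, :3]                             # LiDAR -> camera frame
    front = cam[:, 2] > 1e-3                                # keep points ahead of camera
    uvw = (K @ cam[front].T).T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)             # pixel coordinates
    h, w, _ = feat_map.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return feat_map[uv[ok, 1], uv[ok, 0]]                   # (M, C) sampled features
```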
arXiv Detail & Related papers (2022-05-27T06:58:30Z) - Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
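A range-view (RV) representation like the one FCOS-LiDAR operates on can be built with a simple spherical projection, sketched below. The beam count, image width, and vertical FOV bounds here are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of a range-view projection: map each 3D point to a (row, col)
# pixel by elevation and azimuth and store its range; empty pixels stay 0.
import numpy as np

def to_range_image(points, h=32, w=1024, fov_up_deg=10.0, fov_down_deg=-30.0):
    """points: (N, 3). Returns an (h, w) float32 range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)
    azimuth = np.arctan2(y, x)                              # [-pi, pi]
    elevation = np.arcsin(z / np.clip(r, 1e-6, None))
    col = ((azimuth + np.pi) / (2 * np.pi) * w).astype(int) % w
    fov = np.radians(fov_up_deg) - np.radians(fov_down_deg)
    row = ((np.radians(fov_up_deg) - elevation) / fov * h).astype(int)
    img = np.zeros((h, w), dtype=np.float32)
    ok = (row >= 0) & (row < h)
    img[row[ok], col[ok]] = r[ok]                           # later points overwrite
    return img
```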
arXiv Detail & Related papers (2022-05-27T05:42:16Z) - A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds [50.54083964183614]
It is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete.
We propose DMT, a Detector-free Motion prediction-based 3D Tracking network that entirely removes the need for complicated 3D detectors.
arXiv Detail & Related papers (2022-03-08T17:49:07Z) - Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of object size to input scene size is significantly smaller than in 2D detection.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z) - StrObe: Streaming Object Detection from LiDAR Packets [73.27333924964306]
Rolling-shutter LiDAR data is emitted as a stream of packets, each covering a sector of the 360° field of view.
Modern perception algorithms wait for the full sweep to be built before processing the data, which introduces additional latency.
In this paper we propose StrObe, a novel approach that minimizes latency by ingesting LiDAR packets and emitting a stream of detections without waiting for the full sweep to be built.
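The per-packet processing pattern StrObe describes can be sketched as a loop that encodes each packet as it arrives, runs detection against a rolling memory of past packet features, and emits results immediately. The `encode` and `detect_heads` functions below are placeholders standing in for real network components; this is a hypothetical outline of the pattern, not StrObe's implementation.

```python
# Hedged sketch of packet-wise streaming detection: emit detections per packet
# while carrying a small rolling memory, instead of waiting for the full sweep.
from collections import deque

def encode(packet):
    return {"n_points": len(packet)}          # placeholder per-packet feature

def detect_heads(feature, memory):
    return [f"det@{feature['n_points']}pts"]  # placeholder detection output

def stream_detect(packets, memory_len=4):
    memory = deque(maxlen=memory_len)         # rolling buffer of past features
    for packet in packets:
        feature = encode(packet)
        detections = detect_heads(feature, list(memory))
        memory.append(feature)
        yield detections                      # emitted before the sweep completes

for dets in stream_detect([[1, 2, 3], [4, 5]]):
    print(dets)
```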
arXiv Detail & Related papers (2020-11-12T14:57:44Z) - Streaming Object Detection for 3-D Point Clouds [29.465873948076766]
LiDAR provides a prominent sensory modality that informs many existing perceptual systems.
The latency for perceptual systems based on point cloud data can be dominated by the amount of time for a complete rotational scan.
We show how operating on LiDAR data in its native streaming formulation offers several advantages for self-driving object detection.
arXiv Detail & Related papers (2020-05-04T21:55:15Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)