MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous
Driving Using Multiple Views
- URL: http://arxiv.org/abs/2006.05518v2
- Date: Tue, 18 Aug 2020 03:09:18 GMT
- Title: MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous
Driving Using Multiple Views
- Authors: Ke Chen, Ryan Oldja, Nikolai Smolyanskiy, Stan Birchfield, Alexander
Popov, David Wehr, Ibrahim Eden, Joachim Pehserl
- Abstract summary: We present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation.
MVLidarNet is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input.
We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.
- Score: 60.538802124885414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous driving requires the inference of actionable information such as
detecting and classifying objects, and determining the drivable space. To this
end, we present Multi-View LidarNet (MVLidarNet), a two-stage deep neural
network for multi-class object detection and drivable space segmentation using
multiple views of a single LiDAR point cloud. The first stage processes the
point cloud projected onto a perspective view in order to semantically segment
the scene. The second stage then processes the point cloud (along with semantic
labels from the first stage) projected onto a bird's eye view, to detect and
classify objects. Both stages use an encoder-decoder architecture. We show that
our multi-view, multi-stage, multi-class approach is able to detect and
classify objects while simultaneously determining the drivable space using a
single LiDAR scan as input, in challenging scenes with more than one hundred
vehicles and pedestrians at a time. The system operates efficiently at 150 fps
on an embedded GPU designed for a self-driving car, including a postprocessing
step to maintain identities over time. We show results on both KITTI and a much
larger internal dataset, thus demonstrating the method's ability to scale by an
order of magnitude.
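The two stages therefore consume two different projections of the same LiDAR scan: a perspective (range-image) view for semantic segmentation, and a bird's-eye view augmented with the predicted semantics for detection. The sketch below illustrates how such projections can be built; the image size, vertical field of view, grid extent, and channel layout are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch (not the authors' code) of the two LiDAR projections that
# MVLidarNet's stages operate on. All sizes and FOV bounds are assumptions.
import numpy as np

def perspective_view(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) point cloud to a range image (spherical projection)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-6
    yaw = np.arctan2(y, x)                     # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                   # elevation
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = ((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).clip(0, h - 1).astype(int)
    img = np.zeros((h, w), dtype=np.float32)
    img[v, u] = r                              # store range; could also store x, y, z, intensity
    return img

def birds_eye_view(points, semantics, grid=512, extent=50.0):
    """Rasterize points (plus per-point semantic labels) onto a top-down grid."""
    x, y = points[:, 0], points[:, 1]
    keep = (np.abs(x) < extent) & (np.abs(y) < extent)
    ix = ((x[keep] + extent) / (2 * extent) * (grid - 1)).astype(int)
    iy = ((y[keep] + extent) / (2 * extent) * (grid - 1)).astype(int)
    bev = np.zeros((2, grid, grid), dtype=np.float32)   # channels: max height, semantic label
    np.maximum.at(bev[0], (iy, ix), points[keep, 2])
    bev[1, iy, ix] = semantics[keep]
    return bev

# Example: 100k random points with dummy semantic labels.
pts = np.random.randn(100_000, 3) * 10
labels = np.random.randint(0, 8, size=len(pts))
range_img = perspective_view(pts)              # input to the segmentation stage
bev = birds_eye_view(pts, labels)              # input to the detection stage
```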
Related papers
- Spatial-Temporal Multi-Cuts for Online Multiple-Camera Vehicle Tracking [5.679775668038154]
We introduce a graph representation that allows spatial-temporal clustering in a single, combined step.
By keeping sparse appearance and positional cues of all detections in a cluster, our method can compare clusters based on the strongest available evidence.
Our method does not require any training on the target scene, pre-extraction of single-camera tracks, or additional annotations.
arXiv Detail & Related papers (2024-10-03T16:23:33Z)
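A rough illustration of the single combined spatial-temporal clustering step described above (not the paper's actual multicut formulation or solver): detections from all cameras and frames become graph nodes, nodes are linked when both an appearance cue and a positional cue agree, and identities are read off the resulting clusters. The thresholds, embedding size, and union-find clustering are simplifying assumptions.

```python
# Hedged sketch of clustering detections across cameras and frames in one step.
import numpy as np

class Detection:
    def __init__(self, cam, frame, xy, emb):
        self.cam, self.frame = cam, frame
        self.xy, self.emb = np.asarray(xy, float), np.asarray(emb, float)

def affinity(a, b, pos_thresh=2.0, app_thresh=0.7):
    """Edge test combining a positional cue (metres on the ground plane)
    and an appearance cue (cosine similarity of embeddings)."""
    pos_ok = np.linalg.norm(a.xy - b.xy) < pos_thresh
    cos = a.emb @ b.emb / (np.linalg.norm(a.emb) * np.linalg.norm(b.emb) + 1e-8)
    return pos_ok and cos > app_thresh

def cluster(detections):
    """Single combined spatial-temporal step: union-find over all detection pairs."""
    parent = list(range(len(detections)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            if affinity(detections[i], detections[j]):
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(detections))]  # cluster id per detection

# Example: two cameras seeing the same vehicle at roughly the same ground position.
dets = [Detection(0, 5, (10.1, 3.0), [0.9, 0.1]), Detection(1, 5, (10.4, 3.2), [0.88, 0.12])]
print(cluster(dets))   # both detections end up in the same cluster
```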
- End-to-End 3D Object Detection using LiDAR Point Cloud [0.0]
We present an approach wherein, using a novel encoding of the LiDAR point cloud, we infer the locations of different object classes near the autonomous vehicle.
The output is a set of predictions of the location and orientation of objects in the scene, in the form of 3D bounding boxes and class labels.
arXiv Detail & Related papers (2023-12-24T00:52:14Z)
- Follow Anything: Open-set detection, tracking, and following in real-time [89.83421771766682]
We present a robotic system to detect, track, and follow any object in real-time.
Our approach, dubbed "follow anything" (FAn), is an open-vocabulary and multimodal model.
FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second.
arXiv Detail & Related papers (2023-08-10T17:57:06Z)
- Linking vision and motion for self-supervised object-centric perception [16.821130222597155]
Object-centric representations enable autonomous driving algorithms to reason about interactions between many independent agents and scene features.
Traditionally these representations have been obtained via supervised learning, but this decouples perception from the downstream driving task and could harm generalization.
We adapt a self-supervised object-centric vision model to perform object decomposition using only RGB video and the pose of the vehicle as inputs.
arXiv Detail & Related papers (2023-07-14T04:21:05Z)
- OVTrack: Open-Vocabulary Multiple Object Tracking [64.73379741435255]
OVTrack is an open-vocabulary tracker capable of tracking arbitrary object classes.
It sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark.
arXiv Detail & Related papers (2023-04-17T16:20:05Z)
- Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images [2.69840007334476]
Bird's-eye view (BEV) expresses the location of different traffic participants in the ego vehicle frame from a top-down view.
We propose a novel representation that captures the appearance and occupancy information of various traffic participants from an array of monocular cameras covering a 360 deg field of view (FOV).
We use a learned image embedding of all camera images to generate a BEV of the scene at any instant that captures both appearance and occupancy of the scene.
arXiv Detail & Related papers (2022-11-08T20:57:56Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
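A generic stand-in for the cross-view fusion idea in SurroundDepth (not its actual cross-view transformer): tokens from all surrounding cameras attend to one another with standard multi-head attention, so each view's features pick up context from the other views. The feature dimensions and number of cameras below are assumptions.

```python
# Minimal sketch of cross-view feature fusion with attention (illustrative only).
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):
        # feats: (B, V, H, W, C) features from V surrounding cameras
        B, V, H, W, C = feats.shape
        tokens = feats.reshape(B, V * H * W, C)         # flatten all views into one token set
        fused, _ = self.attn(tokens, tokens, tokens)    # every token attends across all views
        fused = self.norm(tokens + fused)               # residual connection + layer norm
        return fused.reshape(B, V, H, W, C)

# Usage on downsampled feature maps from 6 surround cameras (shapes are assumptions).
f = torch.randn(2, 6, 12, 40, 256)
fused = CrossViewFusion()(f)   # same shape, but each view now carries cross-view context
```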
- Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on Youtube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z)
- OmniDet: Surround View Cameras based Multi-task Visual Perception Network for Autonomous Driving [10.3540046389057]
This work presents a multi-task visual perception network on unrectified fisheye images.
It consists of six primary tasks necessary for an autonomous driving system.
We demonstrate that the jointly trained model performs better than the respective single task versions.
arXiv Detail & Related papers (2021-02-15T10:46:24Z)
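Joint multi-task training of this kind is commonly realized as a shared encoder with one lightweight head per task. The sketch below shows that generic pattern, not OmniDet's architecture; the backbone layers and the task names and output channels are invented for illustration.

```python
# Generic shared-encoder / multi-head sketch of joint multi-task perception.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, tasks):
        super().__init__()
        self.encoder = nn.Sequential(               # shared features for all tasks
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One lightweight decoder head per task, predicting `c` output channels each.
        self.heads = nn.ModuleDict({name: nn.Conv2d(64, c, 1) for name, c in tasks.items()})

    def forward(self, img):
        feat = self.encoder(img)
        return {name: head(feat) for name, head in self.heads.items()}

tasks = {"depth": 1, "segmentation": 10, "motion": 2}   # hypothetical subset of task heads
net = MultiTaskNet(tasks)
outs = net(torch.randn(1, 3, 128, 256))
# Joint training sums per-task losses, so gradients from all tasks shape the shared encoder.
```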
- Multiview Detection with Feature Perspective Transformation [59.34619548026885]
We propose a novel multiview detection system, MVDet.
We take an anchor-free approach to aggregate multiview information by projecting feature maps onto the ground plane.
Our entire model is end-to-end learnable and achieves 88.2% MODA on the standard Wildtrack dataset.
arXiv Detail & Related papers (2020-07-14T17:58:30Z)
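A rough illustration of MVDet-style aggregation of multiview information on the ground plane (the gist of the projection step, not the paper's implementation): each camera's feature map is warped to a common ground-plane grid with a per-camera homography, and the warped maps are summed. The grid size and nearest-neighbour sampling are simplifying assumptions.

```python
# Hedged sketch of projecting per-camera feature maps onto a shared ground plane.
import numpy as np

def warp_to_ground(feat, H_img_from_ground, grid=(120, 360)):
    """feat: (C, h, w) image-plane feature map.
    H_img_from_ground: 3x3 homography mapping ground-grid coordinates to image pixels."""
    C, h, w = feat.shape
    gy, gx = np.mgrid[0:grid[0], 0:grid[1]]
    ground = np.stack([gx.ravel(), gy.ravel(), np.ones(gx.size)])   # homogeneous grid coords
    img = H_img_from_ground @ ground
    z = np.where(np.abs(img[2]) < 1e-9, 1e-9, img[2])               # avoid division by zero
    u, v = (img[0] / z).round().astype(int), (img[1] / z).round().astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out = np.zeros((C, grid[0] * grid[1]), dtype=feat.dtype)
    out[:, valid] = feat[:, v[valid], u[valid]]                     # nearest-neighbour lookup
    return out.reshape(C, *grid)

def aggregate(views):
    """views: iterable of (feat, homography) pairs from several cameras, summed on the ground plane."""
    return sum(warp_to_ground(f, H) for f, H in views)

# Example with two hypothetical cameras and identity homographies.
feats = [np.random.rand(64, 90, 160).astype(np.float32) for _ in range(2)]
Hs = [np.eye(3) for _ in range(2)]
bev_feat = aggregate(zip(feats, Hs))   # (64, 120, 360) ground-plane feature map
```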
This list is automatically generated from the titles and abstracts of the papers on this site.