BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object
Detection for Autonomous Driving
- URL: http://arxiv.org/abs/2107.04937v1
- Date: Sun, 11 Jul 2021 01:11:58 GMT
- Title: BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object
Detection for Autonomous Driving
- Authors: Hazem Rashed, Mariam Essam, Maha Mohamed, Ahmad El Sallab and Senthil
Yogamani
- Abstract summary: CNNs can leverage the global context in the scene to project better.
We create an extended KITTI-raw dataset consisting of 12.9k images with annotations of moving object masks in BEV space for five classes.
We observe a significant improvement of 13% in mIoU using the simple baseline implementation.
- Score: 2.9769485817170387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detection of moving objects is a very important task in autonomous driving
systems. After the perception phase, motion planning is typically performed in
Bird's Eye View (BEV) space. This requires projecting objects detected on the
image plane onto the top-view BEV plane. Such a projection is prone to errors
due to the lack of depth information and noisy mapping in distant regions. CNNs can
leverage the global context in the scene to project better. In this work, we
explore end-to-end Moving Object Detection (MOD) on the BEV map directly using
monocular images as input. To the best of our knowledge, no such dataset
exists, so we create an extended KITTI-raw dataset consisting of 12.9k
images with annotations of moving object masks in BEV space for five classes.
The dataset is intended for class-agnostic, motion-cue-based object
detection; classes are provided as metadata for better tuning. We design
and implement a two-stream RGB and optical flow fusion architecture which
outputs motion segmentation directly in BEV space. We compare it with inverse
perspective mapping of state-of-the-art motion segmentation predictions on the
image plane. We observe a significant improvement of 13% in mIoU using the
simple baseline implementation. This demonstrates the ability to directly learn
motion segmentation output in BEV space. Qualitative results of our baseline
and the dataset annotations can be found in
https://sites.google.com/view/bev-modnet.
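For intuition, below is a minimal sketch of the kind of two-stream RGB and
optical-flow fusion network described above, predicting motion segmentation
directly on a BEV grid, plus the IoU measure used for comparison. It is not
the authors' architecture: the layer widths, the 128x128 BEV resolution, the
input size, and the pooling-based view change are illustrative assumptions.

```python
# A minimal sketch of a two-stream RGB + optical-flow fusion network that
# predicts motion segmentation directly on a BEV grid. NOT the authors'
# architecture: widths, depths, and the 128x128 BEV output are assumptions.
import torch
import torch.nn as nn

def encoder(in_ch, out_ch):
    # Two stride-2 convolutions reduce the spatial size to 1/4.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )

class TwoStreamBEVMotionNet(nn.Module):
    def __init__(self, bev_size=128):
        super().__init__()
        self.rgb_encoder = encoder(3, 32)   # appearance stream
        self.flow_encoder = encoder(2, 32)  # optical-flow (u, v) motion stream
        self.fuse = nn.Conv2d(64, 64, 1)    # fuse the concatenated streams
        # Decode fused image-plane features onto a fixed-size BEV grid. The
        # pooling here is a crude stand-in: the point of the end-to-end
        # formulation is that the image-to-BEV view change is learned rather
        # than computed by inverse perspective mapping.
        self.bev_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(bev_size // 4),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 1),            # per-cell moving/static logit
        )

    def forward(self, rgb, flow):
        feats = torch.cat([self.rgb_encoder(rgb), self.flow_encoder(flow)], 1)
        return self.bev_head(self.fuse(feats))  # (B, 1, bev_size, bev_size)

def iou(pred, target, eps=1e-6):
    # IoU of binary moving-object masks; a stand-in for the reported mIoU.
    inter = (pred & target).sum()
    union = (pred | target).sum()
    return float((inter + eps) / (union + eps))

model = TwoStreamBEVMotionNet()
rgb = torch.rand(1, 3, 192, 640)   # KITTI-like input resolution (assumed)
flow = torch.rand(1, 2, 192, 640)
pred = model(rgb, flow).sigmoid() > 0.5
print(pred.shape, iou(pred, pred))
```

The design point the abstract argues is visible in the decoder: the
image-to-BEV view change is absorbed into learned layers rather than
performed by an explicit, error-prone inverse perspective mapping.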
Related papers
- CV-MOS: A Cross-View Model for Motion Segmentation [13.378850442525945]
We introduce CV-MOS, a cross-view model for moving object segmentation.
We decouple spatio-temporal information by capturing motion from BEV and range-view (RV) residual maps.
Our method achieves leading IoU scores of 77.5% and 79.2% on the validation and test sets of the SemanticKITTI dataset.
arXiv Detail & Related papers (2024-08-25T09:39:26Z)
- OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation [57.2213693781672]
Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems.
We propose OE-BevSeg, an end-to-end multimodal framework that enhances BEV segmentation performance.
Our approach achieves state-of-the-art results by a large margin on the nuScenes dataset for vehicle segmentation.
arXiv Detail & Related papers (2024-07-18T03:48:22Z)
- Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation [16.3996408206659]
We present a novel semi-supervised framework for visual BEV semantic segmentation that boosts performance by exploiting unlabeled images during training.
A consistency loss that makes full use of unlabeled data is then proposed, constraining the model not only on the semantic prediction but also on the BEV feature.
Experiments on the nuScenes and Argoverse datasets show that our framework can effectively improve prediction accuracy.
arXiv Detail & Related papers (2023-08-28T12:23:36Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images [2.69840007334476]
Birds-eye View (BEV) expresses the location of different traffic participants in the ego vehicle frame from a top-down view.
We propose a novel representation that captures the appearance and occupancy of various traffic participants from an array of monocular cameras covering a 360 deg field of view (FOV).
We use a learned image embedding of all camera images to generate a BEV of the scene at any instant that captures both appearance and occupancy of the scene.
arXiv Detail & Related papers (2022-11-08T20:57:56Z)
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z)
- "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping [45.94778766867247]
Estimating a semantically segmented bird's-eye-view map from a single image has become a popular technique for autonomous control and navigation.
We show an increase in localization error with distance from the camera.
We propose a graph neural network which predicts BEV objects from a monocular image by spatially reasoning about an object within the context of other objects.
arXiv Detail & Related papers (2022-04-06T17:23:13Z)
- LiMoSeg: Real-time Bird's Eye View based LiDAR Motion Segmentation [8.184561295177623]
This paper proposes a novel real-time architecture for motion segmentation of Light Detection and Ranging (LiDAR) data.
We use two successive scans of LiDAR data in a 2D Bird's Eye View representation to perform pixel-wise classification as static or moving (a rough sketch of this two-scan BEV setup appears after this list).
We demonstrate a low latency of 8 ms on a commonly used automotive embedded platform, namely Nvidia Jetson Xavier.
arXiv Detail & Related papers (2021-11-08T23:40:55Z)
- Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images [128.881857704338]
We study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image.
We show that the method can be extended to detect dynamic objects on the BEV plane.
We validate our approach against powerful baselines and show that our network achieves superior performance.
arXiv Detail & Related papers (2021-10-05T12:40:33Z)
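As a companion illustration for the LiMoSeg entry above (and, loosely, for
the residual maps used by CV-MOS), the sketch below casts two successive
LiDAR scans onto 2D BEV occupancy grids and derives a naive per-cell motion
cue. The grid size, the extent, and the occupancy-difference heuristic are
assumptions of this sketch, not the papers' learned methods.

```python
# Rasterize two successive LiDAR scans onto 2D BEV occupancy grids and
# derive a naive per-cell motion cue from their difference. LiMoSeg itself
# learns the static/moving classification with a network; the 256-cell grid,
# the +/- 40 m extent, and the occupancy-difference heuristic are
# illustrative assumptions.
import numpy as np

def bev_occupancy(points, grid=256, extent=40.0):
    # points: (N, 3) x/y/z in the ego frame; keep a square +/- extent window.
    keep = (np.abs(points[:, 0]) < extent) & (np.abs(points[:, 1]) < extent)
    xy = points[keep, :2]
    # Map metric coordinates to integer cell indices.
    idx = ((xy + extent) / (2 * extent) * grid).astype(int).clip(0, grid - 1)
    occ = np.zeros((grid, grid), dtype=np.float32)
    occ[idx[:, 1], idx[:, 0]] = 1.0  # mark occupied cells (row = y, col = x)
    return occ

# Random stand-ins for two consecutive, ego-motion-compensated scans.
scan_t0 = np.random.uniform(-40.0, 40.0, size=(5000, 3))
scan_t1 = np.random.uniform(-40.0, 40.0, size=(5000, 3))

occ0, occ1 = bev_occupancy(scan_t0), bev_occupancy(scan_t1)
# Cells occupied now but not before are a crude "moving" cue; a learned
# model would instead stack [occ0, occ1] as input channels and classify
# every cell as static or moving.
motion_cue = np.clip(occ1 - occ0, 0.0, 1.0)
print(occ0.shape, int(motion_cue.sum()))
```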
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.