MotionNet: Joint Perception and Motion Prediction for Autonomous Driving
Based on Bird's Eye View Maps
- URL: http://arxiv.org/abs/2003.06754v1
- Date: Sun, 15 Mar 2020 04:37:12 GMT
- Title: MotionNet: Joint Perception and Motion Prediction for Autonomous Driving
Based on Bird's Eye View Maps
- Authors: Pengxiang Wu, Siheng Chen, Dimitris Metaxas
- Abstract summary: We propose an efficient deep model, called MotionNet, to jointly perform perception and motion prediction from 3D point clouds.
MotionNet takes a sequence of sweeps as input and outputs a bird's eye view (BEV) map, which encodes the object category and motion information in each grid cell.
- Score: 34.24949016811546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to reliably perceive the environmental states, particularly the
existence of objects and their motion behavior, is crucial for autonomous
driving. In this work, we propose an efficient deep model, called MotionNet, to
jointly perform perception and motion prediction from 3D point clouds.
MotionNet takes a sequence of LiDAR sweeps as input and outputs a bird's eye
view (BEV) map, which encodes the object category and motion information in
each grid cell. The backbone of MotionNet is a novel spatio-temporal pyramid
network, which extracts deep spatial and temporal features in a hierarchical
fashion. To enforce the smoothness of predictions over both space and time, the
training of MotionNet is further regularized with novel spatial and temporal
consistency losses. Extensive experiments show that the proposed method overall
outperforms state-of-the-art approaches, including the latest scene-flow- and
3D-object-detection-based methods. This indicates the potential value of the
proposed method serving as a backup to the bounding-box-based system, and
providing complementary information to the motion planner in autonomous
driving. Code is available at https://github.com/pxiangwu/MotionNet.
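As a rough illustration of the BEV input described in the abstract, the sketch below (not the authors' released code) shows one common way to discretize a sequence of ego-aligned LiDAR sweeps into a binary bird's eye view occupancy grid. The grid extent, voxel size, and binary-occupancy encoding are illustrative assumptions rather than MotionNet's exact settings.

```python
import numpy as np

def sweeps_to_bev(sweeps, x_range=(-32.0, 32.0), y_range=(-32.0, 32.0),
                  z_range=(-3.0, 2.0), voxel=(0.25, 0.25, 0.4)):
    """Voxelize a temporal sequence of LiDAR sweeps into a BEV tensor.

    sweeps: list of (N_i, 3) arrays, each an (x, y, z) point cloud already
            synchronized to the current ego frame.
    Returns an array of shape (T, H, W, D) of binary occupancy, where the
    height bins D can be treated as channels by a 2D BEV backbone.
    """
    nx = int((x_range[1] - x_range[0]) / voxel[0])
    ny = int((y_range[1] - y_range[0]) / voxel[1])
    nz = int((z_range[1] - z_range[0]) / voxel[2])
    bev = np.zeros((len(sweeps), ny, nx, nz), dtype=np.float32)

    for t, pts in enumerate(sweeps):
        # Keep only points inside the region of interest.
        mask = ((pts[:, 0] >= x_range[0]) & (pts[:, 0] < x_range[1]) &
                (pts[:, 1] >= y_range[0]) & (pts[:, 1] < y_range[1]) &
                (pts[:, 2] >= z_range[0]) & (pts[:, 2] < z_range[1]))
        p = pts[mask]
        # Map metric coordinates to integer grid indices.
        ix = ((p[:, 0] - x_range[0]) / voxel[0]).astype(np.int64)
        iy = ((p[:, 1] - y_range[0]) / voxel[1]).astype(np.int64)
        iz = ((p[:, 2] - z_range[0]) / voxel[2]).astype(np.int64)
        bev[t, iy, ix, iz] = 1.0  # binary occupancy per voxel
    return bev
```

A 2D spatio-temporal backbone such as MotionNet's pyramid network can then treat the height bins as feature channels and predict, for every BEV cell, an object category together with a displacement (motion) vector, with the spatial and temporal consistency losses encouraging neighboring cells and consecutive frames to agree.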
Related papers
- OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment [0.0]
We introduce an end-to-end neural network methodology designed to predict the future behaviors of all dynamic objects in the environment.
We propose a novel time-weighted motion flow loss, whose application has shown a substantial decrease in end-point error.
arXiv Detail & Related papers (2024-04-02T19:37:58Z)
- Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory prediction for the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z)
- TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses [51.60422927416087]
3D multi-object tracking (MOT) is vital for many applications including autonomous driving vehicles and service robots.
We present TrajectoryFormer, a novel point-cloud-based 3D MOT framework.
arXiv Detail & Related papers (2023-06-09T13:31:50Z)
- Visual Perception System for Autonomous Driving [9.659835301514288]
This work introduces a vision-based perception system for autonomous driving that integrates trajectory tracking and prediction of moving objects to prevent collisions.
The system leverages motion cues from pedestrians to monitor and forecast their movements and simultaneously maps the environment.
The performance, efficiency, and resilience of this approach are substantiated through comprehensive evaluations on both simulated and real-world datasets.
arXiv Detail & Related papers (2023-03-03T23:12:43Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- Motion Policy Networks [61.87789591369106]
We present an end-to-end neural model called Motion Policy Networks (MπNets) to generate collision-free, smooth motion from a single depth camera observation.
Our experiments show that MπNets are significantly faster than global planners while exhibiting the reactivity needed to deal with dynamic scenes.
arXiv Detail & Related papers (2022-10-21T19:37:09Z)
- Receding Moving Object Segmentation in 3D LiDAR Data Using Sparse 4D Convolutions [33.538055872850514]
We tackle the problem of distinguishing 3D LiDAR points that belong to currently moving objects, like walking pedestrians or driving cars, from points obtained from non-moving objects, like walls, but also parked cars.
Our approach takes a sequence of observed LiDAR scans and turns them into a voxelized sparse 4D point cloud.
We apply computationally efficient sparse 4D convolutions to jointly extract spatial and temporal features and predict moving object confidence scores for all points in the sequence.
arXiv Detail & Related papers (2022-06-08T18:51:14Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance compared with state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- MoNet: Motion-based Point Cloud Prediction Network [13.336278321863595]
3D point clouds accurately model the 3D information of the surrounding environment.
Because point clouds are unordered and unstructured, point cloud prediction is challenging.
We propose a novel motion-based neural network named MoNet to predict future point clouds.
arXiv Detail & Related papers (2020-11-21T15:43:31Z)