PIP-Net: Pedestrian Intention Prediction in the Wild
- URL: http://arxiv.org/abs/2402.12810v2
- Date: Fri, 1 Mar 2024 15:02:25 GMT
- Title: PIP-Net: Pedestrian Intention Prediction in the Wild
- Authors: Mohsen Azarmi, Mahdi Rezaei, He Wang, Sebastien Glaser
- Abstract summary: PIP-Net is a novel framework designed to predict pedestrian crossing intentions for AVs in real-world urban scenarios.
We offer two variants of PIP-Net designed for different camera mounts and setups.
The proposed model employs a recurrent and temporal attention-based solution, outperforming state-of-the-art methods.
For the first time, we present the Urban-PIP dataset, a customised pedestrian intention prediction dataset.
- Score: 11.799731429829603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate pedestrian intention prediction (PIP) by Autonomous Vehicles (AVs) is one of the current research challenges in this field. In this article, we introduce PIP-Net, a novel framework designed to predict pedestrian crossing intentions for AVs in real-world urban scenarios. We offer two variants of PIP-Net designed for different camera mounts and setups. Leveraging both kinematic data and spatial features from the driving scene, the proposed model employs a recurrent and temporal attention-based solution, outperforming state-of-the-art methods. To enhance the visual representation of road users and their proximity to the ego vehicle, we introduce a categorical depth feature map combined with a local motion flow feature, providing rich insights into the scene dynamics. Additionally, we explore the impact of expanding the camera's field of view from one to three cameras surrounding the ego vehicle, which enhances the model's contextual perception. Depending on the traffic scenario and road environment, the model excels in predicting pedestrian crossing intentions up to 4 seconds in advance, a breakthrough in current pedestrian intention prediction research. Finally, for the first time, we present the Urban-PIP dataset, a customised pedestrian intention prediction dataset with multi-camera annotations in real-world automated driving scenarios.
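The abstract names the ingredients (recurrent and temporal attention-based fusion of scene features, a categorical depth feature map, a local motion flow feature, and ego-vehicle kinematics) without the exact architecture. The following is a minimal PyTorch sketch of how such a fusion could be wired; all module names, layer sizes, and the single-logit head are assumptions, not the authors' implementation.

```python
# Minimal sketch of a PIP-Net-style predictor (illustrative only, NOT the
# authors' code): RGB frames, a categorical depth map, and a local motion
# flow map are encoded per frame, fused with ego kinematics, passed through
# a GRU, and pooled with temporal attention to score crossing intent.
import torch
import torch.nn as nn

class PIPNetSketch(nn.Module):
    def __init__(self, n_depth_bins=8, feat_dim=128, kin_dim=4):
        super().__init__()
        # Shared CNN over RGB (3) + categorical depth bins + flow (dx, dy)
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3 + n_depth_bins + 2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.kin_encoder = nn.Linear(kin_dim, feat_dim)   # ego-vehicle kinematics
        self.gru = nn.GRU(2 * feat_dim, feat_dim, batch_first=True)
        self.attn = nn.Linear(feat_dim, 1)                # temporal attention scores
        self.head = nn.Linear(feat_dim, 1)                # crossing-intent logit

    def forward(self, frames, kinematics):
        # frames: (B, T, 3 + n_depth_bins + 2, H, W); kinematics: (B, T, kin_dim)
        B, T = frames.shape[:2]
        v = self.frame_encoder(frames.flatten(0, 1)).view(B, T, -1)
        h, _ = self.gru(torch.cat([v, self.kin_encoder(kinematics)], dim=-1))
        w = torch.softmax(self.attn(h), dim=1)            # attention over time
        return self.head((w * h).sum(dim=1)).squeeze(-1)  # higher -> will cross

model = PIPNetSketch()
logit = model(torch.randn(2, 16, 13, 96, 96), torch.randn(2, 16, 4))
```

Under this sketch, extending from one to three cameras would amount to encoding each camera's frame stack separately and concatenating the per-camera features before the recurrent stage.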
Related papers
- BEVSeg2TP: Surround View Camera Bird's-Eye-View Based Joint Vehicle Segmentation and Ego Vehicle Trajectory Prediction [4.328789276903559]
Trajectory prediction is a key task for vehicle autonomy.
There is a growing interest in learning-based trajectory prediction.
We show that there is potential to improve perception performance.
arXiv Detail & Related papers (2023-12-20T15:02:37Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pretraining in visuomotor driving.
We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes from large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
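The two-stage recipe hinges on a photometric reconstruction error between consecutive frames. Below is a hedged sketch of that objective in the standard monodepth style, assuming known intrinsics K, a predicted depth map, and a predicted rigid transform; PPGeo's exact heads and loss terms may differ, the paper's videos are uncalibrated (so the intrinsics would in practice also be estimated), and in the second stage only the visual encoder predicts ego-motion from the current observation.

```python
# Sketch of a monodepth-style photometric loss (an assumed stand-in for
# the paper's objective): back-project frame-t pixels with the predicted
# depth, move them with the predicted ego-motion, re-project, and compare
# the warped frame t+1 against frame t.
import torch
import torch.nn.functional as F

def photometric_loss(frame_t, frame_t1, depth_t, pose_t_to_t1, K):
    # frame_*: (B, 3, H, W); depth_t: (B, 1, H, W)
    # pose_t_to_t1: (B, 3, 4) rigid transform [R | t]; K: (B, 3, 3) intrinsics
    B, _, H, W = frame_t.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).view(1, 3, -1)  # (1, 3, H*W)
    cam = (torch.linalg.inv(K) @ pix.expand(B, -1, -1)) * depth_t.view(B, 1, -1)
    cam1 = pose_t_to_t1[:, :, :3] @ cam + pose_t_to_t1[:, :, 3:]     # into t+1 frame
    proj = K @ cam1
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)                   # (B, 2, H*W)
    u = 2 * uv[:, 0] / (W - 1) - 1                                   # to [-1, 1]
    v = 2 * uv[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    recon_t = F.grid_sample(frame_t1, grid, align_corners=True)      # warp t+1 -> t
    return F.l1_loss(recon_t, frame_t)

loss = photometric_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64),
                        torch.rand(1, 1, 64, 64) + 0.1,
                        torch.eye(3, 4).unsqueeze(0),
                        torch.tensor([[[50., 0., 32.], [0., 50., 32.], [0., 0., 1.]]]))
```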
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, making it efficient and applicable to real-time panoramic HD-map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Predicting Future Occupancy Grids in Dynamic Environment with Spatio-Temporal Learning [63.25627328308978]
We propose a spatio-temporal prediction network pipeline to generate future occupancy predictions.
Compared to current SOTA, our approach predicts occupancy for a longer horizon of 3 seconds.
We publicly release our grid occupancy dataset based on nuScenes to support further research.
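The summary does not detail the pipeline; as a rough illustration of the task itself, here is an assumed convolutional baseline (hypothetical names and sizes, not the paper's network) that maps a stack of past occupancy grids to logits for future grids.

```python
# Minimal sketch of future occupancy-grid prediction (assumed baseline):
# stack the past T grids along the channel axis and let a small CNN
# regress the next K grids, trained with binary cross-entropy.
import torch
import torch.nn as nn

class OccupancyForecaster(nn.Module):
    def __init__(self, t_past=5, k_future=15):  # e.g. 15 frames ~ 3 s at 5 Hz
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(t_past, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, k_future, 3, padding=1),   # logits per future step
        )

    def forward(self, past_grids):
        # past_grids: (B, t_past, H, W) with occupancy probabilities in [0, 1]
        return self.net(past_grids)                  # (B, k_future, H, W)

model = OccupancyForecaster()
logits = model(torch.rand(1, 5, 128, 128))
loss = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.rand(1, 15, 128, 128))
```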
arXiv Detail & Related papers (2022-05-06T13:45:32Z)
- Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting [91.69900691029908]
We advocate for predicting both the individual motions as well as the scene occupancy map.
We propose a Scene-Actor Graph Neural Network (SA-GNN) which preserves the relative spatial information of pedestrians.
On two large-scale real-world datasets, we showcase that our scene-occupancy predictions are more accurate and better calibrated than those from state-of-the-art motion forecasting methods.
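As a rough illustration of message passing over relative pedestrian positions with joint motion and occupancy heads, here is a hedged sketch; the layer sizes and aggregation are assumptions, and the real SA-GNN is not reproduced here.

```python
# Rough sketch (assumed sizes, NOT the SA-GNN implementation): one round of
# message passing conditioned on relative pedestrian positions, followed by
# heads that jointly predict per-actor motion and a shared occupancy map.
import torch
import torch.nn as nn

class SceneActorSketch(nn.Module):
    def __init__(self, d=64, horizon=10, grid=32):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * d + 2, d), nn.ReLU())
        self.motion_head = nn.Linear(d, horizon * 2)     # future (x, y) offsets
        self.occ_head = nn.Linear(d, grid * grid)        # scene occupancy logits
        self.grid = grid

    def forward(self, feats, pos):
        # feats: (N, d) per-pedestrian features; pos: (N, 2) positions
        N, d = feats.shape
        rel = pos[None, :, :] - pos[:, None, :]          # (N, N, 2) relative offsets
        pair = torch.cat([feats[:, None].expand(N, N, d),
                          feats[None, :].expand(N, N, d), rel], dim=-1)
        h = feats + self.msg(pair).mean(dim=1)           # aggregate messages
        motion = self.motion_head(h).view(N, -1, 2)
        occ = self.occ_head(h).mean(dim=0).view(self.grid, self.grid)
        return motion, occ

net = SceneActorSketch()
motion, occ = net(torch.randn(6, 64), torch.randn(6, 2))
```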
arXiv Detail & Related papers (2021-01-07T06:08:21Z)
- PePScenes: A Novel Dataset and Baseline for Pedestrian Action Prediction in 3D [10.580548257913843]
We propose a new pedestrian action prediction dataset created by adding per-frame 2D/3D bounding box and behavioral annotations to nuScenes.
In addition, we propose a hybrid neural network architecture that incorporates various data modalities for predicting pedestrian crossing action.
arXiv Detail & Related papers (2020-12-14T18:13:44Z)
- Multi-Modal Hybrid Architecture for Pedestrian Action Prediction [14.032334569498968]
We propose a novel multi-modal prediction algorithm that incorporates different sources of information captured from the environment to predict future crossing actions of pedestrians.
Using the existing 2D pedestrian behavior benchmarks and a newly annotated 3D driving dataset, we show that our proposed model achieves state-of-the-art performance in pedestrian crossing prediction.
arXiv Detail & Related papers (2020-11-16T15:17:58Z)
- Map-Adaptive Goal-Based Trajectory Prediction [3.1948816877289263]
We present a new method for multi-modal, long-term vehicle trajectory prediction.
Our approach relies on using lane centerlines captured in rich maps of the environment to generate a set of proposed goal paths for each vehicle.
We show that our model outperforms state-of-the-art approaches for vehicle trajectory prediction over a 6-second horizon.
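To make the goal-path idea concrete, here is a hedged sketch, with an assumed GRU history encoder and hypothetical shapes, of scoring candidate centerline paths so that one trajectory mode can be emitted per plausible goal; the paper's actual model is more elaborate.

```python
# Sketch of goal-path scoring (assumed design, not the paper's model):
# encode the agent's history, encode each candidate centerline path,
# and produce a probability per goal.
import torch
import torch.nn as nn

class GoalScorer(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.hist_enc = nn.GRU(2, d, batch_first=True)   # past (x, y) points
        self.path_enc = nn.GRU(2, d, batch_first=True)   # centerline (x, y) points
        self.score = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, history, goal_paths):
        # history: (1, T, 2); goal_paths: (G, L, 2) candidate centerline paths
        _, h = self.hist_enc(history)                    # h: (1, 1, d)
        _, p = self.path_enc(goal_paths)                 # p: (1, G, d)
        G = goal_paths.shape[0]
        pair = torch.cat([h[0].expand(G, -1), p[0]], dim=-1)
        return self.score(pair).squeeze(-1).softmax(dim=0)  # probability per goal

scorer = GoalScorer()
probs = scorer(torch.randn(1, 20, 2), torch.randn(5, 30, 2))
```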
arXiv Detail & Related papers (2020-09-09T17:57:01Z)
- Two-Stream Networks for Lane-Change Prediction of Surrounding Vehicles [8.828423067460644]
In highway scenarios, an alert human driver will typically anticipate early cut-in and cut-out maneuvers of surrounding vehicles using only visual cues.
To deal with lane-change recognition and prediction of surrounding vehicles, we pose the problem as an action recognition/prediction problem by stacking visual cues from video cameras.
Two video action recognition approaches are analyzed: two-stream convolutional networks and multiplier networks.
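As a minimal illustration of the two-stream formulation, the sketch below fuses an appearance stream over an RGB frame with a motion stream over stacked optical flow for a three-way cut-in / keep / cut-out decision; the cross-stream connections of multiplier networks are omitted, and all sizes are assumptions.

```python
# Minimal two-stream sketch (simplified; not the paper's networks):
# late fusion of an appearance stream and a stacked-flow motion stream.
import torch
import torch.nn as nn

def stream(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class TwoStreamLaneChange(nn.Module):
    def __init__(self, flow_frames=10, n_classes=3):
        super().__init__()
        self.rgb = stream(3)                   # appearance stream
        self.flow = stream(2 * flow_frames)    # stacked (dx, dy) flow fields
        self.head = nn.Linear(128, n_classes)  # cut-in / keep / cut-out logits

    def forward(self, rgb, flow):
        return self.head(torch.cat([self.rgb(rgb), self.flow(flow)], dim=-1))

net = TwoStreamLaneChange()
logits = net(torch.randn(2, 3, 112, 112), torch.randn(2, 20, 112, 112))
```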
arXiv Detail & Related papers (2020-08-25T07:59:15Z)
- Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships among different objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not-crossing the street, is a very crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)
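To illustrate graph-based intent reasoning of this kind, here is a hedged sketch with an assumed row-normalised adjacency over scene objects; the graph construction and features are hypothetical, not the paper's.

```python
# Illustrative sketch (assumed adjacency and sizes, NOT the paper's model):
# scene objects are nodes, a normalised adjacency mixes their features, and
# the pedestrian node is classified as crossing / not-crossing.
import torch
import torch.nn as nn

class IntentGraphSketch(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.gc1 = nn.Linear(d, d)   # graph convolution weight, layer 1
        self.gc2 = nn.Linear(d, d)   # graph convolution weight, layer 2
        self.cls = nn.Linear(d, 2)   # crossing / not-crossing

    def forward(self, x, adj):
        # x: (N, d) node features (pedestrian first); adj: (N, N) edge weights
        a = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-6)  # row-normalise
        x = torch.relu(a @ self.gc1(x))
        x = torch.relu(a @ self.gc2(x))
        return self.cls(x[0])        # logits for the pedestrian node

net = IntentGraphSketch()
logits = net(torch.randn(5, 32), torch.rand(5, 5))
```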