PedFormer: Pedestrian Behavior Prediction via Cross-Modal Attention Modulation and Gated Multitask Learning
- URL: http://arxiv.org/abs/2210.07886v1
- Date: Fri, 14 Oct 2022 15:12:00 GMT
- Title: PedFormer: Pedestrian Behavior Prediction via Cross-Modal Attention Modulation and Gated Multitask Learning
- Authors: Amir Rasouli, Iuliia Kotseruba
- Abstract summary: We propose a novel framework that relies on different data modalities to predict future trajectories and crossing actions of pedestrians from an ego-centric perspective.
We show that our model improves on the state of the art in trajectory and action prediction by up to 22% and 13%, respectively, on various metrics.
- Score: 10.812772606528172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting pedestrian behavior is a crucial task for intelligent driving
systems. Accurate predictions require a deep understanding of various
contextual elements that potentially impact the way pedestrians behave. To
address this challenge, we propose a novel framework that relies on different
data modalities to predict future trajectories and crossing actions of
pedestrians from an ego-centric perspective. Specifically, our model utilizes a
cross-modal Transformer architecture to capture dependencies between different
data types. The output of the Transformer is augmented with representations of
interactions between pedestrians and other traffic agents conditioned on the
pedestrian and ego-vehicle dynamics that are generated via a semantic attentive
interaction module. Lastly, the context encodings are fed into a multi-stream
decoder framework using a gated-shared network. We evaluate our algorithm on
public pedestrian behavior benchmarks, PIE and JAAD, and show that our model
improves on the state of the art in trajectory and action prediction by up to 22% and
13%, respectively, on various metrics. The advantages brought by components of
our model are investigated via extensive ablation studies.
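For readers who want a concrete picture of the pipeline described in the abstract, the snippet below is a minimal PyTorch sketch of its two main stages: a cross-modal Transformer encoder over several input modalities and a gated multi-stream decoder that jointly predicts a future trajectory and a crossing action. All class names, feature dimensions, and the specific pooling and gating choices are illustrative assumptions rather than the authors' implementation, and the semantic attentive interaction module is omitted for brevity.

```python
# Hypothetical sketch only; names, dimensions, and fusion details are assumed.
import torch
import torch.nn as nn


class CrossModalEncoder(nn.Module):
    """Projects each input modality to a shared width and lets a Transformer
    attend across all modality tokens (a stand-in for cross-modal attention)."""

    def __init__(self, modality_dims, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # One linear projection per modality (e.g. past boxes, pose, ego speed).
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in modality_dims])
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, modalities):
        # modalities: list of (B, T, d_i) tensors, one per data type.
        tokens = torch.cat([p(x) for p, x in zip(self.proj, modalities)], dim=1)
        return self.encoder(tokens)  # (B, T * num_modalities, d_model)


class GatedMultitaskDecoder(nn.Module):
    """Routes a shared context vector to two task streams through learned
    sigmoid gates: a rough take on a 'gated-shared' multitask decoder."""

    def __init__(self, d_model=128, horizon=45):
        super().__init__()
        self.gate_traj = nn.Sequential(nn.Linear(d_model, d_model), nn.Sigmoid())
        self.gate_act = nn.Sequential(nn.Linear(d_model, d_model), nn.Sigmoid())
        self.traj_rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.traj_out = nn.Linear(d_model, 4)   # future bounding-box coordinates
        self.act_out = nn.Linear(d_model, 1)    # crossing / not-crossing logit
        self.horizon = horizon

    def forward(self, context):
        # context: (B, N, d_model) encoder tokens; pool into a shared vector.
        shared = context.mean(dim=1)
        traj_feat = self.gate_traj(shared) * shared           # gated stream 1
        act_feat = self.gate_act(shared) * shared             # gated stream 2
        steps = traj_feat.unsqueeze(1).repeat(1, self.horizon, 1)
        traj, _ = self.traj_rnn(steps)
        return self.traj_out(traj), self.act_out(act_feat)    # (B,horizon,4), (B,1)


# Usage with made-up dimensions: 4-d boxes, 36-d pose, 3-d ego motion,
# 15 observed frames, batch of 2.
encoder = CrossModalEncoder(modality_dims=[4, 36, 3])
decoder = GatedMultitaskDecoder()
obs = [torch.randn(2, 15, d) for d in (4, 36, 3)]
future_traj, crossing_logit = decoder(encoder(obs))
```

In this sketch the two task heads share one pooled context and differ only in their learned gates; in a full system the trajectory and action losses would be weighted and trained jointly, which is the essence of the multitask setup the abstract describes.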
Related papers
- Multi-Transmotion: Pre-trained Model for Human Motion Prediction [68.87010221355223]
Multi-Transmotion is an innovative transformer-based model designed for cross-modality pre-training.
Our methodology demonstrates competitive performance across various datasets on several downstream tasks.
arXiv Detail & Related papers (2024-11-04T23:15:21Z)
- Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction [15.454206825258169]
Predicting pedestrian motion trajectories is crucial for path planning and motion control of autonomous vehicles.
Recent deep learning-based prediction approaches mainly utilize information like trajectory history and interactions between pedestrians.
This paper proposes a graph transformer structure to improve prediction performance.
arXiv Detail & Related papers (2024-01-10T01:50:29Z)
- Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work on the realism meta-metric by 3.3% and on the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into a standard deep learning framework.
These auxiliary tasks provide additional supervision signals for inferring the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- Pedestrian Stop and Go Forecasting with Hybrid Feature Fusion [87.77727495366702]
We introduce the new task of pedestrian stop and go forecasting.
Considering the lack of suitable existing datasets for it, we release TRANS, a benchmark for explicitly studying the stop and go behaviors of pedestrians in urban traffic.
We build it from several existing datasets annotated with pedestrians' walking motions, in order to have various scenarios and behaviors.
arXiv Detail & Related papers (2022-03-04T18:39:31Z)
- Pedestrian Trajectory Prediction via Spatial Interaction Transformer Network [7.150832716115448]
In traffic scenes, pedestrians may make sudden turns or stop immediately when encountering oncoming people.
Predicting such erratic trajectories calls for insight into the interactions between pedestrians.
We present a novel generative method named Spatial Interaction Transformer (SIT), which learns the correlation of pedestrian trajectories through attention mechanisms.
arXiv Detail & Related papers (2021-12-13T13:08:04Z)
- Pedestrian Behavior Prediction via Multitask Learning and Categorical Interaction Modeling [13.936894582450734]
We propose a multitask learning framework that simultaneously predicts trajectories and actions of pedestrians by relying on multimodal data.
We show that our model achieves state-of-the-art performance and improves trajectory and action prediction by up to 22% and 6% respectively.
arXiv Detail & Related papers (2020-12-06T15:57:11Z)
- Multi-Modal Hybrid Architecture for Pedestrian Action Prediction [14.032334569498968]
We propose a novel multi-modal prediction algorithm that incorporates different sources of information captured from the environment to predict future crossing actions of pedestrians.
Using the existing 2D pedestrian behavior benchmarks and a newly annotated 3D driving dataset, we show that our proposed model achieves state-of-the-art performance in pedestrian crossing prediction.
arXiv Detail & Related papers (2020-11-16T15:17:58Z)
- End-to-end Contextual Perception and Prediction with Interaction Transformer [79.14001602890417]
We tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving.
To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture.
Our model can be trained end-to-end, and runs in real-time.
arXiv Detail & Related papers (2020-08-13T14:30:12Z)
- Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs [19.13270454742958]
We propose a solution for the problem of pedestrian action anticipation at the point of crossing.
Our approach uses a novel stacked RNN architecture in which information collected from various sources, both scene dynamics and visual features, is gradually fused into the network.
arXiv Detail & Related papers (2020-05-13T20:59:37Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.