Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction
- URL: http://arxiv.org/abs/2005.08514v2
- Date: Fri, 24 Jul 2020 03:32:07 GMT
- Title: Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction
- Authors: Cunjun Yu, Xiao Ma, Jiawei Ren, Haiyu Zhao, Shuai Yi
- Abstract summary: We present STAR, a Spatio-Temporal grAph tRansformer framework, which tackles trajectory prediction using only attention mechanisms.
We show that STAR achieves state-of-the-art performance on 5 commonly used real-world pedestrian prediction datasets.
- Score: 29.602903750712713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding crowd motion dynamics is critical to real-world applications,
e.g., surveillance systems and autonomous driving. This is challenging because
it requires effectively modeling the socially aware crowd spatial interaction
and complex temporal dependencies. We believe attention is the most important
factor for trajectory prediction. In this paper, we present STAR, a
Spatio-Temporal grAph tRansformer framework, which tackles trajectory
prediction using only attention mechanisms. STAR models intra-graph crowd
interaction by TGConv, a novel Transformer-based graph convolution mechanism.
The inter-graph temporal dependencies are modeled by separate temporal
Transformers. STAR captures complex spatio-temporal interactions by
interleaving between spatial and temporal Transformers. To calibrate the
temporal prediction for the long-lasting effect of disappeared pedestrians, we
introduce a read-writable external memory module, which is consistently updated by
the temporal Transformer. We show that with only attention mechanisms, STAR
achieves state-of-the-art performance on 5 commonly used real-world pedestrian
prediction datasets.
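The following is a minimal, hedged PyTorch sketch of the interleaving idea described in the abstract: a spatial Transformer layer attends across pedestrians at each time step (a simplification of the graph-masked TGConv attention), and a temporal Transformer layer attends over each pedestrian's history. All class names, layer sizes, and shapes here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpatialTemporalBlock(nn.Module):
    """One interleaving step: attend over pedestrians, then over time (illustrative)."""

    def __init__(self, d_model: int = 64, nhead: int = 8):
        super().__init__()
        # Spatial attention: per time step, each pedestrian attends to the others
        # (full attention here; TGConv additionally restricts this with a graph mask).
        self.spatial = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Temporal attention: per pedestrian, attend over its own history.
        self.temporal = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (T, N, D) = (time steps, pedestrians, embedding dim)
        h = self.spatial(h)                   # batch = T time steps, sequence = N pedestrians
        h = self.temporal(h.transpose(0, 1))  # batch = N pedestrians, sequence = T time steps
        return h.transpose(0, 1)              # back to (T, N, D)

# Toy usage: 8 observed frames, 5 pedestrians, 64-dimensional embeddings.
block = SpatialTemporalBlock()
out = block(torch.randn(8, 5, 64))
print(out.shape)  # torch.Size([8, 5, 64])
```

Stacking several such blocks, and adding the read-writable external memory described in the abstract, would be needed to approximate the full STAR architecture.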
Related papers
- AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce GPT-style next-token prediction into motion prediction.
Unlike language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatial-temporal and semantic relations.
We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position-encoding styles to capture these relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z)
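As a rough illustration of the next-token formulation mentioned in the entry above, the hedged sketch below trains a causal Transformer over discretised motion tokens. The tokenisation, model sizes, and class names are assumptions; the paper's factorized attention modules and position encodings are not reproduced here.

```python
import torch
import torch.nn as nn

class NextTokenMotionModel(nn.Module):
    """GPT-style next-token prediction over a (hypothetical) motion-token vocabulary."""

    def __init__(self, vocab_size: int = 256, d_model: int = 128, nhead: int = 8, num_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq) of discretised agent-motion tokens
        seq_len = tokens.size(1)
        # Causal mask: position i may only attend to positions <= i.
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        x = self.encoder(self.embed(tokens), mask=causal)
        return self.head(x)  # logits over the next motion token

# Toy usage: next-token cross-entropy on random token sequences.
model = NextTokenMotionModel()
tokens = torch.randint(0, 256, (2, 16))
logits = model(tokens)  # (2, 16, 256)
loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, 256), tokens[:, 1:].reshape(-1))
```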
- Triplet Attention Transformer for Spatiotemporal Predictive Learning [9.059462850026216]
We propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features.
The model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions.
arXiv Detail & Related papers (2023-10-28T12:49:33Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
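Below is a hedged, single-head sketch of K-nearest-neighbour attention with a relative-position encoding, in the spirit of the KNARPE idea summarised above. The actual pairwise-relative pose representation is richer (it includes headings); all dimensions and names here are assumptions.

```python
import torch
import torch.nn as nn

class KNNRelativeAttention(nn.Module):
    """Each agent attends only to its K nearest neighbours, with a relative-position bias."""

    def __init__(self, d_model: int = 64, k: int = 4):
        super().__init__()
        self.k = k
        self.q = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * d_model)
        self.rel = nn.Linear(2, d_model)  # encode relative (dx, dy); headings omitted

    def forward(self, feats: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # feats: (N, D) agent features, pos: (N, 2) agent positions
        N, D = feats.shape
        knn = torch.cdist(pos, pos).topk(self.k, largest=False).indices  # (N, k) neighbours
        q = self.q(feats)                                                # (N, D)
        k_all, v_all = self.kv(feats).chunk(2, dim=-1)
        k_nb, v_nb = k_all[knn], v_all[knn]                              # (N, k, D)
        rel = self.rel(pos[knn] - pos[:, None, :])                       # encoded relative positions
        attn = torch.softmax((q[:, None, :] * (k_nb + rel)).sum(-1) / D ** 0.5, dim=-1)
        return (attn[..., None] * (v_nb + rel)).sum(dim=1)               # (N, D)

# Toy usage: 10 agents with 64-d features and 2-d positions.
layer = KNNRelativeAttention()
print(layer(torch.randn(10, 64), torch.randn(10, 2)).shape)  # torch.Size([10, 64])
```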
- SPOTR: Spatio-temporal Pose Transformers for Human Motion Prediction [12.248428883804763]
3D human motion prediction is a research area of high significance and a challenging task in computer vision.
Traditionally, autoregressive models have been used to predict human motion.
We present a non-autoregressive model for human motion prediction.
arXiv Detail & Related papers (2023-03-11T01:44:29Z)
- Adaptive Graph Spatial-Temporal Transformer Network for Traffic Flow Forecasting [6.867331860819595]
Traffic forecasting can be highly challenging due to complex spatial-temporal correlations and non-linear traffic patterns.
Existing works mostly model such spatial-temporal dependencies by considering spatial correlations and temporal correlations separately.
We propose to directly model the cross-spatial-temporal correlations on the spatial-temporal graph using local multi-head self-attention.
arXiv Detail & Related papers (2022-07-09T19:21:00Z)
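A hedged sketch of what joint ("cross") spatial-temporal self-attention can look like: all (time step, node) pairs are flattened into one token sequence and attend to each other within a local temporal window, rather than in separate spatial and temporal passes. The masking scheme and every name below are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

class LocalSTAttention(nn.Module):
    """Joint attention over all (time, node) tokens, restricted to a local time window."""

    def __init__(self, d_model: int = 64, nhead: int = 4, time_window: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.time_window = time_window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, N, D) node features over time, flattened into T*N tokens.
        T, N, D = x.shape
        tokens = x.reshape(1, T * N, D)
        t_idx = torch.arange(T).repeat_interleave(N)                 # time step of each token
        blocked = (t_idx[:, None] - t_idx[None, :]).abs() > self.time_window
        out, _ = self.attn(tokens, tokens, tokens, attn_mask=blocked)  # True = not attended
        return out.reshape(T, N, D)

# Toy usage: 8 time steps, 5 graph nodes, 64-d features.
print(LocalSTAttention()(torch.randn(8, 5, 64)).shape)  # torch.Size([8, 5, 64])
```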
- Interaction Transformer for Human Reaction Generation [61.22481606720487]
We propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attentions.
Our method is general and can be used to generate more complex and long-term interactions.
arXiv Detail & Related papers (2022-07-04T19:30:41Z)
- Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting [68.86835407617778]
Autoformer is a novel decomposition architecture with an Auto-Correlation mechanism.
In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a relative improvement on six benchmarks.
arXiv Detail & Related papers (2021-06-24T13:43:43Z)
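A hedged, single-series sketch of the Auto-Correlation idea behind the entry above: estimate lag dependencies with an FFT and aggregate time-delayed copies of the series weighted by their autocorrelation. Autoformer's series-decomposition blocks and multi-head projections are omitted; the function name and sizes are assumptions.

```python
import math
import torch

def auto_correlation(x: torch.Tensor, top_k: int = 3) -> torch.Tensor:
    """Aggregate the top-k time-delayed versions of a 1-D series, weighted by autocorrelation."""
    length = x.size(0)
    freq = torch.fft.rfft(x)
    # Wiener-Khinchin: inverse FFT of the power spectrum gives the autocorrelation per lag.
    corr = torch.fft.irfft(freq * torch.conj(freq), n=length)
    weights, lags = corr.topk(top_k)
    weights = torch.softmax(weights, dim=0)
    delayed = torch.stack([torch.roll(x, -int(lag)) for lag in lags])
    return (weights[:, None] * delayed).sum(dim=0)

# Toy usage on a noisy daily-periodic series (period 24, length 96).
t = torch.arange(96, dtype=torch.float32)
series = torch.sin(2 * math.pi * t / 24) + 0.1 * torch.randn(96)
print(auto_correlation(series).shape)  # torch.Size([96])
```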
- Spatial-Channel Transformer Network for Trajectory Prediction on the Traffic Scenes [2.7955111755177695]
We present a Spatial-Channel Transformer Network for trajectory prediction with attention functions.
A channel-wise module is inserted to measure the social interaction between agents.
We find that the network achieves promising results on real-world trajectory prediction datasets in traffic scenes.
arXiv Detail & Related papers (2021-01-27T15:03:42Z)
- End-to-end Contextual Perception and Prediction with Interaction Transformer [79.14001602890417]
We tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving.
To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture.
Our model can be trained end-to-end, and runs in real-time.
arXiv Detail & Related papers (2020-08-13T14:30:12Z)
- A Spatio-temporal Transformer for 3D Human Motion Prediction [39.31212055504893]
We propose a Transformer-based architecture for the task of generative modelling of 3D human motion.
We empirically show that this effectively learns the underlying motion dynamics and reduces the error accumulation over time observed in autoregressive models.
arXiv Detail & Related papers (2020-04-18T19:49:28Z)
- A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, a joint feature sequence is built from the sequence and instant state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.