Transformer Networks for Trajectory Forecasting
- URL: http://arxiv.org/abs/2003.08111v3
- Date: Wed, 21 Oct 2020 15:26:14 GMT
- Title: Transformer Networks for Trajectory Forecasting
- Authors: Francesco Giuliari, Irtiza Hasan, Marco Cristani, and Fabio Galasso
- Abstract summary: We propose the novel use of Transformer Networks for trajectory forecasting.
This is a fundamental switch from the sequential step-by-step processing of LSTMs to the attention-only memory mechanisms of Transformers.
- Score: 11.802437934289062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most recent successes in forecasting people's motion are based on LSTM models, and most recent progress has been achieved by modelling the social interaction among people and the interaction of people with the scene. We question the use of LSTM models and propose the novel use of Transformer Networks for trajectory forecasting. This is a fundamental switch from the sequential step-by-step processing of LSTMs to the attention-only memory mechanisms of Transformers. In particular, we consider both the original Transformer Network (TF) and the larger Bidirectional Transformer (BERT), state-of-the-art on all natural language processing tasks. Our proposed Transformers predict the trajectories of the individual people in the scene. These are "simple" models because each person is modelled separately, without any complex human-human or human-scene interaction terms. In particular, the TF model without bells and whistles yields the best score on TrajNet, the largest and most challenging trajectory forecasting benchmark. Additionally, its extension which predicts multiple plausible future trajectories performs on par with more engineered techniques on the 5 datasets of ETH + UCY. Finally, we show that Transformers may deal with missing observations, as may be the case with real sensor data. Code is available at https://github.com/FGiuliari/Trajectory-Transformer.
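To make the idea concrete, the sketch below shows a minimal encoder-decoder Transformer over per-person displacement sequences in PyTorch. It is an illustration only, not the authors' implementation: the class names, hyperparameters, and the 8-step observation / 12-step prediction split are assumptions (the usual TrajNet / ETH + UCY protocol); refer to the linked repository for the official code.

```python
# Minimal sketch of a Transformer trajectory forecaster (illustrative only;
# see https://github.com/FGiuliari/Trajectory-Transformer for the official code).
# Assumptions: each person is modelled independently from (dx, dy) displacement
# steps, with 8 observed steps and 12 predicted steps.
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding from 'Attention Is All You Need'."""

    def __init__(self, d_model: int, max_len: int = 100):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        return x + self.pe[:, : x.size(1)]


class TrajectoryTransformer(nn.Module):
    """Encoder-decoder Transformer over a single person's (dx, dy) sequence."""

    def __init__(self, d_model: int = 128, nhead: int = 8, num_layers: int = 3):
        super().__init__()
        self.embed = nn.Linear(2, d_model)   # (dx, dy) -> d_model
        self.pos = PositionalEncoding(d_model)
        self.tf = nn.Transformer(d_model=d_model, nhead=nhead,
                                 num_encoder_layers=num_layers,
                                 num_decoder_layers=num_layers,
                                 batch_first=True)
        self.head = nn.Linear(d_model, 2)    # d_model -> next (dx, dy)

    def forward(self, obs, tgt):
        # obs: (batch, 8, 2) observed displacements;
        # tgt: (batch, T, 2) shifted target displacements (teacher forcing).
        src = self.pos(self.embed(obs))
        dst = self.pos(self.embed(tgt))
        causal = self.tf.generate_square_subsequent_mask(tgt.size(1)).to(obs.device)
        # Missing observations could be handled here by passing a
        # src_key_padding_mask that masks out the absent time steps.
        out = self.tf(src, dst, tgt_mask=causal)
        return self.head(out)                # predicted displacements


model = TrajectoryTransformer()
obs = torch.randn(16, 8, 2)    # 16 people, 8 observed displacement steps
tgt = torch.randn(16, 12, 2)   # teacher-forced decoder input during training
pred = model(obs, tgt)         # -> (16, 12, 2) predicted displacements
```

At inference time the decoder would be rolled out autoregressively, feeding each predicted displacement back in; the paper's extension to multiple plausible futures builds on the same single-person backbone.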
Related papers
- MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction [5.8919870666241945]
We present a MultiscAle Relational Transformer (MART) network for multi-agent trajectory prediction.
MART is a hypergraph transformer architecture that considers both individual and group behaviours within the transformer machinery.
In addition, we propose an Adaptive Group Estimator (AGE) designed to infer complex group relations in real-world environments.
arXiv Detail & Related papers (2024-07-31T14:31:49Z) - Transformers versus LSTMs for electronic trading [0.0]
This study investigates whether Transformer-based models can be applied to financial time-series prediction and beat LSTMs.
A new LSTM-based model called DLSTM is built, and a new architecture for the Transformer-based model is designed to adapt it to financial prediction.
The experimental results show that the Transformer-based model has only a limited advantage in absolute price sequence prediction.
arXiv Detail & Related papers (2023-09-20T15:25:43Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - Mnemosyne: Learning to Train Transformers with Transformers [18.36543176998175]
We show that Mnemosyne can successfully train Transformers while using simple meta-training strategies that require minimal computational resources.
Mnemosyne provides space complexity comparable to that of its hand-designed first-order counterparts, which allows it to scale to training larger sets of parameters.
arXiv Detail & Related papers (2023-02-02T14:40:28Z) - TransVG++: End-to-End Visual Grounding with Language Conditioned Vision
Transformer [188.00681648113223]
We explore neat yet effective Transformer-based frameworks for visual grounding.
TransVG establishes multi-modal correspondences by Transformers and localizes referred regions by directly regressing box coordinates.
We upgrade our framework to a purely Transformer-based one by leveraging Vision Transformer (ViT) for vision feature encoding.
arXiv Detail & Related papers (2022-06-14T06:27:38Z) - Under the Hood of Transformer Networks for Trajectory Forecasting [11.001055546731623]
Transformer Networks have established themselves as the de-facto state-of-the-art for trajectory forecasting.
This paper proposes the first in-depth study of Transformer Networks (TF) and Bidirectional Transformers (BERT) for the forecasting of the individual motion of people.
arXiv Detail & Related papers (2022-03-22T16:56:05Z) - Learning Bounded Context-Free-Grammar via LSTM and the
Transformer:Difference and Explanations [51.77000472945441]
Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks.
In practice, it is often observed that Transformer models have better representation power than LSTMs.
We study such practical differences between LSTM and Transformer and propose an explanation based on their latent space decomposition patterns.
arXiv Detail & Related papers (2021-12-16T19:56:44Z) - TransMOT: Spatial-Temporal Graph Transformer for Multiple Object
Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z) - Transformers Solve the Limited Receptive Field for Monocular Depth
Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z) - Parameter Efficient Multimodal Transformers for Video Representation
Learning [108.8517364784009]
This work focuses on reducing the parameters of multimodal Transformers in the context of audio-visual video representation learning.
We show that our approach reduces parameters by up to 80%, allowing us to train our model end-to-end from scratch.
To demonstrate our approach, we pretrain our model on 30-second clips from Kinetics-700 and transfer it to audio-visual classification tasks.
arXiv Detail & Related papers (2020-12-08T00:16:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.