Multimodal Motion Prediction with Stacked Transformers
- URL: http://arxiv.org/abs/2103.11624v2
- Date: Wed, 24 Mar 2021 06:37:16 GMT
- Title: Multimodal Motion Prediction with Stacked Transformers
- Authors: Yicheng Liu, Jinghuai Zhang, Liangji Fang, Qinhong Jiang, Bolei Zhou
- Abstract summary: We propose a novel transformer framework for multimodal motion prediction, termed mmTransformer.
A novel network architecture based on stacked transformers is designed to model the multimodality at feature level with a set of fixed independent proposals.
A region-based training strategy is then developed to induce the multimodality of the generated proposals.
- Score: 35.9674180611893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting multiple plausible future trajectories of the nearby vehicles is
crucial for the safety of autonomous driving. Recent motion prediction
approaches attempt to achieve such multimodal motion prediction by implicitly
regularizing the feature or explicitly generating multiple candidate proposals.
However, it remains challenging since the latent features may concentrate on
the most frequent mode of the data while the proposal-based methods depend
largely on the prior knowledge to generate and select the proposals. In this
work, we propose a novel transformer framework for multimodal motion
prediction, termed mmTransformer. A novel network architecture based on
stacked transformers is designed to model the multimodality at feature level
with a set of fixed independent proposals. A region-based training strategy is
then developed to induce the multimodality of the generated proposals.
Experiments on the Argoverse dataset show that the proposed model achieves the
state-of-the-art performance on motion prediction, substantially improving the
diversity and the accuracy of the predicted trajectories. Demo video and code
are available at https://decisionforce.github.io/mmTransformer.
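The region-based training strategy described in the abstract can be illustrated with a minimal, framework-free sketch. Everything below is a simplified assumption for illustration, not the paper's implementation: the angular-sector partition stands in for the paper's spatial regions, and the proposal set is a toy hand-built list. The core idea shown is that only proposals assigned to the region containing the ground-truth endpoint receive the regression loss, which pushes different proposals to specialize on different motion modes.

```python
import math

def region_of(point, num_regions=6):
    """Assign a 2D endpoint to one of num_regions angular sectors
    around the origin (a simplified stand-in for the paper's
    spatial partition of the driving scene)."""
    angle = math.atan2(point[1], point[0]) % (2 * math.pi)
    return int(angle / (2 * math.pi / num_regions))

def select_trainable_proposals(proposals, gt_endpoint, num_regions=6):
    """Region-based training: return the indices of the proposals
    that fall in the same region as the ground-truth endpoint;
    only these would receive the regression loss."""
    gt_region = region_of(gt_endpoint, num_regions)
    return [i for i, p in enumerate(proposals)
            if region_of(p["endpoint"], num_regions) == gt_region]

# Six fixed, independent proposals, one per angular sector (toy data).
proposals = [{"endpoint": (math.cos(a), math.sin(a))}
             for a in [0.1, 1.2, 2.3, 3.4, 4.5, 5.6]]
trainable = select_trainable_proposals(proposals, gt_endpoint=(1.0, 0.2))
# Only the proposal in the ground truth's sector is selected.
```

Restricting the loss this way prevents all proposals from collapsing onto the most frequent mode of the data, which is the failure case of implicit feature regularization the abstract mentions.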
Related papers
- Multi-Transmotion: Pre-trained Model for Human Motion Prediction [68.87010221355223]
Multi-Transmotion is an innovative transformer-based model designed for cross-modality pre-training.
Our methodology demonstrates competitive performance across various datasets on several downstream tasks.
arXiv Detail & Related papers (2024-11-04T23:15:21Z) - MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying [110.83590008788745]
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions.
In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges.
The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries.
We introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents.
arXiv Detail & Related papers (2023-06-30T16:23:04Z) - Multimodal Manoeuvre and Trajectory Prediction for Automated Driving on Highways Using Transformer Networks [5.571793666361683]
We propose a novel multimodal prediction framework that can predict multiple plausible behaviour modes and their likelihoods.
The proposed framework includes a bespoke problem formulation for manoeuvre prediction, a novel transformer-based prediction model, and a tailored training method for multimodal manoeuvre and trajectory prediction.
The results show that our framework outperforms the state-of-the-art multimodal methods in terms of prediction error.
arXiv Detail & Related papers (2023-03-28T16:25:16Z) - ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals [6.927103549481412]
Motion forecasting is a key module in an autonomous driving system.
Due to the heterogeneous nature of multi-sourced input, the multimodality of agent behavior, and the low latency required for onboard deployment, this task is notoriously challenging.
This paper proposes a novel agent-centric model with anchor-informed proposals for efficient multimodal motion prediction.
arXiv Detail & Related papers (2023-03-21T17:58:28Z) - Motion Transformer with Global Intention Localization and Local Movement Refinement [103.75625476231401]
Motion TRansformer (MTR) models motion prediction as the joint optimization of global intention localization and local movement refinement.
MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges.
arXiv Detail & Related papers (2022-09-27T16:23:14Z) - STrajNet: Occupancy Flow Prediction via Multi-modal Swin Transformer [7.755385141347842]
This work proposes STrajNet: a multi-modal Swin Transformer-based framework for effective scene occupancy and flow prediction.
We employ Swin Transformer to encode the image and interaction-aware motion representations and propose a cross-attention module to inject motion awareness into grid cells.
Flow and occupancy predictions are then decoded through temporal-sharing pyramid decoders.
arXiv Detail & Related papers (2022-07-31T08:36:55Z) - Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion [88.45326906116165]
We present a new framework that formulates the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID).
We encode the history behavior information and the social interactions as a state embedding and devise a Transformer-based diffusion model to capture the temporal dependencies of trajectories.
Experiments on the human trajectory prediction benchmarks including the Stanford Drone and ETH/UCY datasets demonstrate the superiority of our method.
arXiv Detail & Related papers (2022-03-25T16:59:08Z) - SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction [72.37440317774556]
We propose advances that address two key challenges in future trajectory prediction: multimodality in both training data and predictions, and constant-time inference regardless of the number of agents.
arXiv Detail & Related papers (2020-07-26T08:17:10Z) - MultiXNet: Multiclass Multistage Multimodal Motion Prediction [27.046311751308775]
MultiXNet is an end-to-end approach for detection and motion prediction based directly on lidar sensor data.
The method was evaluated on large-scale, real-world data collected by a fleet of SDVs in several cities.
arXiv Detail & Related papers (2020-06-03T01:01:48Z) - TPNet: Trajectory Proposal Network for Motion Prediction [81.28716372763128]
Trajectory Proposal Network (TPNet) is a novel two-stage motion prediction framework.
TPNet first generates a candidate set of future trajectories as hypothesis proposals, then makes the final predictions by classifying and refining the proposals.
Experiments on four large-scale trajectory prediction datasets show that TPNet achieves state-of-the-art results both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-04-26T00:01:49Z)
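TPNet's two-stage pipeline, generating hypothesis proposals and then classifying and refining them, can be sketched as follows. This is a hedged, framework-free illustration: the motion-prior enumeration, the distance-based scoring, and the halfway refinement rule are simplified placeholders standing in for TPNet's learned modules, and `goal_hint` is a hypothetical input invented for the example.

```python
import math

def generate_proposals(last_point, speeds=(0.5, 1.0, 1.5),
                       headings=(-0.3, 0.0, 0.3)):
    """Stage 1: enumerate candidate future endpoints as a hypothesis
    set (TPNet generates such proposals from motion priors)."""
    x, y = last_point
    return [(x + s * math.cos(h), y + s * math.sin(h))
            for s in speeds for h in headings]

def classify_and_refine(proposals, goal_hint):
    """Stage 2: score each proposal (here: negative squared distance
    to a hypothetical goal hint) and refine the winner toward it."""
    def score(p):
        return -((p[0] - goal_hint[0]) ** 2 + (p[1] - goal_hint[1]) ** 2)
    best = max(proposals, key=score)
    # Placeholder refinement: move the best proposal halfway to the hint.
    return ((best[0] + goal_hint[0]) / 2, (best[1] + goal_hint[1]) / 2)

proposals = generate_proposals((0.0, 0.0))   # 3 speeds x 3 headings = 9
refined = classify_and_refine(proposals, goal_hint=(1.5, 0.0))
```

Factoring prediction into proposal generation followed by classification and refinement is what lets this family of methods cover multiple modes explicitly, at the cost of depending on the quality of the proposal prior, the limitation the mmTransformer abstract points out.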
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.