MultiPath++: Efficient Information Fusion and Trajectory Aggregation for
Behavior Prediction
- URL: http://arxiv.org/abs/2111.14973v2
- Date: Wed, 1 Dec 2021 16:47:32 GMT
- Title: MultiPath++: Efficient Information Fusion and Trajectory Aggregation for
Behavior Prediction
- Authors: Balakrishnan Varadarajan, Ahmed Hefny, Avikalp Srivastava, Khaled S.
Refaat, Nigamaa Nayakanti, Andre Cornman, Kan Chen, Bertrand Douillard, Chi
Pang Lam, Dragomir Anguelov, Benjamin Sapp
- Abstract summary: We present MultiPath++, a future prediction model that achieves state-of-the-art performance on popular benchmarks.
We show that our proposed model achieves state-of-the-art performance on the Argoverse Motion Forecasting Competition and the Waymo Open Dataset Motion Prediction Challenge.
- Score: 42.563865078323204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the future behavior of road users is one of the most challenging
and important problems in autonomous driving. Applying deep learning to this
problem requires fusing heterogeneous world state in the form of rich
perception signals and map information, and inferring highly multi-modal
distributions over possible futures. In this paper, we present MultiPath++, a
future prediction model that achieves state-of-the-art performance on popular
benchmarks. MultiPath++ improves the MultiPath architecture by revisiting many
design choices. The first key design difference is a departure from dense
image-based encoding of the input world state in favor of a sparse encoding of
heterogeneous scene elements: MultiPath++ consumes compact and efficient
polylines to describe road features, and raw agent state information directly
(e.g., position, velocity, acceleration). We propose a context-aware fusion of
these elements and develop a reusable multi-context gating fusion component.
Second, we reconsider the choice of pre-defined, static anchors, and develop a
way to learn latent anchor embeddings end-to-end in the model. Lastly, we
explore ensembling and output aggregation techniques -- common in other ML
domains -- and find effective variants for our probabilistic multimodal output
representation. We perform an extensive ablation on these design choices, and
show that our proposed model achieves state-of-the-art performance on the
Argoverse Motion Forecasting Competition and the Waymo Open Dataset Motion
Prediction Challenge.
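The context-aware, multi-context gating (MCG) fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the single-hidden-layer projections, the element-wise product gating, and mean pooling for the context update are assumptions based only on the abstract's description.

```python
import numpy as np

def mlp(x, w, b):
    # Single linear projection with ReLU (illustrative stand-in for an MLP).
    return np.maximum(w @ x + b, 0.0)

def mcg_block(elements, context, params):
    """One multi-context gating (MCG) step: each scene element embedding
    (e.g. a road polyline or raw agent state) is gated element-wise by a
    projection of a shared context vector, and the context is updated by
    pooling the gated elements."""
    w_s, b_s, w_c, b_c = params
    gate = mlp(context, w_c, b_c)
    gated = [mlp(s, w_s, b_s) * gate for s in elements]
    new_context = np.mean(gated, axis=0)  # pooled running context
    return gated, new_context

rng = np.random.default_rng(0)
d = 8
elements = [rng.standard_normal(d) for _ in range(3)]
context = rng.standard_normal(d)
params = (rng.standard_normal((d, d)), np.zeros(d),
          rng.standard_normal((d, d)), np.zeros(d))
gated, new_context = mcg_block(elements, context, params)
```

Because the gating component only consumes per-element embeddings plus a context vector, blocks like this can be stacked and reused across different element types, which is what makes the fusion component reusable in the architecture.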
Related papers
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control [68.14798033899955]
Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content.
However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation?
We investigate this question in the context of autonomous driving, and answer it with a resounding "yes".
arXiv Detail & Related papers (2023-12-05T18:34:12Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Pishgu: Universal Path Prediction Architecture through Graph Isomorphism and Attentive Convolution [2.6774008509840996]
This article proposes Pishgu, a universal graph isomorphism approach for attentive path prediction.
Pishgu captures the inter-dependencies within the subjects in each frame by taking advantage of Graph Isomorphism Networks.
We evaluate the adaptability of our approach to multiple publicly available vehicle (bird's-eye view) and pedestrian (bird's-eye and high-angle view) path prediction datasets.
arXiv Detail & Related papers (2022-10-14T18:48:48Z)
- Wayformer: Motion Forecasting via Simple & Efficient Attention Networks [16.031530911221534]
We present Wayformer, a family of attention based architectures for motion forecasting that are simple and homogeneous.
For each fusion type we explore strategies to tradeoff efficiency and quality via factorized attention or latent query attention.
We show that early fusion, despite its simplicity of construction, is not only modality-agnostic but also achieves state-of-the-art results on both the Waymo Open Motion Dataset (WOMD) and Argoverse leaderboards.
arXiv Detail & Related papers (2022-07-12T21:19:04Z)
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER.
arXiv Detail & Related papers (2022-05-04T23:40:04Z)
- Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction [71.97877759413272]
Trajectory prediction is a safety-critical tool for autonomous vehicles to plan and execute actions.
Recent methods have achieved strong performances using Multi-Choice Learning objectives like winner-takes-all (WTA) or best-of-many.
Our work addresses two key challenges in trajectory prediction: learning multimodal outputs, and producing better predictions by imposing constraints using driving knowledge.
arXiv Detail & Related papers (2021-04-16T17:58:56Z)
- Multi-Modal Hybrid Architecture for Pedestrian Action Prediction [14.032334569498968]
We propose a novel multi-modal prediction algorithm that incorporates different sources of information captured from the environment to predict future crossing actions of pedestrians.
Using the existing 2D pedestrian behavior benchmarks and a newly annotated 3D driving dataset, we show that our proposed model achieves state-of-the-art performance in pedestrian crossing prediction.
arXiv Detail & Related papers (2020-11-16T15:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.