Related papers: Scene Transformer: A unified multi-task model for behavior prediction and planning

URL: http://arxiv.org/abs/2106.08417v1
Date: Tue, 15 Jun 2021 20:20:44 GMT
Title: Scene Transformer: A unified multi-task model for behavior prediction and planning
Authors: Jiquan Ngiam, Benjamin Caine, Vijay Vasudevan, Zhengdong Zhang, Hao-Tien Lewis Chiang, Jeffrey Ling, Rebecca Roelofs, Alex Bewley, Chenxi Liu, Ashish Venugopal, David Weiss, Ben Sapp, Zhifeng Chen, Jonathon Shlens
Abstract summary: We formulate a model for predicting the behavior of all agents jointly in real-world driving environments. Inspired by recent language modeling approaches, we use a masking strategy as the query to our model. We evaluate our approach on autonomous driving datasets for behavior prediction, and achieve state-of-the-art performance.
Score: 42.758178896204036
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Predicting the future motion of multiple agents is necessary for planning in dynamic environments. This task is challenging for autonomous driving since agents (e.g., vehicles and pedestrians) and their associated behaviors may be diverse and influence each other. Most prior work has focused on first predicting independent futures for each agent based on all past motion, and then planning against these independent predictions. However, planning against fixed predictions can suffer from the inability to represent the future interaction possibilities between different agents, leading to sub-optimal planning. In this work, we formulate a model for predicting the behavior of all agents jointly in real-world driving environments in a unified manner. Inspired by recent language modeling approaches, we use a masking strategy as the query to our model, enabling one to invoke a single model to predict agent behavior in many ways, such as potentially conditioned on the goal or full future trajectory of the autonomous vehicle or the behavior of other agents in the environment. Our model architecture fuses heterogeneous world state in a unified Transformer architecture by employing attention across road elements, agent interactions and time steps. We evaluate our approach on autonomous driving datasets for behavior prediction, and achieve state-of-the-art performance. Our work demonstrates that formulating the problem of behavior prediction in a unified architecture with a masking strategy may allow us to have a single model that can perform multiple motion prediction and planning related tasks effectively.
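The masking-as-query idea in the abstract can be illustrated with a small sketch. The abstract does not specify data layouts or task names, so everything here is an assumption made for illustration: the function name `build_visibility_mask`, the `[num_agents, num_steps]` boolean layout, the three task strings, and the convention that `True` means "shown to the model". It is not the authors' implementation; it only shows how one mask format could express plain prediction, prediction conditioned on the autonomous vehicle's full future trajectory, and prediction conditioned on the autonomous vehicle's goal.

```python
import numpy as np


def build_visibility_mask(num_agents, num_steps, current_step, task, av_index=0):
    """Return a boolean [num_agents, num_steps] array (hypothetical layout).

    True marks an agent/time-step entry that is shown to the model;
    False marks an entry the model would be asked to predict.
    """
    if not 0 <= current_step < num_steps:
        raise ValueError("current_step must lie inside the time window")

    mask = np.zeros((num_agents, num_steps), dtype=bool)
    # History up to and including the current step is visible for every agent.
    mask[:, : current_step + 1] = True

    if task == "motion_prediction":
        # Nothing else is revealed: all futures are hidden.
        pass
    elif task == "conditional_motion_prediction":
        # Reveal the full future trajectory of the autonomous vehicle.
        mask[av_index, :] = True
    elif task == "goal_conditioned_prediction":
        # Reveal only the final (goal) step of the autonomous vehicle.
        mask[av_index, -1] = True
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


# Example: 3 agents, 6 time steps, current step index 2.
goal_mask = build_visibility_mask(3, 6, 2, "goal_conditioned_prediction")
```

Under these assumptions, switching tasks changes only the mask passed alongside the scene, which is the sense in which a single model could be queried in several ways.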

Related papers

From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving [4.795092023802721]
Marginal prediction models commonly forecast each agent's future trajectories independently. Joint prediction models explicitly account for the interactions between agents, yielding socially and physically consistent predictions. We evaluate each approach in terms of prediction accuracy, multi-modality, and inference efficiency.
arXiv Detail & Related papers (2025-07-07T17:58:53Z)
Poly-Autoregressive Prediction for Modeling Interactions [42.51313085280179]
We propose Poly-Autoregressive (PAR) modeling, which forecasts an ego agent's future behavior. We show that PAR can be applied to three different problems: human action forecasting in social situations, trajectory prediction for autonomous vehicles, and object pose forecasting during hand-object interaction.
arXiv Detail & Related papers (2025-02-12T18:59:43Z)
PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving [57.89801036693292]
PPAD (Iterative Interaction of Prediction and Planning Autonomous Driving) considers the timestep-wise interaction to better integrate prediction and planning. We design ego-to-agent, ego-to-map, and ego-to-BEV interaction mechanisms with hierarchical dynamic key objects attention to better model the interactions.
arXiv Detail & Related papers (2023-11-14T11:53:24Z)
Deep Interactive Motion Prediction and Planning: Playing Games with Motion Prediction Models [162.21629604674388]
This work presents a game-theoretic Model Predictive Controller (MPC) that uses a novel interactive multi-agent neural network policy as part of its predictive model. Fundamental to the success of our method is the design of a novel multi-agent policy network that can steer a vehicle given the state of the surrounding agents and the map information.
arXiv Detail & Related papers (2022-04-05T17:58:18Z)
Instance-Aware Predictive Navigation in Multi-Agent Environments [93.15055834395304]
We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures. We adopt a novel multi-instance event prediction module to estimate the possible interaction among agents in the ego-centric view. We design a sequential action sampling strategy to better leverage predicted states on both scene-level and instance-level.
arXiv Detail & Related papers (2021-01-14T22:21:25Z)
Reactive motion planning with probabilistic safety guarantees [27.91467018272684]
This paper considers the problem of motion planning in environments with multiple uncontrolled agents. A predictive model of the uncontrolled agents is trained to predict all possible trajectories within a short horizon based on the scenario. The proposed approach is demonstrated in simulation in a scenario emulating autonomous highway driving.
arXiv Detail & Related papers (2020-11-06T20:37:15Z)
What-If Motion Prediction for Autonomous Driving [58.338520347197765]
Viable solutions must account for both the static geometric context, such as road lanes, and dynamic social interactions arising from multiple actors. We propose a recurrent graph-based attentional approach with interpretable geometric (actor-lane) and social (actor-actor) relationships. Our model can produce diverse predictions conditioned on hypothetical or "what-if" road lanes and multi-actor interactions.
arXiv Detail & Related papers (2020-08-24T17:49:30Z)
Multimodal Deep Generative Models for Trajectory Prediction: A Conditional Variational Autoencoder Approach [34.70843462687529]
We provide a self-contained tutorial on a conditional variational autoencoder approach to human behavior prediction. The goals of this tutorial paper are to review and build a taxonomy of state-of-the-art methods in human behavior prediction.
arXiv Detail & Related papers (2020-08-10T03:18:27Z)
The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules. In this paper we propose to incorporate structured priors as a loss function. We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
Social-WaGDAT: Interaction-aware Trajectory Prediction via Wasserstein Graph Double-Attention Network [29.289670231364788]
In this paper, we propose a generic generative neural system for multi-agent trajectory prediction. We also employ an efficient kinematic constraint layer applied to vehicle trajectory prediction. The proposed system is evaluated on three public benchmark datasets for trajectory prediction.
arXiv Detail & Related papers (2020-02-14T20:11:13Z)
Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data [37.176411554794214]
Reasoning about human motion is an important prerequisite to safe and socially-aware robotic navigation. We present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents. We demonstrate its performance on several challenging real-world trajectory forecasting datasets.
arXiv Detail & Related papers (2020-01-09T16:47:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.