SEPT: Towards Efficient Scene Representation Learning for Motion
Prediction
- URL: http://arxiv.org/abs/2309.15289v4
- Date: Tue, 19 Dec 2023 14:14:22 GMT
- Title: SEPT: Towards Efficient Scene Representation Learning for Motion
Prediction
- Authors: Zhiqian Lan, Yuxuan Jiang, Yao Mu, Chen Chen, Shengbo Eben Li
- Abstract summary: This paper presents SEPT, a modeling framework that leverages self-supervised learning to develop powerful models for complex traffic scenes.
experiments demonstrate that SEPT, without elaborate architectural design or feature engineering, achieves state-of-the-art performance on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks.
- Score: 19.111948522155004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motion prediction is crucial for autonomous vehicles to operate safely in
complex traffic environments. Extracting effective spatiotemporal relationships
among traffic elements is key to accurate forecasting. Inspired by the
successful practice of pretrained large language models, this paper presents
SEPT, a modeling framework that leverages self-supervised learning to develop
powerful spatiotemporal understanding for complex traffic scenes. Specifically,
our approach involves three masking-reconstruction modeling tasks on scene
inputs including agents' trajectories and road network, pretraining the scene
encoder to capture kinematics within trajectory, spatial structure of road
network, and interactions among roads and agents. The pretrained encoder is
then finetuned on the downstream forecasting task. Extensive experiments
demonstrate that SEPT, without elaborate architectural design or manual feature
engineering, achieves state-of-the-art performance on the Argoverse 1 and
Argoverse 2 motion forecasting benchmarks, outperforming previous methods on
all main metrics by a large margin.
Related papers
- Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving [13.879945446114956]
We utilize Large Language Models (LLMs) to enhance the global traffic context understanding for motion prediction tasks.
Considering the cost associated with LLMs, we propose a cost-effective deployment strategy.
Our research offers valuable insights into enhancing the understanding of traffic scenes of LLMs and the motion prediction performance of autonomous driving.
arXiv Detail & Related papers (2024-03-17T02:06:49Z) - Implicit Occupancy Flow Fields for Perception and Prediction in
Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory of the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z) - MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and
Guided Intention Querying [110.83590008788745]
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions.
In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges.
The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries.
We introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents.
arXiv Detail & Related papers (2023-06-30T16:23:04Z) - Traj-MAE: Masked Autoencoders for Trajectory Prediction [69.7885837428344]
Trajectory prediction has been a crucial task in building a reliable autonomous driving system by anticipating possible dangers.
We propose an efficient masked autoencoder for trajectory prediction (Traj-MAE) that better represents the complicated behaviors of agents in the driving environment.
Our experimental results in both multi-agent and single-agent settings demonstrate that Traj-MAE achieves competitive results with state-of-the-art methods.
arXiv Detail & Related papers (2023-03-12T16:23:27Z) - Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [78.74510891099395]
In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene.
arXiv Detail & Related papers (2020-07-23T14:31:25Z) - Physically constrained short-term vehicle trajectory forecasting with
naive semantic maps [6.85316573653194]
We propose a model that learns to extract relevant road features from semantic maps as well as general motion of agents.
We show that our model is not only capable of anticipating future motion whilst taking into consideration road boundaries, but can also effectively and precisely predict trajectories for a longer time horizon than initially trained for.
arXiv Detail & Related papers (2020-06-09T09:52:44Z) - The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules.
In this paper we propose to incorporate structured priors as a loss function.
We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z) - Robust Trajectory Forecasting for Multiple Intelligent Agents in Dynamic
Scene [11.91073327154494]
We present a novel method for robust trajectory forecasting of multiple agents in dynamic scenes.
The proposed method outperforms the state-of-the-art prediction methods in terms of prediction accuracy.
arXiv Detail & Related papers (2020-05-27T02:32:55Z) - VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized
Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z) - An Effective Dynamic Spatio-temporal Framework with Multi-Source
Information for Traffic Prediction [0.22940141855172028]
The proposed model improves the prediction precision by approximately 3-7 percent on the NYC-Taxi and NYC-Bike datasets.
The experimental results show that the proposed model improves the prediction precision by approximately 3-7 percent on the NYC-Taxi and NYC-Bike datasets.
arXiv Detail & Related papers (2020-05-08T14:23:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.