Related papers: SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction

SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction

URL: http://arxiv.org/abs/2405.15677v3
Date: Fri, 01 Nov 2024 06:19:24 GMT
Title: SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction
Authors: Wei Wu, Xiaoxin Feng, Ziyan Gao, Yuheng Kan,
Abstract summary: We introduce a novel autonomous driving motion generation paradigm that models vectorized map and agent trajectory data into discrete sequence tokens. These tokens are then processed through a decoder-only transformer architecture to train for the next token prediction task. We have collected over 1 billion motion tokens from multiple datasets, validating the model's scalability.
Score: 4.318757942343036
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data-driven autonomous driving motion generation tasks are frequently impacted by the limitations of dataset size and the domain gap between datasets, which precludes their extensive application in real-world scenarios. To address this issue, we introduce SMART, a novel autonomous driving motion generation paradigm that models vectorized map and agent trajectory data into discrete sequence tokens. These tokens are then processed through a decoder-only transformer architecture to train for the next token prediction task across spatial-temporal series. This GPT-style method allows the model to learn the motion distribution in real driving scenarios. SMART achieves state-of-the-art performance across most of the metrics on the generative Sim Agents challenge, ranking 1st on the leaderboards of Waymo Open Motion Dataset (WOMD), demonstrating remarkable inference speed. Moreover, SMART represents the generative model in the autonomous driving motion domain, exhibiting zero-shot generalization capabilities: Using only the NuPlan dataset for training and WOMD for validation, SMART achieved a competitive score of 0.72 on the Sim Agents challenge. Lastly, we have collected over 1 billion motion tokens from multiple datasets, validating the model's scalability. These results suggest that SMART has initially emulated two important properties: scalability and zero-shot generalization, and preliminarily meets the needs of large-scale real-time simulation applications. We have released all the code to promote the exploration of models for motion generation in the autonomous driving field. The source code is available at https://github.com/rainmaker22/SMART.

Related papers

InfGen: Scenario Generation as Next Token Group Prediction [49.54222089551598]
InfGen is a scenario generation framework that outputs agent states and trajectories in an autoregressive manner.<n>Experiments demonstrate that InfGen produces realistic, diverse, and adaptive traffic behaviors.
arXiv Detail & Related papers (2025-06-29T16:18:32Z)
Planning with Adaptive World Models for Autonomous Driving [50.4439896514353]
We present nuPlan, a real-world motion planning benchmark that captures multi-agent interactions. We learn to model such unique behaviors with BehaviorNet, a graph convolutional neural network (GCNN) We also present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions.
arXiv Detail & Related papers (2024-06-15T18:53:45Z)
Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs. Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction. We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction [22.254486248785614]
BehaviorGPT is a homogeneous and fully autoregressive Transformer designed to simulate the sequential behavior of multiple agents. We introduce the Next-Patch Prediction Paradigm (NP3) to mitigate the negative effects of autoregressive modeling. BehaviorGPT won first place in the 2024 Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147.
arXiv Detail & Related papers (2024-05-27T17:28:25Z)
Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z)
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training. We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
FollowNet: A Comprehensive Benchmark for Car-Following Behavior Modeling [20.784555362703294]
We establish a public benchmark dataset for car-following behavior modeling. The benchmark consists of more than 80K car-following events extracted from five public driving datasets. Results show that the deep deterministic policy gradient (DDPG) based model performs competitively with a lower MSE for spacing.
arXiv Detail & Related papers (2023-05-25T08:59:26Z)
TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [149.5716746789134]
We show data-driven traffic simulation can be formulated as a world model. We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving. Experiments on the open motion dataset show TrafficBots can simulate realistic multi-agent behaviors.
arXiv Detail & Related papers (2023-03-07T18:28:41Z)
Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN [59.57221522897815]
We propose a neural network model based on trajectories information for driving behavior recognition. We evaluate the proposed model on the public BLVD dataset, achieving a satisfying performance.
arXiv Detail & Related papers (2021-03-01T06:47:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.