Generalizing Motion Planners with Mixture of Experts for Autonomous Driving
- URL: http://arxiv.org/abs/2410.15774v2
- Date: Tue, 29 Oct 2024 05:35:41 GMT
- Title: Generalizing Motion Planners with Mixture of Experts for Autonomous Driving
- Authors: Qiao Sun, Huimin Wang, Jiahao Zhan, Fan Nie, Xin Wen, Leimeng Xu, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao
- Abstract summary: We introduce StateTransformer-2 (STR2), a scalable, decoder-only motion planner that uses a Vision Transformer (ViT) encoder and a mixture-of-experts (MoE) causal Transformer architecture.
- Score: 38.02032312602382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large real-world driving datasets have sparked significant research into various aspects of data-driven motion planners for autonomous driving, including data augmentation, model architecture, reward design, training strategies, and planner pipelines. These planners promise better generalization to complicated and few-shot cases than previous methods. However, experimental results show that many of these approaches yield only limited generalization in planning performance due to overly complex designs or training paradigms. In this paper, we review and benchmark previous methods with a focus on generalization. The experimental results indicate that as models are appropriately scaled, many design elements become redundant. We introduce StateTransformer-2 (STR2), a scalable, decoder-only motion planner that uses a Vision Transformer (ViT) encoder and a mixture-of-experts (MoE) causal Transformer architecture. The MoE backbone addresses modality collapse and reward balancing through expert routing during training. Extensive experiments on the nuPlan dataset show that our method generalizes better than previous approaches across different test sets and closed-loop simulations. Furthermore, we assess its scalability on billions of real-world urban driving scenarios, demonstrating consistent accuracy improvements as both data and model size grow.
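The abstract describes the STR2 architecture at a high level: a Vision Transformer encoder feeding a decoder-only causal Transformer whose backbone is a mixture of experts, with expert routing during training used to counter modality collapse and balance rewards. As a rough, illustrative sketch of the routing mechanism only, and not the STR2 implementation, the following minimal PyTorch module shows a top-k routed MoE feed-forward layer; the hidden sizes, expert count, and top-k value are assumptions chosen for readability.

```python
# Minimal sketch (not the STR2 implementation) of a top-k routed
# mixture-of-experts feed-forward layer, the general mechanism an MoE
# Transformer backbone uses for expert routing. Dimensions, expert count,
# and top_k are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=256, d_hidden=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # per-token expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                  # x: (batch, seq, d_model)
        weights, indices = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., k] == e                # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Toy usage: a batch of encoded scene/trajectory tokens.
tokens = torch.randn(2, 32, 256)
print(MoEFeedForward()(tokens).shape)  # torch.Size([2, 32, 256])
```

In a decoder-only planner of this kind, a layer like this would replace the dense feed-forward block inside each causal Transformer layer; the per-token routing distribution is what allows different experts to specialize, which is how the abstract frames the handling of modality collapse and reward balancing.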
Related papers
- Improved visual-information-driven model for crowd simulation and its modular application [4.683197108420276]
Data-driven crowd simulation models can enhance the accuracy and realism of simulations.
However, developing data-driven crowd simulation models with strong generalizability remains an open question.
This paper proposes a data-driven model incorporating a refined visual information extraction method and exit cues to enhance generalizability.
arXiv Detail & Related papers (2025-04-02T07:53:33Z) - SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.
We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z) - DriveGPT: Scaling Autoregressive Behavior Models for Driving [11.733428769776204]
We present DriveGPT, a scalable behavior model for autonomous driving.
We learn a transformer model to predict future agent states as tokens in an autoregressive fashion.
We scale up our model parameters and training data by multiple orders of magnitude, enabling us to explore the scaling properties.
arXiv Detail & Related papers (2024-12-19T00:06:09Z) - Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization [88.5582111768376]
We study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model.
Our results establish a sharp condition, based on the signal-to-noise ratio in the data model, that distinguishes the small test error regime from the large test error regime.
arXiv Detail & Related papers (2024-09-28T13:24:11Z) - TrackFormers: In Search of Transformer-Based Particle Tracking for the High-Luminosity LHC Era [2.9052912091435923]
High-Energy Physics experiments are facing a multi-fold data increase with every new iteration.
One step in particular need of an overhaul is particle track reconstruction, a.k.a. tracking.
A Machine Learning-assisted solution is expected to provide significant improvements.
arXiv Detail & Related papers (2024-07-09T18:47:25Z) - Planning with Adaptive World Models for Autonomous Driving [50.4439896514353]
Motion planners (MPs) are crucial for safe navigation in complex urban environments.
nuPlan, a recently released MP benchmark, augments real-world driving logs with closed-loop simulation logic.
We present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions.
arXiv Detail & Related papers (2024-06-15T18:53:45Z) - AMEND: A Mixture of Experts Framework for Long-tailed Trajectory Prediction [6.724750970258851]
We propose a modular model-agnostic framework for trajectory prediction.
Each expert is trained to specialize in a particular part of the data.
To produce predictions, we utilise a router network that selects the best expert by generating relative confidence scores (see the sketch after this list).
arXiv Detail & Related papers (2024-02-13T02:43:41Z) - Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z) - Leveraging the Power of Data Augmentation for Transformer-based Tracking [64.46371987827312]
We propose two data augmentation methods customized for tracking.
First, we optimize existing random cropping via a dynamic search radius mechanism and simulation for boundary samples.
Second, we propose a token-level feature mixing augmentation strategy, which strengthens the model against challenges such as background interference.
arXiv Detail & Related papers (2023-09-15T09:18:54Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
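As a companion to the AMEND entry above, which describes a router network that picks the best expert via relative confidence scores, here is a minimal, illustrative sketch of that kind of hard expert selection; the module shapes, prediction horizon, and expert count are assumptions for readability, not the paper's models.

```python
# Minimal sketch of hard expert selection via a confidence-scoring router,
# illustrating the mechanism described in the AMEND entry above (not that
# paper's implementation). Shapes and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class RoutedTrajectoryPredictor(nn.Module):
    def __init__(self, d_in=64, horizon=12, num_experts=3):
        super().__init__()
        # Each expert maps an agent-history embedding to a future (x, y) trajectory.
        self.experts = nn.ModuleList(nn.Linear(d_in, horizon * 2) for _ in range(num_experts))
        self.router = nn.Linear(d_in, num_experts)            # relative confidence per expert

    def forward(self, h):                                     # h: (batch, d_in)
        scores = self.router(h).softmax(dim=-1)               # (batch, num_experts)
        best = scores.argmax(dim=-1)                          # most confident expert per sample
        preds = torch.stack([e(h) for e in self.experts], 1)  # (batch, num_experts, horizon*2)
        chosen = preds[torch.arange(h.size(0)), best]         # (batch, horizon*2)
        return chosen.view(h.size(0), -1, 2)                  # (batch, horizon, 2)

h = torch.randn(4, 64)
print(RoutedTrajectoryPredictor()(h).shape)  # torch.Size([4, 12, 2])
```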