Scaling Laws of Motion Forecasting and Planning -- A Technical Report
- URL: http://arxiv.org/abs/2506.08228v1
- Date: Mon, 09 Jun 2025 20:54:23 GMT
- Title: Scaling Laws of Motion Forecasting and Planning -- A Technical Report
- Authors: Mustafa Baniodeh, Kratarth Goel, Scott Ettinger, Carlos Fuertes, Ari Seff, Tim Shen, Cole Gulino, Chenjie Yang, Ghassen Jerfel, Dokook Choe, Rui Wang, Vinutha Kallem, Sergio Casas, Rami Al-Rfou, Benjamin Sapp, Dragomir Anguelov
- Abstract summary: We study the empirical scaling laws of a family of encoder-decoder autoregressive transformer models. We observe a strong correlation between model training loss and model evaluation metrics. We briefly study the utility of training on general logged driving data of other agents to improve the performance of the ego-agent.
- Score: 23.340801154900387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the empirical scaling laws of a family of encoder-decoder autoregressive transformer models on the task of joint motion forecasting and planning in the autonomous driving domain. Using a dataset of 500 thousand hours of driving, we demonstrate that, similar to language modeling, model performance improves as a power-law function of the total compute budget, and we observe a strong correlation between model training loss and model evaluation metrics. Most interestingly, closed-loop metrics also improve with scaling, which has important implications for the suitability of open-loop metrics for model development and hill climbing. We also study the optimal scaling of the number of transformer parameters and the training data size for a training compute-optimal model. We find that as the training compute budget grows, optimal scaling requires increasing the model size 1.5x as fast as the dataset size. We also study inference-time compute scaling, where we observe that sampling and clustering the output of smaller models makes them competitive with larger models, up to a crossover point beyond which a larger model becomes more inference-compute efficient. Overall, our experimental results demonstrate that optimizing the training and inference-time scaling properties of motion forecasting and planning models is a key lever for improving their performance across a wide variety of driving scenarios. Finally, we briefly study the utility of training on general logged driving data of other agents to improve the performance of the ego-agent, an important research area for addressing the scarcity of robotics data for training large-capacity models.
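To make the two training-time results above concrete, here is a minimal illustrative sketch (not the authors' code): it fits a power law L(C) = a * C^(-alpha) to hypothetical (compute, eval-loss) pairs, then splits a growing compute budget assuming a Chinchilla-style parametrization N ∝ C^p, D ∝ C^q with p + q = 1 and p/q = 1.5, matching the abstract's "model size 1.5x as fast as the dataset size". The data points and the parametrization constraint are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only; all numeric values are made up.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])  # hypothetical training FLOPs
loss = np.array([2.10, 1.75, 1.47, 1.24, 1.05])     # hypothetical eval losses

# Ordinary least squares on log-log axes: log L = log a - alpha * log C.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
alpha, a = -slope, np.exp(intercept)
print(f"fitted power law: L(C) ~= {a:.3g} * C^(-{alpha:.3f})")

# Assumed compute-optimal allocation: N grows 1.5x as fast as D,
# i.e. N ∝ C^p, D ∝ C^q with p + q = 1 and p/q = 1.5.
p = 1.5 / (1.5 + 1.0)  # parameters: N ∝ C^0.6
q = 1.0 - p            # data:       D ∝ C^0.4
scale = 100.0          # e.g. a 100x larger training-compute budget
print(f"100x more compute -> ~{scale**p:.0f}x more parameters, "
      f"~{scale**q:.0f}x more data")
```

Under these assumptions a 100x compute increase would be spent on roughly 16x more parameters and 6x more data; the actual exponents and constants depend on the fitted scaling curves reported in the paper.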
Related papers
- Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies [66.83950068218033]
Scaling laws demonstrate that scaling model parameters and training data enhances learning performance. Despite their potential to improve performance, the integration of scaling laws into deep reinforcement learning has not been fully realized. This review addresses this gap by systematically analyzing scaling strategies in three dimensions: data, network, and training budget.
arXiv Detail & Related papers (2025-08-05T08:03:12Z)
- Data Scaling Laws for End-to-End Autonomous Driving [83.85463296830743]
We evaluate the performance of a simple end-to-end driving architecture on internal driving datasets ranging in size from 16 to 8192 hours. Specifically, we investigate how much additional training data is needed to achieve a target performance gain.
arXiv Detail & Related papers (2025-04-06T03:23:48Z)
- Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
- DriveGPT: Scaling Autoregressive Behavior Models for Driving [11.733428769776204]
We present DriveGPT, a scalable behavior model for autonomous driving. We learn a transformer model to predict future agent states as tokens in an autoregressive fashion. We scale up our model parameters and training data by multiple orders of magnitude, enabling us to explore the scaling properties.
arXiv Detail & Related papers (2024-12-19T00:06:09Z)
- Scaling Sequential Recommendation Models with Transformers [0.0]
We take inspiration from the scaling laws observed in training large language models, and explore similar principles for sequential recommendation. Compute-optimal training is possible but requires a careful analysis of the compute-performance trade-offs specific to the application. We also show that performance scaling translates to downstream tasks by fine-tuning larger pre-trained models on smaller task-specific domains.
arXiv Detail & Related papers (2024-12-10T15:20:56Z)
- A Hitchhiker's Guide to Scaling Law Estimation [56.06982415792523]
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. We estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families.
arXiv Detail & Related papers (2024-10-15T17:59:10Z)
- AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs [61.13296177652599]
We show that data mixtures that perform well at smaller scales may not retain their advantage at larger scales. We propose AutoScale, a two-stage, scale-aware data composition framework.
arXiv Detail & Related papers (2024-07-29T17:06:30Z)
- More Compute Is What You Need [3.184416958830696]
We propose a new scaling law suggesting that, for transformer-based models, performance depends mostly on the total amount of compute spent.
We predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
arXiv Detail & Related papers (2024-04-30T12:05:48Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Navigating Scaling Laws: Compute Optimality in Adaptive Model Training [39.96209967632896]
In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data.
We extend the concept of optimality by allowing for an 'adaptive' model, i.e. a model that can change its shape during training.
arXiv Detail & Related papers (2023-11-06T16:20:28Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters, with high speed in both training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)