TrajTok: Technical Report for 2025 Waymo Open Sim Agents Challenge
- URL: http://arxiv.org/abs/2506.21618v1
- Date: Mon, 23 Jun 2025 08:32:05 GMT
- Title: TrajTok: Technical Report for 2025 Waymo Open Sim Agents Challenge
- Authors: Zhiyuan Zhang, Xiaosong Jia, Guanyu Chen, Qifeng Li, Junchi Yan
- Abstract summary: We introduce TrajTok, a trajectory tokenizer for discrete next-token-prediction-based behavior generation models. We adopt the tokenizer and loss for the SMART model and achieve a realism score of 0.7852 on the Waymo Open Sim Agents Challenge 2025.
- Score: 58.952909068296314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this technical report, we introduce TrajTok, a trajectory tokenizer for discrete next-token-prediction-based behavior generation models. It combines data-driven and rule-based methods for better coverage, symmetry, and robustness, and is paired with a spatial-aware label smoothing method for the cross-entropy loss. We adopt the tokenizer and loss for the SMART model and achieve a realism score of 0.7852 on the Waymo Open Sim Agents Challenge 2025. We will open-source the code in the future.
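The report does not spell out how the spatial-aware label smoothing is computed. As a minimal sketch, assuming each discrete trajectory token corresponds to a 2D bin center and that spatially nearby bins should receive part of the target mass, the loss could look like the following (PyTorch; the names `bin_centers` and `sigma` are illustrative placeholders, not taken from the paper):

```python
# Minimal sketch of a spatial-aware label smoothing for cross-entropy.
# Assumption: each discrete trajectory token corresponds to a 2D bin
# center (x, y); `bin_centers` and `sigma` are illustrative names,
# not identifiers from the TrajTok paper.
import torch
import torch.nn.functional as F

def spatial_label_smoothing_ce(logits, target, bin_centers, sigma=0.5):
    """
    logits:      (B, V) unnormalized scores over V trajectory tokens
    target:      (B,)   index of the ground-truth token
    bin_centers: (V, 2) spatial center of each token's bin
    sigma:       bandwidth controlling how far the smoothing mass spreads
    """
    # Distance from every token's bin center to the ground-truth bin center.
    gt_centers = bin_centers[target]                                         # (B, 2)
    d2 = ((bin_centers[None, :, :] - gt_centers[:, None, :]) ** 2).sum(-1)   # (B, V)

    # Turn distances into a soft target distribution: nearby tokens get
    # more probability than distant ones; the ground-truth bin gets the most.
    soft_target = torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)              # (B, V)

    # Standard soft-label cross-entropy.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_target * log_probs).sum(-1).mean()
```

Compared with uniform label smoothing, a scheme of this kind penalizes predictions that land in spatially adjacent bins less than predictions in distant bins, which is the kind of spatial awareness the abstract refers to.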
Related papers
- Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis [79.98107530577576]
DisCon is a novel framework that reinterprets discrete tokens as conditional signals rather than generation targets. DisCon achieves a gFID score of 1.38 on ImageNet 256×256 generation, outperforming state-of-the-art autoregressive approaches by a clear margin.
arXiv Detail & Related papers (2025-07-02T14:33:52Z) - Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the Segment Anything Model (SAM). To enhance the effectiveness of the adversarial attack on models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z) - Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
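For context, here is a minimal sketch of the iteratively-refined parallel decoding described above (mask-predict style: predict all masked positions at once, then re-mask the least confident ones and repeat). The model interface, mask-id handling, and re-masking schedule are illustrative assumptions, not the paper's exact procedure:

```python
# Minimal sketch of iteratively-refined parallel decoding for a masked LM.
# Assumptions: `model(tokens)` returns per-position logits over the vocab,
# and `mask_id` marks positions still to be filled; both are placeholders.
import torch

def parallel_decode(model, length, mask_id, steps=4):
    tokens = torch.full((1, length), mask_id, dtype=torch.long)
    for step in range(1, steps + 1):
        logits = model(tokens)                      # (1, length, vocab)
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)              # (1, length) each

        # Accept all positions in parallel, then re-mask the least
        # confident fraction so later passes can revise them.
        tokens = pred.clone()
        n_remask = int(length * (1 - step / steps))
        if n_remask > 0:
            worst = conf[0].topk(n_remask, largest=False).indices
            tokens[0, worst] = mask_id
    return tokens
```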
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - Model Predictive Simulation Using Structured Graphical Models and Transformers [4.229560419171488]
We propose an approach to simulating trajectories of multiple interacting agents (road users) based on transformers and probabilistic graphical models (PGMs).
We then refine these generated trajectories using a PGM whose factors encode prior knowledge.
We show that MPS improves upon the MTR baseline, especially in safety critical metrics such as collision rate.
arXiv Detail & Related papers (2024-06-28T03:46:53Z) - BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction [22.254486248785614]
BehaviorGPT is a homogeneous and fully autoregressive Transformer designed to simulate the sequential behavior of multiple agents.
We introduce the Next-Patch Prediction Paradigm (NP3) to mitigate the negative effects of autoregressive modeling.
BehaviorGPT won first place in the 2024 Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147.
arXiv Detail & Related papers (2024-05-27T17:28:25Z) - SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction [4.318757942343036]
We introduce a novel autonomous driving motion generation paradigm that models vectorized map and agent trajectory data into discrete sequence tokens.
These tokens are then processed through a decoder-only transformer architecture to train for the next token prediction task.
We have collected over 1 billion motion tokens from multiple datasets, validating the model's scalability.
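Since TrajTok is plugged into this model, a rough sketch of the discrete next-token-prediction setup described above may help. The vocabulary size, model dimensions, and layer counts below are illustrative placeholders rather than SMART's actual configuration:

```python
# Minimal sketch of discrete next-token prediction over trajectory tokens.
# Assumptions: trajectories have already been mapped to token ids by some
# tokenizer; vocab size and model dimensions are placeholders, not SMART's
# or TrajTok's real settings.
import torch
import torch.nn as nn

class NextTokenModel(nn.Module):
    def __init__(self, vocab_size=1024, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)  # run with a causal mask
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):                    # (B, T)
        T = token_ids.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.backbone(self.embed(token_ids), mask=causal)
        return self.head(h)                          # (B, T, vocab)

# Training objective: predict token t+1 from tokens up to t.
model = NextTokenModel()
tokens = torch.randint(0, 1024, (8, 16))             # fake token sequences
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
)
```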
arXiv Detail & Related papers (2024-05-24T16:17:35Z) - Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z) - Multiverse Transformer: 1st Place Solution for Waymo Open Sim Agents Challenge 2023 [3.4520774137890555]
This report presents our 1st place solution for the Waymo Open Sim Agents Challenge (WOSAC) 2023.
Our proposed MultiVerse Transformer for Agent simulation (MVTA) effectively leverages transformer-based motion prediction approaches.
In order to produce simulations with a high degree of realism, we design novel training and sampling methods.
arXiv Detail & Related papers (2023-06-20T20:01:07Z) - Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport [9.980822222343921]
We propose a novel generative model based on the semi-dual formulation of Unbalanced Optimal Transport (UOT).
Unlike OT, UOT relaxes the hard constraint on distribution matching. This approach provides better robustness against outliers, stability during training, and faster convergence.
Our model outperforms existing OT-based generative models, achieving FID scores of 2.97 on CIFAR-10 and 6.36 on CelebA-HQ-256.
arXiv Detail & Related papers (2023-05-24T06:31:05Z)