Related papers: Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent

Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent

URL: http://arxiv.org/abs/2311.18307v1
Date: Thu, 30 Nov 2023 07:25:24 GMT
Title: Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent
Authors: Yuxiao Chen, Sander Tonkens, and Marco Pavone
Abstract summary: We present Categorical Traffic Transformer (CTT), a traffic model that outputs both continuous trajectory predictions and tokenized categorical predictions. The most outstanding feature of CTT is its fully interpretable latent space, which enables direct supervision of the latent variable from the ground truth. As a result, CTT can generate diverse behaviors conditioned on different latent modes with semantic meanings while beating SOTA on prediction accuracy.
Score: 17.14501241048221
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adept traffic models are critical to both planning and closed-loop simulation for autonomous vehicles (AV), and key design objectives include accuracy, diverse multimodal behaviors, interpretability, and downstream compatibility. Recently, with the advent of large language models (LLMs), an additional desirable feature for traffic models is LLM compatibility. We present Categorical Traffic Transformer (CTT), a traffic model that outputs both continuous trajectory predictions and tokenized categorical predictions (lane modes, homotopies, etc.). The most outstanding feature of CTT is its fully interpretable latent space, which enables direct supervision of the latent variable from the ground truth during training and avoids mode collapse completely. As a result, CTT can generate diverse behaviors conditioned on different latent modes with semantic meanings while beating SOTA on prediction accuracy. In addition, CTT's ability to input and output tokens enables integration with LLMs for common-sense reasoning and zero-shot generalization.

Related papers

Prototype Transformer: Towards Language Model Architectures Interpretable by Design [37.30649990861446]
We introduce the Prototype Transformer (ProtoT) -- an autoregressive LM architecture based on prototypes.<n>ProtoT works by means of two-way communication between the input sequence and the prototypes.<n>It provides the potential to interpret the model's reasoning and allow for targeted edits of its behavior.
arXiv Detail & Related papers (2026-02-12T11:43:39Z)
Test time training enhances in-context learning of nonlinear functions [51.56484100374058]
Test-time training (TTT) enhances model performance by explicitly updating designated parameters prior to each prediction.<n>We investigate the combination of TTT with in-context learning (ICL), where the model is given a few examples from the target distribution at inference time.
arXiv Detail & Related papers (2025-09-30T03:56:44Z)
ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving [64.12414815634847]
Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge.<n>We propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.
arXiv Detail & Related papers (2025-08-15T12:06:55Z)
GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks.<n>During inference, GrAInS hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale.<n>It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z)
Markov Regime-Switching Intelligent Driver Model for Interpretable Car-Following Behavior [19.229274803939983]
We introduce a regime-switching framework that allows driving behavior to be governed by different IDM parameter sets.<n>We instantiate the framework using a Factorial Hidden Markov Model with IDM dynamics.
arXiv Detail & Related papers (2025-06-17T17:55:42Z)
LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation [94.84458417662404]
LangTraj is a language-conditioned scene-diffusion model that simulates the joint behavior of all agents in traffic scenarios. By conditioning on natural language inputs, LangTraj provides flexible and intuitive control over interactive behaviors. LangTraj demonstrates strong performance in realism, language controllability, and language-conditioned safety-critical simulation.
arXiv Detail & Related papers (2025-04-15T17:14:06Z)
Sparse Prototype Network for Explainable Pedestrian Behavior Prediction [60.80524827122901]
We present Sparse Prototype Network (SPN), an explainable method designed to simultaneously predict a pedestrian's future action, trajectory, and pose. Regularized by mono-semanticity and clustering constraints, the prototypes learn consistent and human-understandable features.
arXiv Detail & Related papers (2024-10-16T03:33:40Z)
MSTF: Multiscale Transformer for Incomplete Trajectory Prediction [30.152217860860464]
We propose an end-to-end framework, termed Multiscale Transformer (MSTF), meticulously crafted for incomplete trajectory prediction. MSTF integrates a Multiscale Attention Head (MAH) and an Information Increment-based Pattern Adaptive (IIPA) module. We evaluate our proposed MSTF model using two large-scale real-world datasets.
arXiv Detail & Related papers (2024-07-08T07:10:17Z)
Towards Explainable Traffic Flow Prediction with Large Language Models [36.86937188565623]
We propose a Traffic flow Prediction model based on Large Language Models (LLMs) to generate explainable traffic predictions. By transferring multi-modal traffic data into natural language descriptions, xTP-LLM captures complex time-series patterns and external factors from comprehensive traffic data. Empirically, xTP-LLM shows competitive accuracy compared with deep learning baselines, while providing an intuitive and reliable explanation for predictions.
arXiv Detail & Related papers (2024-04-03T07:14:15Z)
LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models [8.624969693477448]
Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability. We propose LC-LLM, an explainable lane change prediction model that leverages the strong reasoning capabilities and self-explanation abilities of Large Language Models.
arXiv Detail & Related papers (2024-03-27T08:34:55Z)
AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce the GPT style next token motion prediction into motion prediction. Different from language data which is composed of homogeneous units -words, the elements in the driving scene could have complex spatial-temporal and semantic relations. We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z)
Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting [11.106812447960186]
We introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT) CDT integrates information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories. To ensure multimodality, we incorporate behavioral tokens to direct the trajectory's modes, such as going straight, turning right or left.
arXiv Detail & Related papers (2024-02-06T13:16:54Z)
Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z)
TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [149.5716746789134]
We show data-driven traffic simulation can be formulated as a world model. We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving. Experiments on the open motion dataset show TrafficBots can simulate realistic multi-agent behaviors.
arXiv Detail & Related papers (2023-03-07T18:28:41Z)
Guided Conditional Diffusion for Controllable Traffic Simulation [42.198185904248994]
Controllable and realistic traffic simulation is critical for developing and verifying autonomous vehicles. Data-driven approaches generate realistic and human-like behaviors, improving transfer from simulated to real-world traffic. We develop a conditional diffusion model for controllable traffic generation (CTG) that allows users to control desired properties of trajectories at test time.
arXiv Detail & Related papers (2022-10-31T14:44:59Z)
Control-Aware Prediction Objectives for Autonomous Driving [78.19515972466063]
We present control-aware prediction objectives (CAPOs) to evaluate the downstream effect of predictions on control without requiring the planner be differentiable. We propose two types of importance weights that weight the predictive likelihood: one using an attention model between agents, and another based on control variation when exchanging predicted trajectories for ground truth trajectories.
arXiv Detail & Related papers (2022-04-28T07:37:21Z)
SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction [72.37440317774556]
We propose advances that address two key challenges in future trajectory prediction. multimodality in both training data and predictions and constant time inference regardless of number of agents.
arXiv Detail & Related papers (2020-07-26T08:17:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.