Related papers: Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling

Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling

URL: http://arxiv.org/abs/2510.11083v1
Date: Mon, 13 Oct 2025 07:25:13 GMT
Title: Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling
Authors: Tianyi Tan, Yinan Zheng, Ruiming Liang, Zexu Wang, Kexin Zheng, Jinliang Zheng, Jianxiong Li, Xianyuan Zhan, Jingjing Liu,
Abstract summary: Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning.<n>We propose Flow Planner, which tackles these problems through coordinated innovations in data modeling, model architecture, and learning scheme.<n>Flow Planner achieves state-of-the-art performance among learning-based approaches while effectively modeling interactive behaviors in complex driving scenarios.
Score: 26.71028572181775
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning. Learning-based approaches attempt to address this challenge with advanced generative models, removing the dependency on over-engineered architectures for representation fusion. However, brute-force implementation by simply stacking transformer blocks lacks a dedicated mechanism for modeling interactive behaviors that are common in real driving scenarios. The scarcity of interactive driving data further exacerbates this problem, leaving conventional imitation learning methods ill-equipped to capture high-value interactive behaviors. We propose Flow Planner, which tackles these problems through coordinated innovations in data modeling, model architecture, and learning scheme. Specifically, we first introduce fine-grained trajectory tokenization, which decomposes the trajectory into overlapping segments to decrease the complexity of whole trajectory modeling. With a sophisticatedly designed architecture, we achieve efficient temporal and spatial fusion of planning and scene information, to better capture interactive behaviors. In addition, the framework incorporates flow matching with classifier-free guidance for multi-modal behavior generation, which dynamically reweights agent interactions during inference to maintain coherent response strategies, providing a critical boost for interactive scenario understanding. Experimental results on the large-scale nuPlan dataset and challenging interactive interPlan dataset demonstrate that Flow Planner achieves state-of-the-art performance among learning-based approaches while effectively modeling interactive behaviors in complex driving scenarios.

Related papers

Action-Dynamics Modeling and Cross-Temporal Interaction for Online Action Understanding [23.87664450145037]
Action understanding, encompassing action detection and anticipation, plays a crucial role in numerous practical applications.<n>We propose a novel framework called the State-Specific Model (SSM), designed to unify and enhance both action detection and anticipation tasks.
arXiv Detail & Related papers (2025-10-12T16:10:40Z)
HeLoFusion: An Efficient and Scalable Encoder for Modeling Heterogeneous and Multi-Scale Interactions in Trajectory Prediction [11.30785902722196]
HeLoFusion is an efficient and scalable encoder for modeling heterogeneous and multi-scale agent interactions.<n>Our work demonstrates that a locality-grounded architecture, which explicitly models multi-scale and heterogeneous interactions, is a highly effective strategy for advancing motion forecasting.
arXiv Detail & Related papers (2025-09-15T09:19:41Z)
ILNet: Trajectory Prediction with Inverse Learning Attention for Enhancing Intention Capture [4.190790144182306]
It is acknowledged that human drivers dynamically adjust initial driving decisions based on assumptions about the intentions surrounding vehicles.<n>Motivated by human driving behaviors, this paper proposes ILNet, a multi-agent trajectory prediction method with Inverse Learning (IL) attention and Dynamic Anchor SelectionDAS (DAS) module.<n> Experimental results show that the ILNet achieves state-of-the-art performance on the INTERACTION and Argoverse motion forecasting datasets.
arXiv Detail & Related papers (2025-07-09T04:18:01Z)
Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation.<n>We present a taxonomy that categorizes such information manipulation approaches across four key dimensions.<n>We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control [72.00655365269]
We present RoboMaster, a novel framework that models inter-object dynamics through a collaborative trajectory formulation.<n>Unlike prior methods that decompose objects, our core is to decompose the interaction process into three sub-stages: pre-interaction, interaction, and post-interaction.<n>Our method outperforms existing approaches, establishing new state-of-the-art performance in trajectory-controlled video generation for robotic manipulation.
arXiv Detail & Related papers (2025-06-02T17:57:06Z)
Predictive Planner for Autonomous Driving with Consistency Models [5.966385886363771]
Trajectory prediction and planning are essential for autonomous vehicles to navigate safely and efficiently in dynamic environments.<n>Recent diffusion-based generative models have shown promise in multi-agent trajectory generation, but their slow sampling is less suitable for high-frequency planning tasks.<n>We leverage the consistency model to build a predictive planner that samples from a joint distribution of ego and surrounding agents, conditioned on the ego vehicle's navigational goal.
arXiv Detail & Related papers (2025-02-12T00:26:01Z)
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models [50.38647583839384]
We propose InterDyn, a framework that generates videos of interactive dynamics given an initial frame and a control signal encoding the motion of a driving object or actor.<n>Our key insight is that large video generation models can act as both neurals and implicit physics simulators'', having learned interactive dynamics from large-scale video data.
arXiv Detail & Related papers (2024-12-16T13:57:02Z)
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.<n>DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.<n>Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework. These auxiliary tasks provide additional supervision signals to infer the behavior patterns other interactive agents. Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
Amortized Network Intervention to Steer the Excitatory Point Processes [8.15558505134853]
Excitatory point processes (i.e., event flows) occurring over dynamic graphs provide a fine-grained model to capture how discrete events may spread over time and space. How to effectively steer the event flows by modifying the dynamic graph structures presents an interesting problem, motivated by curbing the spread of infectious diseases. We design an Amortized Network Interventions framework, allowing for the pooling of optimal policies from history and other contexts.
arXiv Detail & Related papers (2023-10-06T11:17:28Z)
VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction [50.986371459817256]
We propose a novel textitVirtual InteRacTion mechanism, termed as VIRT, to enable full and deep interaction modeling in representation-based models. VIRT asks representation-based encoders to conduct virtual interactions to mimic the behaviors as interaction-based models do.
arXiv Detail & Related papers (2021-12-08T09:49:28Z)
Convolutions for Spatial Interaction Modeling [9.408751013132624]
We consider the problem of spatial interaction modeling in the context of predicting the motion of actors around autonomous vehicles. We revisit convolutions and show that they can demonstrate comparable performance to graph networks in modeling spatial interactions with lower latency.
arXiv Detail & Related papers (2021-04-15T00:41:30Z)
Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline [85.9210953301628]
Control of traffic signals is fundamental and critical to alleviate traffic congestion in urban areas. Because of the high complexity of modelling the problem, experimental settings of current works are often inconsistent. We propose a novel and strong baseline model based on deep reinforcement learning with the encoder-decoder structure.
arXiv Detail & Related papers (2021-01-24T03:55:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.