Autoregressive Meta-Actions for Unified Controllable Trajectory Generation
- URL: http://arxiv.org/abs/2505.23612v1
- Date: Thu, 29 May 2025 16:19:59 GMT
- Title: Autoregressive Meta-Actions for Unified Controllable Trajectory Generation
- Authors: Jianbo Zhao, Taiyu Ban, Xiyang Wang, Qibin Zhou, Hangning Zhou, Zhihao Liu, Mu Yang, Lei Liu, Bin Li,
- Abstract summary: Controllable trajectory generation is crucial for autonomous driving systems. Existing frameworks rely on invariant meta-actions assigned over fixed future time intervals. We introduce Autoregressive Meta-Actions, an approach integrated into autoregressive trajectory generation frameworks.
- Score: 10.123353592943968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controllable trajectory generation guided by high-level semantic decisions, termed meta-actions, is crucial for autonomous driving systems. A significant limitation of existing frameworks is their reliance on invariant meta-actions assigned over fixed future time intervals, causing temporal misalignment with the actual behavior trajectories. This misalignment leads to irrelevant associations between the prescribed meta-actions and the resulting trajectories, disrupting task coherence and limiting model performance. To address this challenge, we introduce Autoregressive Meta-Actions, an approach integrated into autoregressive trajectory generation frameworks that provides a unified and precise definition for meta-action-conditioned trajectory prediction. Specifically, we decompose traditional long-interval meta-actions into frame-level meta-actions, enabling a sequential interplay between autoregressive meta-action prediction and meta-action-conditioned trajectory generation. This decomposition ensures strict alignment between each trajectory segment and its corresponding meta-action, achieving a consistent and unified task formulation across the entire trajectory span and significantly reducing complexity. Moreover, we propose a staged pre-training process to decouple the learning of basic motion dynamics from the integration of high-level decision control, which offers flexibility, stability, and modularity. Experimental results validate our framework's effectiveness, demonstrating improved trajectory adaptivity and responsiveness to dynamic decision-making scenarios. We provide the video document and dataset, which are available at https://arma-traj.github.io/.
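The interplay described in the abstract, where a frame-level meta-action is predicted and the next trajectory frame is then generated conditioned on it, can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not the authors' model: the 1-D state, the three-action vocabulary, the rule-based `predict_meta_action`, the kinematic `generate_frame`, and the `overrides` argument are all hypothetical stand-ins for learned components that the abstract does not specify.

```python
from typing import Dict, List, Optional, Tuple

State = Tuple[float, float]  # toy 1-D state: (position in m, speed in m/s)

# Hypothetical meta-action vocabulary mapped to accelerations (m/s^2).
# The paper's actual meta-action set is not listed in the abstract.
META_ACTION_ACCEL: Dict[str, float] = {
    "keep": 0.0,
    "accelerate": 2.0,
    "decelerate": -2.0,
}


def predict_meta_action(history: List[State], target_speed: float) -> str:
    """Toy stand-in for a learned meta-action head (rule-based here)."""
    _, speed = history[-1]
    if speed < target_speed - 0.5:
        return "accelerate"
    if speed > target_speed + 0.5:
        return "decelerate"
    return "keep"


def generate_frame(history: List[State], meta_action: str, dt: float = 0.1) -> State:
    """Toy stand-in for meta-action-conditioned generation of one frame."""
    position, speed = history[-1]
    new_speed = max(0.0, speed + META_ACTION_ACCEL[meta_action] * dt)
    return (position + new_speed * dt, new_speed)


def rollout(
    initial: State,
    num_frames: int,
    target_speed: float,
    overrides: Optional[Dict[int, str]] = None,
) -> Tuple[List[State], List[str]]:
    """Alternate meta-action prediction and frame generation, one frame at a time.

    `overrides` maps a frame index to an externally prescribed meta-action,
    which is one simple way to model controllable generation.
    """
    overrides = overrides or {}
    history: List[State] = [initial]
    actions: List[str] = []
    for t in range(num_frames):
        # Step 1: choose the meta-action for this frame (prescribed or predicted).
        action = overrides.get(t) or predict_meta_action(history, target_speed)
        # Step 2: generate exactly one frame conditioned on that meta-action.
        actions.append(action)
        history.append(generate_frame(history, action))
    return history, actions
```

Because each generated frame is paired with exactly one meta-action, the action sequence and the trajectory stay aligned frame by frame, in contrast to assigning a single label to a fixed long interval. Calling `rollout((0.0, 5.0), 20, 8.0, overrides={0: "decelerate"})` forces a deceleration on the first frame and lets the toy predictor take over afterwards.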
Related papers
- Optimization-Guided Diffusion for Interactive Scene Generation [52.23368750264419]
We present OMEGA, an optimization-guided, training-free framework that enforces structural consistency and interaction awareness during diffusion-based sampling.
We show that OMEGA improves generation realism, consistency, and controllability, increasing the ratio of physically and behaviorally valid scenes.
Our approach can also generate $5\times$ more near-collision frames with a time-to-collision under three seconds.
arXiv Detail & Related papers (2025-12-08T15:56:18Z)
- Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning [2.034091340570242]
This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture.
Results demonstrate that the transformer-based framework not only matches analytical solutions in simple cases but also effectively learns coherent control policies across dynamically distinct regimes.
arXiv Detail & Related papers (2025-11-14T15:29:46Z)
- Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving [18.239343348322134]
We propose CATG, a novel planning framework that leverages Constrained Flow Matching.
CATG explicitly models the flow matching process, which inherently mitigates mode collapse.
CATG parameterizes driving aggressiveness as a control signal during generation, enabling precise manipulation of trajectory style.
arXiv Detail & Related papers (2025-10-30T09:24:34Z)
- Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns.
Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics.
We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z)
- A Trajectory Generator for High-Density Traffic and Diverse Agent-Interaction Scenarios [37.38654549322757]
We propose a novel trajectory generation framework that simultaneously enhances scenario density and enriches behavioral diversity.
Our method significantly improves both agent density and behavior diversity, while preserving motion realism and scenario-level safety.
Our synthetic data also benefits downstream trajectory prediction models and enhances performance in challenging high-density scenarios.
arXiv Detail & Related papers (2025-10-03T00:12:18Z)
- Steerable Adversarial Scenario Generation through Test-Time Preference Alignment [58.37104890690234]
Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems.
We introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE).
SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining.
arXiv Detail & Related papers (2025-09-24T13:27:35Z)
- ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving [64.12414815634847]
Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge.
We propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.
arXiv Detail & Related papers (2025-08-15T12:06:55Z)
- DeMo++: Motion Decoupling for Autonomous Driving [41.6423398623095]
We propose DeMo++, a framework that decouples motion intentions into two distinct components.
We introduce a cross-scene trajectory interaction mechanism to explore the relationships between motions in adjacent scenes.
DeMo++ achieves state-of-the-art performance across various benchmarks, including motion forecasting (Argoverse 2 and nuScenes), motion planning (nuPlan), and end-to-end planning (SIM).
arXiv Detail & Related papers (2025-07-23T09:11:25Z)
- ILNet: Trajectory Prediction with Inverse Learning Attention for Enhancing Intention Capture [4.190790144182306]
It is acknowledged that human drivers dynamically adjust initial driving decisions based on assumptions about the intentions of surrounding vehicles.
Motivated by human driving behaviors, this paper proposes ILNet, a multi-agent trajectory prediction method with Inverse Learning (IL) attention and a Dynamic Anchor Selection (DAS) module.
Experimental results show that ILNet achieves state-of-the-art performance on the INTERACTION and Argoverse motion forecasting datasets.
arXiv Detail & Related papers (2025-07-09T04:18:01Z)
- PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning [5.247557449370603]
ProMoGen is a novel framework that integrates trajectory guidance with sparse anchor motion control.
ProMoGen supports both dual and single control paradigms within a unified training process.
Our approach seamlessly integrates personalized motion with structured guidance, significantly outperforming state-of-the-art methods.
arXiv Detail & Related papers (2025-04-23T13:51:42Z)
- ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech.
The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture.
To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z)
- Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [56.424032454461695]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences.
Dita employs in-context conditioning, enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.
Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces.
arXiv Detail & Related papers (2025-03-25T15:19:56Z)
- DeMo: Decoupling Motion Forecasting into Directional Intentions and Dynamic States [6.856351850183536]
We introduce DeMo, a framework that decouples multi-modal trajectory queries into two types.
By leveraging this format, we separately optimize the multi-modality and dynamic evolutionary properties of trajectories.
We additionally introduce combined Attention and Mamba techniques for global information aggregation and state sequence modeling.
arXiv Detail & Related papers (2024-10-08T12:27:49Z)
- Residual Chain Prediction for Autonomous Driving Path Planning [5.139918355140954]
Residual Chain Loss dynamically adjusts the loss calculation process to enhance the temporal dependency and accuracy of predicted path points.
Our findings highlight the potential of Residual Chain Loss to revolutionize the planning component of autonomous driving systems.
arXiv Detail & Related papers (2024-04-08T11:43:40Z)
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
- DESTINE: Dynamic Goal Queries with Temporal Transductive Alignment for Trajectory Prediction [8.25651323214656]
We propose the Dynamic goal quErieS with temporal Transductive alIgNmEnt (DESTINE) method.
We show that our method achieves state-of-the-art performance on various metrics.
arXiv Detail & Related papers (2023-10-11T12:41:32Z)
- MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying [110.83590008788745]
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions.
In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges.
The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries.
We introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents.
arXiv Detail & Related papers (2023-06-30T16:23:04Z)
- An Adaptive Fuzzy Reinforcement Learning Cooperative Approach for the Autonomous Control of Flock Systems [4.961066282705832]
This work introduces an adaptive distributed robustness technique for the autonomous control of flock systems.
Its relatively flexible structure is based on online fuzzy reinforcement learning schemes which simultaneously target a number of objectives.
In addition to its resilience in the face of dynamic disturbances, the algorithm does not require more than the agent position as a feedback signal.
arXiv Detail & Related papers (2023-03-17T13:07:35Z)
- Motion Transformer with Global Intention Localization and Local Movement Refinement [103.75625476231401]
Motion TRansformer (MTR) models motion prediction as the joint optimization of global intention localization and local movement refinement.
MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges.
arXiv Detail & Related papers (2022-09-27T16:23:14Z)
- Instance-Aware Predictive Navigation in Multi-Agent Environments [93.15055834395304]
We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures.
We adopt a novel multi-instance event prediction module to estimate the possible interaction among agents in the ego-centric view.
We design a sequential action sampling strategy to better leverage predicted states on both scene-level and instance-level.
arXiv Detail & Related papers (2021-01-14T22:21:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.