LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction
- URL: http://arxiv.org/abs/2507.04634v1
- Date: Mon, 07 Jul 2025 03:33:14 GMT
- Title: LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction
- Authors: Yixin Yan, Yang Li, Yuanfan Wang, Xiaozhou Zhou, Beihao Xia, Manjiang Hu, Hongmao Qin,
- Abstract summary: We propose a lightweight framework, LTMSformer, to extract temporal-spatial interaction features for trajectory prediction. We also build a Motion State Encoder to incorporate high-order motion state attributes, such as acceleration, jerk, heading, etc. Experiment results show that our method outperforms the baseline HiVT-64, reducing the minADE by approximately 4.35%, the minFDE by 8.74%, and the MR by 20%.
- Score: 6.520837230073969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been challenging to model the complex temporal-spatial dependencies between agents for trajectory prediction. As each state of an agent is closely related to the states of adjacent time steps, capturing the local temporal dependency is beneficial for prediction, while most studies often overlook it. Besides, learning the high-order motion state attributes is expected to enhance spatial interaction modeling, but it is rarely seen in previous works. To address this, we propose a lightweight framework, LTMSformer, to extract temporal-spatial interaction features for multi-modal trajectory prediction. Specifically, we introduce a Local Trend-Aware Attention mechanism to capture the local temporal dependency by leveraging a convolutional attention mechanism with hierarchical local time boxes. Next, to model the spatial interaction dependency, we build a Motion State Encoder to incorporate high-order motion state attributes, such as acceleration, jerk, heading, etc. To further refine the trajectory prediction, we propose a Lightweight Proposal Refinement Module that leverages Multi-Layer Perceptrons for trajectory embedding and generates the refined trajectories with fewer model parameters. Experiment results on the Argoverse 1 dataset demonstrate that our method outperforms the baseline HiVT-64, reducing the minADE by approximately 4.35%, the minFDE by 8.74%, and the MR by 20%. We also achieve higher accuracy than HiVT-128 with a 68% reduction in model size.
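As a rough illustration (not code from the paper), the high-order motion state attributes the Motion State Encoder consumes, such as acceleration, jerk, and heading, can be derived from raw positions with finite differences; the function name and sampling interval below are illustrative assumptions:

```python
import numpy as np

def motion_state_attributes(xy, dt=0.1):
    """Derive higher-order motion attributes from a 2-D trajectory.

    xy: (T, 2) array of positions sampled every dt seconds.
    Returns velocity (T-1, 2), acceleration (T-2, 2), jerk (T-3, 2),
    and heading angles (radians) computed from the velocity vectors.
    """
    vel = np.diff(xy, axis=0) / dt        # first difference -> velocity
    acc = np.diff(vel, axis=0) / dt       # second difference -> acceleration
    jerk = np.diff(acc, axis=0) / dt      # third difference -> jerk
    heading = np.arctan2(vel[:, 1], vel[:, 0])  # direction of travel
    return vel, acc, jerk, heading
```

Stacking these per-step attributes alongside positions gives the kind of enriched agent state that spatial interaction modules can attend over.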
Related papers
- PatchTraj: Unified Time-Frequency Representation Learning via Dynamic Patches for Trajectory Prediction [14.48846131633279]
We propose a dynamic patch-based framework that integrates time-frequency joint modeling for trajectory prediction. Specifically, we decompose the trajectory into raw time sequences and frequency components, and employ dynamic patch partitioning to perform multi-scale segmentation. The resulting enhanced embeddings exhibit strong expressive power, enabling accurate predictions even when using a vanilla architecture.
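A minimal sketch of the general idea of multi-scale time-frequency patching (the helper below is an assumption for illustration, not PatchTraj's actual implementation): each coordinate sequence is cut into patches at several scales, and each raw-time patch is paired with the magnitude spectrum of its frequency-domain counterpart.

```python
import numpy as np

def patch_time_frequency(traj, patch_sizes=(2, 4)):
    """Split a 1-D coordinate sequence into multi-scale patches and
    pair each raw-time patch with its frequency-domain counterpart.

    traj: 1-D array of one trajectory coordinate over time.
    Returns a list of (time_patches, freq_patches) tuples, one per scale.
    """
    patches = []
    for p in patch_sizes:
        n = len(traj) // p * p                 # truncate to a multiple of p
        time_patches = traj[:n].reshape(-1, p)  # raw time-domain segments
        # real FFT per patch; magnitudes serve as frequency features
        freq_patches = np.abs(np.fft.rfft(time_patches, axis=1))
        patches.append((time_patches, freq_patches))
    return patches
```

Embedding both views of each patch and fusing them is one plausible way to obtain the "enhanced embeddings" the summary describes.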
arXiv Detail & Related papers (2025-07-25T09:55:33Z) - Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM [16.532357621144342]
This paper introduces Trajectory Mamba, a novel efficient trajectory prediction framework based on the selective state-space model (SSM). To address the potential reduction in prediction accuracy resulting from modifications to the attention mechanism, we propose a joint polyline encoding strategy. Our model achieves state-of-the-art results in terms of inference speed and parameter efficiency on both the Argoverse 1 and Argoverse 2 datasets.
arXiv Detail & Related papers (2025-03-13T21:31:12Z) - Post-interactive Multimodal Trajectory Prediction for Autonomous Driving [10.93007749660849]
We propose a Transformer for multimodal trajectory prediction, i.e., Pioformer. It explicitly extracts the post-interaction features to enhance the prediction accuracy. Our model has reduced the prediction errors by 4.4%, 8.4%, 14.4%, and 5.7% on the metrics minADE6, minFDE6, MR6, and brier-minFDE6, respectively.
arXiv Detail & Related papers (2025-03-12T13:10:09Z) - AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce GPT-style next-token prediction into motion prediction.
Different from language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatial-temporal and semantic relations.
We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z) - Multi-step Temporal Modeling for UAV Tracking [14.687636301587045]
We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking.
We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features.
We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
arXiv Detail & Related papers (2024-03-07T09:48:13Z) - Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
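The core move in discrete sequence modeling of trajectories is to quantize continuous motion into a finite vocabulary so that language-model machinery applies. A hedged sketch (the nearest-template scheme below is an illustrative assumption, not Trajeglish's actual tokenizer):

```python
import numpy as np

def tokenize_displacements(xy, vocab):
    """Map each per-step displacement of a trajectory to the index of
    its nearest template displacement in `vocab`.

    xy: (T, 2) array of positions; vocab: (K, 2) template displacements.
    Returns (T-1,) integer token ids suitable for next-token prediction.
    """
    deltas = np.diff(xy, axis=0)  # (T-1, 2) per-step motion vectors
    # distance from every step to every vocabulary template
    dists = np.linalg.norm(deltas[:, None, :] - vocab[None, :, :], axis=-1)
    return dists.argmin(axis=1)   # nearest template -> token id
```

Once trajectories are token sequences, an autoregressive model can be trained to predict the next motion token for each agent, conditioned on the scene.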
arXiv Detail & Related papers (2023-12-07T18:53:27Z) - PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for Traffic Flow Prediction [78.05103666987655]
Spatial-temporal Graph Neural Network (GNN) models have emerged as one of the most promising methods to solve this problem.
We propose a novel propagation delay-aware dynamic long-range transFormer, namely PDFormer, for accurate traffic flow prediction.
Our method can not only achieve state-of-the-art performance but also exhibit competitive computational efficiency.
arXiv Detail & Related papers (2023-01-19T08:42:40Z) - Multi View Spatial-Temporal Model for Travel Time Estimation [14.591364075326984]
We propose a Multi-View Spatial-Temporal Model (MVSTM) to capture spatial-temporal dependencies and trajectory features.
Specifically, we use graph2vec to model the spatial view, dual-channel temporal module to model the trajectory view, and structural embedding to model the traffic semantics.
Experiments on large-scale taxi trajectory data show that our approach is more effective than existing methods.
arXiv Detail & Related papers (2021-09-15T16:11:18Z) - SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction [64.16212996247943]
We present a Sparse Graph Convolution Network(SGCN) for pedestrian trajectory prediction.
Specifically, the SGCN explicitly models sparse directed interactions with a sparse directed spatial graph to capture adaptive interactions among pedestrians.
Visualizations indicate that our method can capture adaptive interactions between pedestrians and their effective motion tendencies.
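A sparse directed interaction graph can be built by keeping, for each pedestrian, only its strongest incoming interaction scores; the top-k scheme below is a hedged sketch of that idea, not SGCN's actual graph-construction code:

```python
import numpy as np

def sparse_directed_adjacency(scores, k=2):
    """Keep only the top-k incoming interaction scores per pedestrian.

    scores: (N, N) matrix where scores[i, j] rates how much agent j
    matters to agent i. Returns a sparse, directed 0/1 adjacency:
    adj[i, j] = 1 means j influences i; the result need not be symmetric.
    """
    n = scores.shape[0]
    adj = np.zeros_like(scores)
    for i in range(n):
        ranked = np.argsort(scores[i])[::-1]        # neighbours by score
        keep = [j for j in ranked if j != i][:k]    # drop self, keep top-k
        adj[i, keep] = 1.0
    return adj
```

Because row i and row j prune independently, the graph is directed: i may attend to j while j ignores i, which is exactly the asymmetry a dense attention matrix cannot express once symmetrized.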
arXiv Detail & Related papers (2021-04-04T03:17:42Z) - GraphTCN: Spatio-Temporal Interaction Modeling for Human Trajectory Prediction [5.346782918364054]
We propose a novel CNN-based spatial-temporal graph framework, GraphTCN, to support more efficient and accurate trajectory predictions.
In contrast to conventional models, both the spatial and temporal modeling of our model are computed within each local time window.
Our model achieves better performance in terms of both efficiency and accuracy as compared with state-of-the-art models on various trajectory prediction benchmark datasets.
arXiv Detail & Related papers (2020-03-16T12:56:12Z) - A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we construct a joint feature sequence from the sequence and instant state information so that the generated trajectories preserve spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z) - Spatial-Temporal Transformer Networks for Traffic Flow Forecasting [74.76852538940746]
We propose a novel paradigm of Spatial-Temporal Transformer Networks (STTNs) to improve the accuracy of long-term traffic forecasting.
Specifically, we present a new variant of graph neural networks, named spatial transformer, by dynamically modeling directed spatial dependencies.
The proposed model enables fast and scalable training over long-range spatial-temporal dependencies.
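The essence of a spatial transformer over graph nodes is dot-product attention across all nodes at one time step, so that each node's influence on another can be learned dynamically and need not be symmetric. A single-head, weight-free sketch for illustration (STTN's actual layer adds learned projections and graph structure):

```python
import numpy as np

def spatial_attention(x):
    """Single-head dot-product attention across N graph nodes.

    x: (N, d) node feature matrix. Each node attends to every node;
    the (N, N) weight matrix is row-stochastic but generally
    asymmetric, capturing directed spatial dependencies.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                      # pairwise affinities
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # row-wise softmax
    return w @ x                                       # attended features
```

Stacking such a spatial layer with a temporal attention layer over each node's history gives the factorized spatial-temporal pattern these traffic forecasting models share.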
arXiv Detail & Related papers (2020-01-09T10:21:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.