Related papers: PatchTraj: Unified Time-Frequency Representation Learning via Dynamic Patches for Trajectory Prediction

URL: http://arxiv.org/abs/2507.19119v3
Date: Thu, 31 Jul 2025 15:04:27 GMT
Title: PatchTraj: Unified Time-Frequency Representation Learning via Dynamic Patches for Trajectory Prediction
Authors: Yanghong Liu, Xingping Dong, Ming Li, Weixing Zhang, Yidong Lou
Abstract summary: We propose a dynamic patch-based framework that integrates time-frequency joint modeling for trajectory prediction. Specifically, we decompose the trajectory into raw time sequences and frequency components, and employ dynamic patch partitioning to perform multi-scale segmentation. The resulting enhanced embeddings exhibit strong expressive power, enabling accurate predictions even when using a vanilla architecture.
Score: 14.48846131633279
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Pedestrian trajectory prediction is crucial for autonomous driving and robotics. However, existing point-based and grid-based methods have two main limitations: they model human motion dynamics insufficiently, failing to balance local motion details with long-range spatiotemporal dependencies, and their time representations lack interaction with the corresponding frequency components when jointly modeling trajectory sequences. To address these challenges, we propose PatchTraj, a dynamic patch-based framework that integrates time-frequency joint modeling for trajectory prediction. Specifically, we decompose the trajectory into raw time sequences and frequency components, and employ dynamic patch partitioning to perform multi-scale segmentation, capturing hierarchical motion patterns. Each patch undergoes adaptive embedding with scale-aware feature extraction, followed by hierarchical feature aggregation to model both fine-grained and long-range dependencies. The outputs of the two branches are further enhanced via cross-modal attention, facilitating complementary fusion of temporal and spectral cues. The resulting enhanced embeddings exhibit strong expressive power, enabling accurate predictions even when using a vanilla Transformer architecture. Extensive experiments on the ETH-UCY, SDD, NBA, and JRDB datasets demonstrate that our method achieves state-of-the-art performance. Notably, on the egocentric JRDB dataset, PatchTraj attains significant relative improvements of 26.7% in ADE and 17.4% in FDE, underscoring its substantial potential in embodied intelligence.
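
The time/frequency decomposition and multi-scale patching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the patch lengths are fixed here rather than chosen dynamically, the frequency branch is a plain discrete Fourier transform, and the function names (`dft_magnitudes`, `split_into_patches`, `multi_scale_patches`) and the sample trajectory are hypothetical. The embedding, hierarchical aggregation, and cross-modal attention stages are omitted.

```python
import cmath
import math


def dft_magnitudes(values):
    """Return the magnitude of each discrete Fourier coefficient of a 1-D sequence."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values)))
        for k in range(n)
    ]


def split_into_patches(sequence, patch_len):
    """Split a sequence into non-overlapping patches; a shorter tail patch is kept."""
    return [sequence[i:i + patch_len] for i in range(0, len(sequence), patch_len)]


def multi_scale_patches(sequence, patch_lens=(2, 4)):
    """Segment the same sequence at several patch lengths (assumed fixed scales)."""
    return {p: split_into_patches(sequence, p) for p in patch_lens}


# Hypothetical 8-step observed trajectory as (x, y) positions.
trajectory = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.3), (1.5, 0.2),
              (2.0, 0.4), (2.5, 0.7), (3.0, 0.9), (3.5, 1.0)]

# Time branch: patches over the raw positions.
time_patches = multi_scale_patches(trajectory)

# Frequency branch: per-coordinate spectra, patched at the same scales.
xs = [x for x, _ in trajectory]
ys = [y for _, y in trajectory]
freq_patches = {
    "x": multi_scale_patches(dft_magnitudes(xs)),
    "y": multi_scale_patches(dft_magnitudes(ys)),
}

for scale, patches in time_patches.items():
    print(f"patch length {scale}: {len(patches)} time patches")
```

Short patches keep local motion detail and long patches cover longer-range structure; in a full model, each patch would then be embedded and the two branches fused.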

Related papers

LTMSformer: A Local Trend-Aware Attention and Motion State Encoding Transformer for Multi-Agent Trajectory Prediction [6.520837230073969]
We propose a lightweight framework, LTMSformer, to extract temporal-spatial interaction features for trajectory prediction. We also build a Motion State to incorporate high-order motion state attributes, such as acceleration, jerk, heading, etc. Experiment results show that our method outperforms the baseline HiVT-64, reducing the minADE by approximately 4.35%, the minFDE by 8.74%, and the MR by 20%.
arXiv Detail & Related papers (2025-07-07T03:33:14Z)
Learning Spatio-Temporal Dynamics for Trajectory Recovery via Time-Aware Transformer [9.812530969395906]
In real-world applications, GPS trajectories often suffer from low sampling rates, with large and irregular intervals between consecutive points. This paper addresses the task of map-constrained trajectory recovery, aiming to enhance trajectory sampling rates.
arXiv Detail & Related papers (2025-05-20T03:09:17Z)
Electromyography-Based Gesture Recognition: Hierarchical Feature Extraction for Enhanced Spatial-Temporal Dynamics [0.7083699704958353]
We propose a lightweight squeeze-excitation deep learning-based multi stream spatial temporal dynamics time-varying feature extraction approach. The proposed model was tested on the Ninapro DB2, DB4, and DB5 datasets, achieving accuracy rates of 96.41%, 92.40%, and 93.34%, respectively.
arXiv Detail & Related papers (2025-04-04T07:11:12Z)
Real-Time Moving Flock Detection in Pedestrian Trajectories Using Sequential Deep Learning Models [1.2289361708127877]
This paper investigates the use of sequential deep learning models, including Recurrent Neural Networks (RNNs), for real-time flock detection in multi-pedestrian trajectories. We validate our method using real-world group movement datasets, demonstrating its robustness across varying sequence lengths and diverse movement patterns. We extend our approach to identify other forms of collective motion, such as convoys and swarms, paving the way for more comprehensive multi-agent behavior analysis.
arXiv Detail & Related papers (2025-02-21T07:04:34Z)
ASTRA: A Scene-aware TRAnsformer-based model for trajectory prediction [15.624698974735654]
ASTRA (A Scene-aware TRAnsformer-based model for trajectory prediction) is a light-weight pedestrian trajectory forecasting model. We utilise a U-Net-based feature extractor, via its latent vector representation, to capture scene representations and a graph-aware transformer encoder for capturing social interactions.
arXiv Detail & Related papers (2025-01-16T23:28:30Z)
Multi-step Temporal Modeling for UAV Tracking [14.687636301587045]
We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking. We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features. We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
arXiv Detail & Related papers (2024-03-07T09:48:13Z)
Triplet Attention Transformer for Spatiotemporal Predictive Learning [9.059462850026216]
We propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features. The model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions.
arXiv Detail & Related papers (2023-10-28T12:49:33Z)
Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream. At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank. To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for Traffic Flow Prediction [78.05103666987655]
Spatial-temporal Graph Neural Network (GNN) models have emerged as one of the most promising methods to solve this problem. We propose a novel propagation delay-aware dynamic long-range transFormer, namely PDFormer, for accurate traffic flow prediction. Our method can not only achieve state-of-the-art performance but also exhibit competitive computational efficiency.
arXiv Detail & Related papers (2023-01-19T08:42:40Z)
Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision community. This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
D2-TPred: Discontinuous Dependency for Trajectory Prediction under Traffic Lights [68.76631399516823]
We present a trajectory prediction approach with respect to traffic lights, D2-TPred, using a spatial dynamic interaction graph (SDG) and a behavior dependency graph (BDG). Our experimental results show that our model achieves more than 20.45% and 20.78% improvement in terms of ADE and FDE, respectively, on VTP-TL.
arXiv Detail & Related papers (2022-07-21T10:19:07Z)
Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
SGCN:Sparse Graph Convolution Network for Pedestrian Trajectory Prediction [64.16212996247943]
We present a Sparse Graph Convolution Network (SGCN) for pedestrian trajectory prediction. Specifically, the SGCN explicitly models the sparse directed interaction with a sparse directed spatial graph to capture adaptive interaction pedestrians. Visualizations indicate that our method can capture adaptive interactions between pedestrians and their effective motion tendencies.
arXiv Detail & Related papers (2021-04-04T03:17:42Z)
Forecast Network-Wide Traffic States for Multiple Steps Ahead: A Deep Learning Approach Considering Dynamic Non-Local Spatial Correlation and Non-Stationary Temporal Dependency [6.019104024723682]
This research studies two particular problems in traffic forecasting: (1) capture the dynamic and non-local spatial correlation between traffic links and (2) model the dynamics of temporal dependency for accurate multiple steps ahead predictions. We propose a deep learning framework named Spatial-Temporal Sequence to Sequence model (STSeq2Seq) to address these issues. This model builds on sequence to sequence (seq2seq) architecture to capture temporal feature and relies on graph convolution for aggregating spatial information.
arXiv Detail & Related papers (2020-04-06T03:40:56Z)
Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC). First, a spatial-temporal attention mechanism is presented to explore the most useful and important information. Second, we construct a joint feature sequence based on the sequence and instant state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
Spatial-Temporal Transformer Networks for Traffic Flow Forecasting [74.76852538940746]
We propose a novel paradigm of Spatial-Temporal Transformer Networks (STTNs) to improve the accuracy of long-term traffic forecasting. Specifically, we present a new variant of graph neural networks, named spatial transformer, by dynamically modeling directed spatial dependencies. The proposed model enables fast and scalable training over a long range spatial-temporal dependencies.
arXiv Detail & Related papers (2020-01-09T10:21:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.