Learning Multi-Modal Mobility Dynamics for Generalized Next Location Recommendation
- URL: http://arxiv.org/abs/2512.22605v1
- Date: Sat, 27 Dec 2025 14:23:04 GMT
- Title: Learning Multi-Modal Mobility Dynamics for Generalized Next Location Recommendation
- Authors: Junshu Dai, Yu Wang, Tongya Zheng, Wei Ji, Qinghong Guo, Ji Cao, Jie Song, Canghong Jin, Mingli Song
- Abstract summary: We leverage multi-modal spatial-temporal knowledge to characterize mobility dynamics for the location recommendation task. First, we construct a unified spatial-temporal relational graph (STRG) for multi-modal representation. Second, we design a gating mechanism to fuse spatial-temporal graph representations of different modalities.
- Score: 51.00494428978262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The precise prediction of human mobility has produced significant socioeconomic impacts, such as location recommendations and evacuation suggestions. However, existing methods suffer from limited generalization capability: unimodal approaches are constrained by data sparsity and inherent biases, while multi-modal methods struggle to capture mobility dynamics effectively, owing to the semantic gap between static multi-modal representations and spatial-temporal dynamics. Therefore, we leverage multi-modal spatial-temporal knowledge to characterize mobility dynamics for the location recommendation task, dubbed \textbf{M}ulti-\textbf{M}odal \textbf{Mob}ility (\textbf{M}$^3$\textbf{ob}). First, we construct a unified spatial-temporal relational graph (STRG) for multi-modal representation by leveraging the functional semantics and spatial-temporal knowledge captured by a large language model (LLM)-enhanced spatial-temporal knowledge graph (STKG). Second, we design a gating mechanism to fuse the spatial-temporal graph representations of different modalities, and propose an STKG-guided cross-modal alignment to inject spatial-temporal dynamic knowledge into the static image modality. Extensive experiments on six public datasets show that our proposed method not only achieves consistent improvements in normal scenarios but also exhibits significant generalization ability in abnormal scenarios.
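The gating step in the abstract lends itself to a compact illustration. Below is a minimal PyTorch sketch of one plausible gated fusion of two modality-specific node representations; the class name, dimensions, and convex-mixture form are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative gate fusing two modality-specific graph representations
    (e.g., spatial-temporal vs. image) per location node. Not M^3ob's code."""
    def __init__(self, dim: int):
        super().__init__()
        # The gate sees both modalities and emits a per-dimension weight in (0, 1).
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h_st: torch.Tensor, h_img: torch.Tensor) -> torch.Tensor:
        # h_st, h_img: (num_locations, dim) features from the two modalities.
        g = self.gate(torch.cat([h_st, h_img], dim=-1))
        return g * h_st + (1.0 - g) * h_img  # convex per-dimension mixture

fused = GatedFusion(dim=128)(torch.randn(10, 128), torch.randn(10, 128))
```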
Related papers
- Meta Dynamic Graph for Traffic Flow Prediction [4.6060644265855775]
We propose a framework for traffic prediction called Meta Dynamic Graph (MetaDG). We leverage dynamic graph structures of node representations to explicitly model spatial-temporal dynamics. Extensive experiments on four real-world datasets validate the effectiveness of MetaDG.
arXiv Detail & Related papers (2026-01-15T12:15:54Z)
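As a hedged illustration of "dynamic graph structures of node representations", the snippet below derives a time-varying adjacency from per-step node embeddings; this is a common construction, not necessarily MetaDG's.

```python
import torch
import torch.nn.functional as F

def dynamic_adjacency(node_emb: torch.Tensor) -> torch.Tensor:
    # node_emb: (T, N, d) per-time-step node representations.
    # Pairwise similarity at each step yields a time-varying graph.
    scores = torch.einsum("tnd,tmd->tnm", node_emb, node_emb)
    return F.softmax(F.relu(scores), dim=-1)  # (T, N, N), row-normalized

A_t = dynamic_adjacency(torch.randn(12, 207, 64))  # e.g., 12 steps, 207 sensors
```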
- RainDiff: End-to-end Precipitation Nowcasting Via Token-wise Attention Diffusion [64.49056527678606]
We propose a token-wise attention mechanism integrated into both the U-Net diffusion model and the spatio-temporal radar encoder. Unlike prior approaches, our method integrates attention into the architecture without incurring the high resource cost typical of pixel-space diffusion. Our experiments demonstrate that the proposed method significantly outperforms state-of-the-art approaches in robustness, local fidelity, and generalization in complex precipitation forecasting scenarios.
arXiv Detail & Related papers (2025-10-16T17:59:13Z)
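Token-wise attention over a flattened U-Net feature map might look roughly like the following; the module and shapes are assumptions for illustration, not RainDiff's implementation.

```python
import torch
import torch.nn as nn

class TokenWiseAttention(nn.Module):
    """Self-attention over the patch tokens of a feature map (illustrative)."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) U-Net feature map -> (B, H*W, C) token sequence.
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.transpose(1, 2).reshape(b, c, h, w)

y = TokenWiseAttention(64)(torch.randn(2, 64, 16, 16))
```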
- Spatio-Temporal Multi-Subgraph GCN for 3D Human Motion Prediction [12.766305983943314]
Graph Convolutional Networks (GCNs) have garnered widespread attention in this field for their proficiency in capturing relationships among joints in human motion. We propose the Spatio-Temporal Multi-Subgraph Graph Convolutional Network (STMS-GCN) to capture complex spatio-temporal dependencies in human motion.
arXiv Detail & Related papers (2024-12-31T07:22:39Z)
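For context, a bare-bones graph convolution over a normalized joint adjacency, the basic operation behind such skeleton GCNs, can be sketched as follows (generic, not the STMS-GCN architecture).

```python
import torch
import torch.nn as nn

class SkeletonGCNLayer(nn.Module):
    """One graph convolution over body joints: aggregate neighbors, project."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (B, J, in_dim) joint features; adj: (J, J) normalized adjacency.
        return torch.relu(self.proj(adj @ x))

joints, adj = torch.randn(8, 22, 3), torch.eye(22)  # identity adj as a stand-in
h = SkeletonGCNLayer(3, 64)(joints, adj)
```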
- Multimodal joint prediction of traffic spatial-temporal data with graph sparse attention mechanism and bidirectional temporal convolutional network [25.524351892847257]
We propose a method called Graph Sparse Attention Mechanism with Bidirectional Temporal Convolutional Network (GSABT) for multimodal traffic spatial-temporal joint prediction. We use a multimodal graph multiplied by self-attention weights to capture local spatial features, and then employ a Top-U sparse attention mechanism to obtain global spatial features. We have designed a multimodal joint prediction framework that can be flexibly extended to both spatial and temporal dimensions.
arXiv Detail & Related papers (2024-12-24T12:57:52Z)
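The Top-U idea, keeping only the U largest attention scores per query and masking the rest before the softmax, can be sketched like this; u and the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def top_u_sparse_attention(q, k, v, u: int = 8):
    # q, k, v: (B, N, d). Keep only the top-u scores per query node.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # (B, N, N)
    threshold = scores.topk(u, dim=-1).values[..., -1:]     # u-th largest score
    scores = scores.masked_fill(scores < threshold, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

out = top_u_sparse_attention(*(torch.randn(2, 50, 64) for _ in range(3)))
```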
- Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation. We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations. We build HRVMamba, a novel model for efficient high-resolution representation learning.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
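The "multi-scale convolutional operations" can be pictured as parallel depthwise convolutions with different kernel sizes whose outputs are summed; this generic sketch is not the DVSS block itself.

```python
import torch
import torch.nn as nn

class MultiScaleDWConv(nn.Module):
    """Parallel depthwise convs at several kernel sizes, summed (illustrative)."""
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return sum(branch(x) for branch in self.branches)

y = MultiScaleDWConv(32)(torch.randn(1, 32, 64, 64))
```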
- Triplet Attention Transformer for Spatiotemporal Predictive Learning [9.059462850026216]
We propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features.
The model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions.
arXiv Detail & Related papers (2023-10-28T12:49:33Z)
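One plausible reading of attention "in temporal, spatial, and channel dimensions" is to attend along each axis of a (batch, time, space, channel) tensor and combine the results; the sketch below is an assumption-laden illustration, not the published TAM.

```python
import torch
import torch.nn as nn

class TripletAxisAttention(nn.Module):
    """Attention over time and space axes plus a channel gate (illustrative)."""
    def __init__(self, dim: int, heads: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.channel_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, s, c = x.shape  # (batch, time, spatial positions, channels)
        # Temporal attention: one length-T sequence per spatial position.
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, c)
        xt = self.attn(xt, xt, xt)[0].reshape(b, s, t, c).permute(0, 2, 1, 3)
        # Spatial attention: one length-S sequence per time step.
        xs = x.reshape(b * t, s, c)
        xs = self.attn(xs, xs, xs)[0].reshape(b, t, s, c)
        # Channel attention: per-channel gate from the global mean feature.
        g = self.channel_gate(x.mean(dim=(1, 2), keepdim=True))
        return x + xt + xs + g * x  # residual combination of the three views

y = TripletAxisAttention(32)(torch.randn(2, 16, 22, 32))
```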
- Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z)
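The motion-increment idea, predicting frame-to-frame displacements and integrating them from the last observed pose, is easy to sketch; the MLP here is a placeholder, not the paper's branching network.

```python
import torch
import torch.nn as nn

class IncrementPredictor(nn.Module):
    """Predict per-frame pose increments and integrate from the last pose."""
    def __init__(self, pose_dim: int, horizon: int):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(  # placeholder for the branching network
            nn.Linear(pose_dim, 256), nn.ReLU(), nn.Linear(256, horizon * pose_dim)
        )

    def forward(self, last_pose: torch.Tensor) -> torch.Tensor:
        # last_pose: (B, pose_dim); increments: (B, horizon, pose_dim).
        deltas = self.net(last_pose).view(-1, self.horizon, last_pose.size(-1))
        return last_pose.unsqueeze(1) + deltas.cumsum(dim=1)  # integrated poses

future = IncrementPredictor(pose_dim=66, horizon=10)(torch.randn(4, 66))
```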
- Transformer Inertial Poser: Attention-based Real-time Human Motion Reconstruction from Sparse IMUs [79.72586714047199]
We propose an attention-based deep learning method to reconstruct full-body motion from six IMU sensors in real-time.
Our method achieves new state-of-the-art results both quantitatively and qualitatively, while being simple to implement and smaller in size.
arXiv Detail & Related papers (2022-03-29T16:24:52Z)
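A toy version of an attention-based IMU-to-pose model: a transformer encoder over per-frame features from the six IMUs, followed by a pose regression head. All dimensions and names are assumptions for illustration.

```python
import torch
import torch.nn as nn

class IMUPoser(nn.Module):
    """Toy transformer mapping a window of 6-IMU readings to a body pose."""
    def __init__(self, imu_feat: int = 12, d_model: int = 128, pose_dim: int = 72):
        super().__init__()
        self.embed = nn.Linear(6 * imu_feat, d_model)  # 6 IMUs per frame
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, imu: torch.Tensor) -> torch.Tensor:
        # imu: (B, T, 6 * imu_feat); return the pose for the latest frame.
        return self.head(self.encoder(self.embed(imu))[:, -1])

pose = IMUPoser()(torch.randn(2, 40, 72))
```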
- A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC)
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we construct a joint feature sequence from sequence and instantaneous state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
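Spatial continuity in this sense can be enforced by decoding future positions as cumulative offsets from the last observed point, so each generated trajectory starts where the observation ends; a minimal sketch with made-up dimensions:

```python
import torch
import torch.nn as nn

class ContinuousDecoder(nn.Module):
    """Decode future positions as cumulative offsets from the last observed one."""
    def __init__(self, hidden: int = 64, horizon: int = 12):
        super().__init__()
        self.offsets = nn.Linear(hidden, horizon * 2)  # (dx, dy) per future step
        self.horizon = horizon

    def forward(self, feat: torch.Tensor, last_pos: torch.Tensor) -> torch.Tensor:
        # feat: (B, hidden) encoded history; last_pos: (B, 2) last observed point.
        d = self.offsets(feat).view(-1, self.horizon, 2).cumsum(dim=1)
        return last_pos.unsqueeze(1) + d  # trajectory continues from last_pos

traj = ContinuousDecoder()(torch.randn(3, 64), torch.randn(3, 2))
```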