Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models
- URL: http://arxiv.org/abs/2410.13196v2
- Date: Fri, 18 Oct 2024 08:33:19 GMT
- Title: Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models
- Authors: Tangwen Qian, Junhe Li, Yile Chen, Gao Cong, Tao Sun, Fei Wang, Yongjun Xu
- Abstract summary: MVTraj is a novel multi-view modeling method for trajectory representation learning.
It integrates diverse contextual knowledge, from GPS traces to road networks and points of interest, to provide a more comprehensive understanding of trajectory data.
Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views.
- Score: 27.316692263196277
- Abstract: Modeling trajectory data with general-purpose dense representations has become a prevalent paradigm for various downstream applications, such as trajectory classification, travel time estimation, and similarity computation. However, existing methods typically rely on trajectories from a single spatial view, limiting their ability to capture the rich contextual information that is crucial for gaining deeper insights into movement patterns across different geospatial contexts. To this end, we propose MVTraj, a novel multi-view modeling method for trajectory representation learning. MVTraj integrates diverse contextual knowledge, from GPS traces to road networks and points of interest, to provide a more comprehensive understanding of trajectory data. To align the learning process across multiple views, we utilize GPS trajectories as a bridge and employ self-supervised pretext tasks to capture and distinguish movement patterns across different spatial views. Following this, we treat trajectories from different views as distinct modalities and apply a hierarchical cross-modal interaction module to fuse the representations, thereby enriching the knowledge derived from multiple sources. Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views, validating its effectiveness and practical utility in spatio-temporal modeling.
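As a rough illustration of the pipeline described in the abstract (per-view encoders, GPS-anchored self-supervised alignment, and cross-modal fusion), the following is a minimal PyTorch sketch. It is not the authors' implementation: the module names, an InfoNCE-style alignment loss standing in for the pretext tasks, and a single attention layer standing in for the hierarchical cross-modal interaction module are all assumptions introduced here for illustration.

```python
# Illustrative sketch only: a multi-view trajectory encoder with GPS-anchored
# alignment and cross-view fusion. Names, dimensions, and losses are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViewEncoder(nn.Module):
    """Encodes one spatial view (GPS points, road segments, or POIs) of a trajectory."""

    def __init__(self, vocab_size: int, d_model: int = 128, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) of discretized units for this view
        h = self.encoder(self.embed(tokens))       # (batch, seq_len, d_model)
        return h.mean(dim=1)                       # pooled trajectory embedding


class MultiViewTrajectoryModel(nn.Module):
    """GPS, road-network, and POI views fused by attention; GPS acts as the bridge."""

    def __init__(self, vocab_sizes: dict, d_model: int = 128):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {view: ViewEncoder(v, d_model) for view, v in vocab_sizes.items()}
        )
        self.fusion = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, batch: dict) -> dict:
        # Per-view pooled embeddings, e.g. {"gps": (B, d), "road": (B, d), "poi": (B, d)}
        views = {name: enc(batch[name]) for name, enc in self.encoders.items()}
        stacked = torch.stack(list(views.values()), dim=1)   # (B, n_views, d)
        fused, _ = self.fusion(stacked, stacked, stacked)    # cross-view interaction
        views["fused"] = fused.mean(dim=1)                   # final trajectory embedding
        return views


def gps_bridge_alignment_loss(views: dict, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE-style stand-in for the self-supervised alignment: pull each auxiliary
    view toward the GPS embedding of the same trajectory, push other trajectories apart."""
    anchor = F.normalize(views["gps"], dim=-1)
    loss = 0.0
    for name in ("road", "poi"):
        other = F.normalize(views[name], dim=-1)
        logits = anchor @ other.t() / tau                    # (B, B) similarity matrix
        labels = torch.arange(logits.size(0), device=logits.device)
        loss = loss + F.cross_entropy(logits, labels)
    return loss


if __name__ == "__main__":
    sizes = {"gps": 1000, "road": 500, "poi": 300}           # hypothetical vocabularies
    model = MultiViewTrajectoryModel(sizes)
    batch = {k: torch.randint(0, v, (8, 20)) for k, v in sizes.items()}
    out = model(batch)
    print(out["fused"].shape, gps_bridge_alignment_loss(out).item())
```

A full reproduction would replace the single attention layer with the paper's hierarchical cross-modal interaction module and the InfoNCE term with its actual pretext tasks; this sketch only shows where those components would sit.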
Related papers
- T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation [6.844357745770191]
Trajectory similarity computation is an essential technique for analyzing moving patterns of spatial data across various applications.
We propose T-JEPA, a self-supervised trajectory similarity method employing Joint-Embedding Predictive Architecture (JEPA) to enhance trajectory representation learning.
arXiv Detail & Related papers (2024-06-13T09:51:51Z)
- VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning [9.325787573209201]
Trajectory similarity search plays an essential role in autonomous driving.
We propose VeTraSS -- an end-to-end pipeline for Vehicle Trajectory Similarity Search.
arXiv Detail & Related papers (2024-04-11T06:19:55Z)
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter task discrepancy, since pretraining is formulated as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
- More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning [26.630640299709114]
We propose JGRM, a Joint GPS and Route Modelling method based on self-supervised learning.
We develop two encoders, each tailored to capture representations of route and GPS trajectories respectively.
The representations from the two modalities are fed into a shared transformer for inter-modal information interaction.
arXiv Detail & Related papers (2024-02-25T18:27:25Z)
- Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, covering task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z)
- Continual Vision-Language Representation Learning with Off-Diagonal Information [112.39419069447902]
Multi-modal contrastive learning frameworks like CLIP typically require a large amount of image-text samples for training.
This paper discusses the feasibility of continual CLIP training using streaming data.
arXiv Detail & Related papers (2023-05-11T08:04:46Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
- DouFu: A Double Fusion Joint Learning Method For Driving Trajectory Representation [13.321587117066166]
We propose a novel multimodal fusion model, DouFu, for trajectory representation joint learning.
We first design movement, route, and global features generated from the trajectory data and urban functional zones.
With the global semantic feature, DouFu produces a comprehensive embedding for each trajectory.
arXiv Detail & Related papers (2022-05-05T07:43:35Z)
- Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z)
- Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos [69.61522804742427]
This paper proposes a self-supervised training framework that learns a common multimodal embedding space.
We extend the concept of instance-level contrastive learning with a multimodal clustering step to capture semantic similarities across modalities.
The resulting embedding space enables retrieval of samples across all modalities, even from unseen datasets and different domains.
arXiv Detail & Related papers (2021-04-26T15:55:01Z)
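The last related paper above (Multimodal Clustering Networks) combines instance-level contrastive learning with a clustering step over a shared embedding space. Below is a minimal, hypothetical PyTorch sketch of that general idea; the shared prototypes, temperatures, and KL-based agreement term are assumptions introduced for illustration, not the paper's actual objective.

```python
# Illustrative sketch only: instance-level contrastive learning across two
# modalities plus a shared-prototype clustering step. All names are hypothetical.
import torch
import torch.nn.functional as F


def contrastive_loss(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between paired embeddings from two modalities."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


def clustering_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                    prototypes: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Encourage both modalities of the same sample to fall into the same cluster:
    soft assignments to shared prototypes are compared across modalities."""
    prototypes = F.normalize(prototypes, dim=-1)
    p_a = F.softmax(F.normalize(z_a, dim=-1) @ prototypes.t() / tau, dim=-1)
    p_b = F.softmax(F.normalize(z_b, dim=-1) @ prototypes.t() / tau, dim=-1)
    # Cross-modal agreement: each modality predicts the other's cluster assignment.
    return 0.5 * (F.kl_div(p_a.log(), p_b, reduction="batchmean")
                  + F.kl_div(p_b.log(), p_a, reduction="batchmean"))


if __name__ == "__main__":
    video_emb, text_emb = torch.randn(16, 256), torch.randn(16, 256)
    protos = torch.randn(32, 256)  # 32 shared cluster prototypes (assumed count)
    loss = contrastive_loss(video_emb, text_emb) + clustering_loss(video_emb, text_emb, protos)
    print(loss.item())
```

The split into a pairwise contrastive term and a prototype-agreement term mirrors the "instance-level contrastive learning plus clustering step" description in the summary; the exact clustering mechanism in the paper may differ.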