Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models
- URL: http://arxiv.org/abs/2410.13196v2
- Date: Fri, 18 Oct 2024 08:33:19 GMT
- Title: Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models
- Authors: Tangwen Qian, Junhe Li, Yile Chen, Gao Cong, Tao Sun, Fei Wang, Yongjun Xu
- Abstract summary: MVTraj is a novel multi-view modeling method for trajectory representation learning.
It integrates diverse contextual knowledge, from GPS traces to road networks and points of interest, to provide a more comprehensive understanding of trajectory data.
Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views.
- Score: 27.316692263196277
- Abstract: Modeling trajectory data with general-purpose dense representations has become a prevalent paradigm for various downstream applications, such as trajectory classification, travel time estimation, and similarity computation. However, existing methods typically rely on trajectories from a single spatial view, limiting their ability to capture the rich contextual information that is crucial for gaining deeper insights into movement patterns across different geospatial contexts. To this end, we propose MVTraj, a novel multi-view modeling method for trajectory representation learning. MVTraj integrates diverse contextual knowledge, from GPS traces to road networks and points of interest, to provide a more comprehensive understanding of trajectory data. To align the learning process across multiple views, we utilize GPS trajectories as a bridge and employ self-supervised pretext tasks to capture and distinguish movement patterns across different spatial views. Following this, we treat trajectories from different views as distinct modalities and apply a hierarchical cross-modal interaction module to fuse the representations, thereby enriching the knowledge derived from multiple sources. Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views, validating its effectiveness and practical utility in spatio-temporal modeling.
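As a rough illustration of the pipeline described in the abstract (per-view encoders, GPS-anchored self-supervised alignment, and cross-modal fusion), the following is a minimal PyTorch sketch. It is not the authors' implementation: the module names, an InfoNCE-style alignment loss standing in for the pretext tasks, and a single attention layer standing in for the hierarchical cross-modal interaction module are all assumptions introduced here for illustration.

```python
# Illustrative sketch only: a multi-view trajectory encoder with GPS-anchored
# alignment and cross-view fusion. Names, dimensions, and losses are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViewEncoder(nn.Module):
    """Encodes one spatial view (GPS points, road segments, or POIs) of a trajectory."""

    def __init__(self, vocab_size: int, d_model: int = 128, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) of discretized units for this view
        h = self.encoder(self.embed(tokens))       # (batch, seq_len, d_model)
        return h.mean(dim=1)                       # pooled trajectory embedding


class MultiViewTrajectoryModel(nn.Module):
    """GPS, road-network, and POI views fused by attention; GPS acts as the bridge."""

    def __init__(self, vocab_sizes: dict, d_model: int = 128):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {view: ViewEncoder(v, d_model) for view, v in vocab_sizes.items()}
        )
        self.fusion = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, batch: dict) -> dict:
        # Per-view pooled embeddings, e.g. {"gps": (B, d), "road": (B, d), "poi": (B, d)}
        views = {name: enc(batch[name]) for name, enc in self.encoders.items()}
        stacked = torch.stack(list(views.values()), dim=1)   # (B, n_views, d)
        fused, _ = self.fusion(stacked, stacked, stacked)    # cross-view interaction
        views["fused"] = fused.mean(dim=1)                   # final trajectory embedding
        return views


def gps_bridge_alignment_loss(views: dict, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE-style stand-in for the self-supervised alignment: pull each auxiliary
    view toward the GPS embedding of the same trajectory, push other trajectories apart."""
    anchor = F.normalize(views["gps"], dim=-1)
    loss = 0.0
    for name in ("road", "poi"):
        other = F.normalize(views[name], dim=-1)
        logits = anchor @ other.t() / tau                    # (B, B) similarity matrix
        labels = torch.arange(logits.size(0), device=logits.device)
        loss = loss + F.cross_entropy(logits, labels)
    return loss


if __name__ == "__main__":
    sizes = {"gps": 1000, "road": 500, "poi": 300}           # hypothetical vocabularies
    model = MultiViewTrajectoryModel(sizes)
    batch = {k: torch.randint(0, v, (8, 20)) for k, v in sizes.items()}
    out = model(batch)
    print(out["fused"].shape, gps_bridge_alignment_loss(out).item())
```

A full reproduction would replace the single attention layer with the paper's hierarchical cross-modal interaction module and the InfoNCE term with its actual pretext tasks; this sketch only shows where those components would sit.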
Related papers
- T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation [6.844357745770191]
Trajectory similarity computation is an essential technique for analyzing moving patterns of spatial data across various applications.
We propose T-JEPA, a self-supervised trajectory similarity method employing Joint-Embedding Predictive Architecture (JEPA) to enhance trajectory representation learning.
arXiv Detail & Related papers (2024-06-13T09:51:51Z)
- VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning [9.325787573209201]
Trajectory similarity search plays an essential role in autonomous driving.
We propose VeTraSS -- an end-to-end pipeline for Vehicle Trajectory Similarity Search.
arXiv Detail & Related papers (2024-04-11T06:19:55Z)
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter task discrepancy, since pretraining is formulated as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
- More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning [26.630640299709114]
We propose JGRM, a Joint GPS and Route Modelling method based on self-supervised learning.
We develop two encoders, each tailored to capture representations of route and GPS trajectories respectively.
The representations from the two modalities are fed into a shared transformer for inter-modal information interaction.
arXiv Detail & Related papers (2024-02-25T18:27:25Z)
- Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, covering task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z)
- Continual Vision-Language Representation Learning with Off-Diagonal Information [112.39419069447902]
Multi-modal contrastive learning frameworks like CLIP typically require a large amount of image-text samples for training.
This paper discusses the feasibility of continual CLIP training using streaming data.
arXiv Detail & Related papers (2023-05-11T08:04:46Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
- DouFu: A Double Fusion Joint Learning Method For Driving Trajectory Representation [13.321587117066166]
We propose a novel multimodal fusion model, DouFu, for trajectory representation joint learning.
We first design movement, route, and global features generated from the trajectory data and urban functional zones.
With the global semantic feature, DouFu produces a comprehensive embedding for each trajectory.
arXiv Detail & Related papers (2022-05-05T07:43:35Z)
- Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z)
- Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos [69.61522804742427]
This paper proposes a self-supervised training framework that learns a common multimodal embedding space.
We extend the concept of instance-level contrastive learning with a multimodal clustering step to capture semantic similarities across modalities.
The resulting embedding space enables retrieval of samples across all modalities, even from unseen datasets and different domains.
arXiv Detail & Related papers (2021-04-26T15:55:01Z)
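The last related paper above (Multimodal Clustering Networks) combines instance-level contrastive learning with a clustering step over a shared embedding space. Below is a minimal, hypothetical PyTorch sketch of that general idea; the shared prototypes, temperatures, and KL-based agreement term are assumptions introduced for illustration, not the paper's actual objective.

```python
# Illustrative sketch only: instance-level contrastive learning across two
# modalities plus a shared-prototype clustering step. All names are hypothetical.
import torch
import torch.nn.functional as F


def contrastive_loss(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between paired embeddings from two modalities."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


def clustering_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                    prototypes: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Encourage both modalities of the same sample to fall into the same cluster:
    soft assignments to shared prototypes are compared across modalities."""
    prototypes = F.normalize(prototypes, dim=-1)
    p_a = F.softmax(F.normalize(z_a, dim=-1) @ prototypes.t() / tau, dim=-1)
    p_b = F.softmax(F.normalize(z_b, dim=-1) @ prototypes.t() / tau, dim=-1)
    # Cross-modal agreement: each modality predicts the other's cluster assignment.
    return 0.5 * (F.kl_div(p_a.log(), p_b, reduction="batchmean")
                  + F.kl_div(p_b.log(), p_a, reduction="batchmean"))


if __name__ == "__main__":
    video_emb, text_emb = torch.randn(16, 256), torch.randn(16, 256)
    protos = torch.randn(32, 256)  # 32 shared cluster prototypes (assumed count)
    loss = contrastive_loss(video_emb, text_emb) + clustering_loss(video_emb, text_emb, protos)
    print(loss.item())
```

The split into a pairwise contrastive term and a prototype-agreement term mirrors the "instance-level contrastive learning plus clustering step" description in the summary; the exact clustering mechanism in the paper may differ.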