Region-Point Joint Representation for Effective Trajectory Similarity Learning
- URL: http://arxiv.org/abs/2511.13125v1
- Date: Mon, 17 Nov 2025 08:28:18 GMT
- Title: Region-Point Joint Representation for Effective Trajectory Similarity Learning
- Authors: Hao Long, Silin Zhou, Lisi Chen, Shuo Shang,
- Abstract summary: textbfRePo is a novel method that encodes textbfRegion-wise and textbfPoint-wise features to capture both spatial context and fine-grained moving patterns.<n>Experiment results show that RePo achieves an average accuracy improvement of 22.2% over SOTA baselines.
- Score: 25.664203648334563
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent learning-based methods have reduced the computational complexity of traditional trajectory similarity computation, but state-of-the-art (SOTA) methods still fail to leverage the comprehensive spectrum of trajectory information for similarity modeling. To tackle this problem, we propose \textbf{RePo}, a novel method that jointly encodes \textbf{Re}gion-wise and \textbf{Po}int-wise features to capture both spatial context and fine-grained moving patterns. For region-wise representation, the GPS trajectories are first mapped to grid sequences, and spatial context are captured by structural features and semantic context enriched by visual features. For point-wise representation, three lightweight expert networks extract local, correlation, and continuous movement patterns from dense GPS sequences. Then, a router network adaptively fuses the learned point-wise features, which are subsequently combined with region-wise features using cross-attention to produce the final trajectory embedding. To train RePo, we adopt a contrastive loss with hard negative samples to provide similarity ranking supervision. Experiment results show that RePo achieves an average accuracy improvement of 22.2\% over SOTA baselines across all evaluation metrics.
Related papers
- TrajDiff: Diffusion Bridge Network with Semantic Alignment for Trajectory Similarity Computation [39.42796455901991]
We propose a novel trajectory similarity computation framework, named TrajDiff.<n>Specifically, TrajDiff relies on cross-attention and an attention score mask mechanism with adaptive fusion.<n>Extensive experiments on three publicly available datasets show that TrajDiff consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-18T21:52:07Z) - Local All-Pair Correspondence for Point Tracking [59.76186266230608]
We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences.
LocoTrack achieves unmatched accuracy on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than the current state-of-the-art.
arXiv Detail & Related papers (2024-07-22T06:49:56Z) - NLP-enabled Trajectory Map-matching in Urban Road Networks using a Transformer-based Encoder-decoder [1.3812010983144802]
This study introduces a data-driven, deep learning-based map-matching framework, formulating the task as machine translation, inspired by NLP.<n>A transformer-based encoder-decoder model learns contextual representations of noisy GPS points to infer trajectory behavior and road structures in an end-to-end manner.<n>Experiments on synthetic trajectories show that this approach outperforms conventional methods by integrating contextual awareness.
arXiv Detail & Related papers (2024-04-18T18:39:23Z) - AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales [45.315661330785275]
We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps.
We tackle two critical challenges: bridging the representation gap between image and points modalities for robust feature matching, and handling inherent scale discrepancies between global view and local view.
arXiv Detail & Related papers (2024-04-04T04:12:30Z) - Digging Into Normal Incorporated Stereo Matching [18.849192633442453]
We propose a normal incorporated joint learning framework consisting of two specific modules named non-local disparity propagation(NDP) and affinity-aware residual learning(ARL)
By the time we finished this work, our approach ranked 1st for stereo matching across foreground pixels on the KITTI 2015 dataset and 3rd on the Scene Flow dataset among all the published works.
arXiv Detail & Related papers (2024-02-28T09:01:50Z) - Adaptive Local-Component-aware Graph Convolutional Network for One-shot
Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z) - Mixed Graph Contrastive Network for Semi-Supervised Node Classification [63.924129159538076]
We propose a novel graph contrastive learning method, termed Mixed Graph Contrastive Network (MGCN)<n>In our method, we improve the discriminative capability of the latent embeddings by an unperturbed augmentation strategy and a correlation reduction mechanism.<n>By combining the two settings, we extract rich supervision information from both the abundant nodes and the rare yet valuable labeled nodes for discriminative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z) - Hybrid Routing Transformer for Zero-Shot Learning [83.64532548391]
This paper presents a novel transformer encoder-decoder model, called hybrid routing transformer (HRT)
We embed an active attention, which is constructed by both the bottom-up and the top-down dynamic routing pathways to generate the attribute-aligned visual feature.
While in HRT decoder, we use static routing to calculate the correlation among the attribute-aligned visual features, the corresponding attribute semantics, and the class attribute vectors to generate the final class label predictions.
arXiv Detail & Related papers (2022-03-29T07:55:08Z) - TC-Net: Triple Context Network for Automated Stroke Lesion Segmentation [0.5482532589225552]
We propose a new network, Triple Context Network (TC-Net), with the capture of spatial contextual information as the core.
Our network is evaluated on the open dataset ATLAS, achieving the highest score of 0.594, Hausdorff distance of 27.005 mm, and average symmetry surface distance of 7.137 mm.
arXiv Detail & Related papers (2022-02-28T11:12:16Z) - Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv Detail & Related papers (2020-11-01T19:24:27Z) - MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT
Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z) - DiResNet: Direction-aware Residual Network for Road Extraction in VHR
Remote Sensing Images [12.081877372552606]
We present a direction-aware residual network (DiResNet) that includes three main contributions.
The proposed method has advantages in both overall accuracy and F1-score.
arXiv Detail & Related papers (2020-05-14T19:33:21Z) - Deep Semantic Matching with Foreground Detection and Cycle-Consistency [103.22976097225457]
We address weakly supervised semantic matching based on a deep network.
We explicitly estimate the foreground regions to suppress the effect of background clutter.
We develop cycle-consistent losses to enforce the predicted transformations across multiple images to be geometrically plausible and consistent.
arXiv Detail & Related papers (2020-03-31T22:38:09Z) - A Rotation-Invariant Framework for Deep Point Cloud Analysis [132.91915346157018]
We introduce a new low-level purely rotation-invariant representation to replace common 3D Cartesian coordinates as the network inputs.
Also, we present a network architecture to embed these representations into features, encoding local relations between points and their neighbors, and the global shape structure.
We evaluate our method on multiple point cloud analysis tasks, including shape classification, part segmentation, and shape retrieval.
arXiv Detail & Related papers (2020-03-16T14:04:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.