Scalable Self-Supervised Representation Learning from Spatiotemporal
Motion Trajectories for Multimodal Computer Vision
- URL: http://arxiv.org/abs/2210.03289v1
- Date: Fri, 7 Oct 2022 02:41:02 GMT
- Title: Scalable Self-Supervised Representation Learning from Spatiotemporal
Motion Trajectories for Multimodal Computer Vision
- Authors: Swetava Ganguli, C. V. Krishnakumar Iyer, Vipul Pandey
- Abstract summary: We propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories.
We show that reachability embeddings are semantically meaningful representations and result in a 4-23% gain in performance, as measured by the area under the precision-recall curve (AUPRC).
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Self-supervised representation learning techniques utilize large datasets
without semantic annotations to learn meaningful, universal features that can
be conveniently transferred to solve a wide variety of downstream supervised
tasks. In this work, we propose a self-supervised method for learning
representations of geographic locations from unlabeled GPS trajectories to
solve downstream geospatial computer vision tasks. Tiles resulting from a
raster representation of the earth's surface are modeled as nodes on a graph or
pixels of an image. GPS trajectories are modeled as allowed Markovian paths on
these nodes. A scalable and distributed algorithm is presented to compute
image-like representations, called reachability summaries, of the spatial
connectivity patterns between tiles and their neighbors implied by the observed
Markovian paths. A convolutional, contractive autoencoder is trained to learn
compressed representations, called reachability embeddings, of reachability
summaries for every tile. Reachability embeddings serve as task-agnostic
feature representations of geographic locations. Using reachability embeddings
as pixel representations for five different downstream geospatial tasks, cast
as supervised semantic segmentation problems, we quantitatively demonstrate
that reachability embeddings are semantically meaningful representations and
result in a 4-23% gain in performance, as measured using area under the
precision-recall curve (AUPRC) metric, when compared to baseline models that
use pixel representations that do not account for the spatial connectivity
between tiles. Reachability embeddings transform sequential, spatiotemporal
mobility data into semantically meaningful tensor representations that can be
combined with other sources of imagery and are designed to facilitate
multimodal learning in geospatial computer vision.
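The idea of turning trajectories into per-tile, image-like connectivity summaries can be illustrated with a toy sketch. This is not the authors' distributed algorithm: the function name `reachability_summaries`, the `radius` and `horizon` parameters, and the normalization by visit count are all assumptions made for illustration. Each trajectory is taken to be an ordered list of (row, col) tile indices, and a tile's summary records, for every neighbor offset, the fraction of visits after which that neighbor was reached within `horizon` steps.

```python
from collections import defaultdict


def reachability_summaries(trajectories, radius=1, horizon=2):
    """Toy per-tile connectivity summaries (hypothetical definition).

    trajectories: iterable of lists of (row, col) tile indices in time order.
    Returns {tile: grid}, where grid is a (2*radius+1) x (2*radius+1) list of
    lists and grid[dr + radius][dc + radius] is the fraction of visits to the
    tile after which the tile at offset (dr, dc) was reached within `horizon`
    subsequent steps of the same trajectory.
    """
    size = 2 * radius + 1
    counts = {}
    visits = defaultdict(int)
    for path in trajectories:
        for i, (r0, c0) in enumerate(path):
            visits[(r0, c0)] += 1
            grid = counts.setdefault(
                (r0, c0), [[0] * size for _ in range(size)]
            )
            # Collect distinct neighbor offsets reached from this visit.
            reached = set()
            for r1, c1 in path[i + 1 : i + 1 + horizon]:
                dr, dc = r1 - r0, c1 - c0
                if abs(dr) <= radius and abs(dc) <= radius:
                    reached.add((dr, dc))
            for dr, dc in reached:
                grid[dr + radius][dc + radius] += 1
    # Normalize counts by the number of visits to each tile.
    return {
        tile: [[n / visits[tile] for n in row] for row in grid]
        for tile, grid in counts.items()
    }


# Two short made-up trajectories on a small tile grid.
summaries = reachability_summaries(
    [[(0, 0), (0, 1), (1, 1)], [(0, 0), (1, 0)]], radius=1, horizon=2
)
```

In a pipeline like the one the abstract describes, grids of this kind would be the inputs that an autoencoder compresses into per-tile embeddings; the sketch stops at the summary step.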
Related papers
- Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models [27.316692263196277]
MVTraj is a novel multi-view modeling method for trajectory representation learning.
It integrates diverse contextual knowledge, from GPS to road networks and points of interest, to provide a more comprehensive understanding of trajectory data.
Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views.
arXiv Detail & Related papers (2024-10-17T03:56:12Z)
- Temporal Embeddings: Scalable Self-Supervised Temporal Representation Learning from Spatiotemporal Data for Multimodal Computer Vision [1.4127889233510498]
A novel approach is proposed to stratify the landscape based on mobility activity time series.
The pixel-wise embeddings are converted to image-like channels that can be used for task-based, multimodal modeling.
arXiv Detail & Related papers (2023-10-16T02:53:29Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with a 10-34% relative improvement across various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- DenseGAP: Graph-Structured Dense Correspondence Learning with Anchor Points [15.953570826460869]
Establishing dense correspondence between two images is a fundamental computer vision problem.
We introduce DenseGAP, a new solution for efficient Dense correspondence learning with a Graph-structured neural network conditioned on Anchor Points.
Our method advances the state-of-the-art of correspondence learning on most benchmarks.
arXiv Detail & Related papers (2021-12-13T18:59:30Z)
- Reachability Embeddings: Scalable Self-Supervised Representation Learning from Markovian Trajectories for Geospatial Computer Vision [0.0]
We propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories.
A scalable and distributed algorithm is presented to compute image-like representations, called reachability summaries.
We show that reachability embeddings are semantically meaningful representations and result in a 4-23% gain in performance.
arXiv Detail & Related papers (2021-10-24T20:10:22Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Geography-Aware Self-Supervised Learning [79.4009241781968]
We show that due to their different characteristics, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks.
We propose novel training methods that exploit the spatially aligned structure of remote sensing data.
Our experiments show that our proposed method closes the gap between contrastive and supervised learning on image classification, object detection and semantic segmentation for remote sensing.
arXiv Detail & Related papers (2020-11-19T17:29:13Z)
- Multi-Level Graph Convolutional Network with Automatic Graph Learning for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing an attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z)
- Spatial Pyramid Based Graph Reasoning for Semantic Segmentation [67.47159595239798]
We apply graph convolution to the semantic segmentation task and propose an improved Laplacian.
The graph reasoning is directly performed in the original feature space organized as a spatial pyramid.
We achieve comparable performance with advantages in computational and memory overhead.
arXiv Detail & Related papers (2020-03-23T12:28:07Z)
- Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning [86.45526827323954]
Weakly-supervised semantic segmentation is a challenging task as no pixel-wise label information is provided for training.
We propose an iterative algorithm to learn such pairwise relations.
We show that the proposed algorithm performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2020-02-19T10:32:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.