Related papers: Encoding Agent Trajectories as Representations with Sequence Transformers

Encoding Agent Trajectories as Representations with Sequence Transformers

URL: http://arxiv.org/abs/2410.09204v1
Date: Fri, 11 Oct 2024 19:18:47 GMT
Title: Encoding Agent Trajectories as Representations with Sequence Transformers
Authors: Athanasios Tsiligkaridis, Nicholas Kalinowski, Zhongheng Li, Elizabeth Hou,
Abstract summary: We propose a model for representing high dimensional trajectories with neural-based network architecture. Similar to language models, our Transformer Sequence for Agent temporal Representations (STARE) model can learn representations and structure in trajectory data. We present experimental results on various synthetic and real trajectory datasets and show that our proposed model can learn meaningful encodings.
Score: 0.4999814847776097
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatiotemporal data faces many analogous challenges to natural language text including the ordering of locations (words) in a sequence, long range dependencies between locations, and locations having multiple meanings. In this work, we propose a novel model for representing high dimensional spatiotemporal trajectories as sequences of discrete locations and encoding them with a Transformer-based neural network architecture. Similar to language models, our Sequence Transformer for Agent Representation Encodings (STARE) model can learn representations and structure in trajectory data through both supervisory tasks (e.g., classification), and self-supervisory tasks (e.g., masked modelling). We present experimental results on various synthetic and real trajectory datasets and show that our proposed model can learn meaningful encodings that are useful for many downstream tasks including discriminating between labels and indicating similarity between locations. Using these encodings, we also learn relationships between agents and locations present in spatiotemporal data.

Related papers

Toward Relative Positional Encoding in Spiking Transformers [52.62008099390541]
Spiking neural networks (SNNs) are bio-inspired networks that mimic how neurons in the brain communicate through discrete spikes.<n>We introduce several strategies to approximate relative positional encoding (RPE) in spiking Transformers.
arXiv Detail & Related papers (2025-01-28T06:42:37Z)
Relative Representations of Latent Spaces enable Efficient Semantic Channel Equalization [11.052047963214006]
We present a novel semantic equalization algorithm that enables communication between agents with different languages without additional retraining. Our numerical results show the effectiveness of the proposed approach allowing seamless communication between agents with radically different models.
arXiv Detail & Related papers (2024-11-29T14:08:48Z)
How transformers learn structured data: insights from hierarchical filtering [2.7784685368355744]
We introduce a hierarchical filtering procedure for generative models of sequences on trees. We provide evidence that vanilla encoder-only transformers can approximate the exact inference algorithm when trained on root classification. We find clear evidence of a reconstruction of correlations on successive length scales corresponding to the various levels of the hierarchy.
arXiv Detail & Related papers (2024-08-27T15:23:09Z)
Graph-Induced Syntactic-Semantic Spaces in Transformer-Based Variational AutoEncoders [5.037881619912574]
In this paper, we investigate latent space separation methods for structural syntactic injection in Transformer-based VAEs. Specifically, we explore how syntactic structures can be leveraged in the encoding stage through the integration of graph-based and sequential models. Our empirical evaluation, carried out on natural language sentences and mathematical expressions, reveals that the proposed end-to-end VAE architecture can result in a better overall organisation of the latent space.
arXiv Detail & Related papers (2023-11-14T22:47:23Z)
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning [47.96387857237473]
We devise a network which can perform attention over activations obtained while processing other training samples. Our memory models the distribution of past keys and values through the definition of prototype vectors. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training.
arXiv Detail & Related papers (2023-08-23T18:53:00Z)
VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables. The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning. We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders [55.00904795497786]
We propose TimeMAE, a novel self-supervised paradigm for learning transferrable time series representations based on transformer networks. The TimeMAE learns enriched contextual representations of time series with a bidirectional encoding scheme. To solve the discrepancy issue incurred by newly injected masked embeddings, we design a decoupled autoencoder architecture.
arXiv Detail & Related papers (2023-03-01T08:33:16Z)
Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context. Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
TransVG: End-to-End Visual Grounding with Transformers [102.11922622103613]
We present a transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to an image. We show that the complex fusion modules can be replaced by a simple stack of transformer encoder layers with higher performance.
arXiv Detail & Related papers (2021-04-17T13:35:24Z)
VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs [0.0]
We present a conditional generative model for synthesizing semantically rich images simultaneously conditioned on a pixellevel (PLC) and a featurelevel condition (FLC) Experiments on a GPS dataset show that the proposed model can accurately generate various forms of macroscopic aggregates across different geographic locations.
arXiv Detail & Related papers (2020-12-08T03:46:19Z)
Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores the universal representation learning, i.e., embeddings of different levels of linguistic unit in a uniform vector space. We present our approach of constructing analogy datasets in terms of words, phrases and sentences. We empirically verify that well pre-trained Transformer models incorporated with appropriate training settings may effectively yield universal representation.
arXiv Detail & Related papers (2020-09-10T03:53:18Z)
Towards Accurate Vehicle Behaviour Classification With Multi-Relational Graph Convolutional Networks [22.022759283770377]
We propose a pipeline for understanding vehicle behaviour from a monocular image sequence or video. A temporal sequence of such encodings is fed to a recurrent network to label vehicle behaviours. The proposed framework can classify a variety of vehicle behaviours to high fidelity on datasets that are diverse.
arXiv Detail & Related papers (2020-02-03T14:34:28Z)
Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task. We combine the resulting representations with global scene information for accurately predicting visual saliency. Compared to state of the art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.