Cross Scene Prediction via Modeling Dynamic Correlation using Latent
Space Shared Auto-Encoders
- URL: http://arxiv.org/abs/2003.13930v1
- Date: Tue, 31 Mar 2020 03:08:23 GMT
- Title: Cross Scene Prediction via Modeling Dynamic Correlation using Latent
Space Shared Auto-Encoders
- Authors: Shaochi Hu, Donghao Xu, Huijing Zhao
- Abstract summary: Given a set of unsynchronized history observations of two scenes, the goal is to learn a cross-scene predictor.
A method is proposed that solves the problem by modeling dynamic correlation with latent space shared auto-encoders.
- Score: 6.530318792830862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work addresses the following problem: given a set of unsynchronized
history observations of two scenes whose dynamic changes are correlated, the
goal is to learn a cross-scene predictor, so that from an observation of one
scene a robot can predict the dynamic state of the other online. A method is
proposed that solves the problem by modeling dynamic correlation with latent
space shared auto-encoders. Assuming that the inherent correlation of scene
dynamics can be represented in a shared latent space, where observations of
both scenes map to a common latent state if they are taken at approximately
the same time, a learning model is developed by connecting two auto-encoders
through the latent space, and a prediction model is built by concatenating the
encoder of the input scene with the decoder of the target one. Simulation
datasets are generated that imitate the dynamic flows at two adjacent gates of
a campus, where the dynamic changes are triggered by a common working and
teaching schedule. Similar scenarios can also be found at successive
intersections on a single road, at the gates of a subway station, etc. The
accuracy of cross-scene prediction is examined under various conditions of
scene correlation and pairwise observation. The potential of the proposed
method is demonstrated by comparison with conventional end-to-end methods and
linear predictions.
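
For intuition, here is a minimal sketch (not the authors' released code) of the latent space shared auto-encoder idea in PyTorch: two per-scene auto-encoders are trained with reconstruction losses plus an alignment term that drives approximately time-synchronized observations of the two scenes toward a common latent state, and cross-scene prediction then chains the encoder of the observed scene with the decoder of the target scene. The network sizes, MSE losses, and all names are illustrative assumptions.

```python
# Minimal sketch of latent space shared auto-encoders for cross-scene
# prediction, assuming each scene's dynamic state is a fixed-size vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneAutoEncoder(nn.Module):
    """One auto-encoder per scene; both map into the same latent space."""
    def __init__(self, state_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))

def training_loss(ae_a, ae_b, x_a, x_b, alpha=1.0):
    """Loss on a pair of observations taken at approximately the same time.
    The alignment term is what ties the two latent spaces together."""
    z_a, z_b = ae_a.encoder(x_a), ae_b.encoder(x_b)
    recon = (F.mse_loss(ae_a.decoder(z_a), x_a)
             + F.mse_loss(ae_b.decoder(z_b), x_b))
    align = F.mse_loss(z_a, z_b)  # common latent state for synced pairs
    return recon + alpha * align

def predict_b_from_a(ae_a, ae_b, x_a):
    # Cross-scene predictor: concatenate the encoder of the input scene
    # with the decoder of the target scene.
    return ae_b.decoder(ae_a.encoder(x_a))
```

In this sketch an optimizer would step on the summed loss over approximately synchronized pairs, with alpha trading reconstruction fidelity against latent alignment; how the paper balances these terms is not specified in the abstract.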
Related papers
- AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce GPT-style next-token prediction into motion prediction.
Unlike language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatial-temporal and semantic relations.
We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z) - JointMotion: Joint Self-Supervision for Joint Motion Prediction [10.44846560021422]
JointMotion is a self-supervised pre-training method for joint motion prediction in self-driving vehicles.
Our method reduces the joint final displacement error of Wayformer, HPTR, and Scene Transformer models by 3%, 8%, and 12%, respectively.
arXiv Detail & Related papers (2024-03-08T17:54:38Z) - Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z) - STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [50.064502884594376]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z) - Spatio-Temporal Relation Learning for Video Anomaly Detection [35.59510027883497]
Anomaly identification is highly dependent on the relationship between the object and the scene.
In this paper, we propose a Spatial-Temporal Relation Learning framework to tackle the video anomaly detection task.
Experiments are conducted on three public datasets, and the superior performance over the state-of-the-art methods demonstrates the effectiveness of our method.
arXiv Detail & Related papers (2022-09-27T02:19:31Z) - Modelling Neighbor Relation in Joint Space-Time Graph for Video
Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z) - End-to-end Contextual Perception and Prediction with Interaction
Transformer [79.14001602890417]
We tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving.
To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture.
Our model can be trained end-to-end, and runs in real-time.
arXiv Detail & Related papers (2020-08-13T14:30:12Z) - Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction [40.20696709103593]
This paper designs a new mechanism, i.e., the Dynamic and Static Context-aware Motion Predictor (DSCMP).
It integrates rich information into the long short-term memory (LSTM).
It models the dynamic interactions between agents by learning both their spatial positions and temporal coherence.
It captures the scene context by inferring a latent variable, which enables multimodal predictions with a meaningful semantic scene layout.
arXiv Detail & Related papers (2020-08-03T11:03:57Z) - Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [78.74510891099395]
In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene.
arXiv Detail & Related papers (2020-07-23T14:31:25Z) - Deep Representation Learning and Clustering of Traffic Scenarios [0.0]
We introduce two data-driven autoencoding models that learn latent representations of traffic scenes.
We show how the latent scenario embeddings can be used for clustering traffic scenarios and similarity retrieval.
arXiv Detail & Related papers (2020-07-15T15:12:23Z) - Forecast Network-Wide Traffic States for Multiple Steps Ahead: A Deep
Learning Approach Considering Dynamic Non-Local Spatial Correlation and
Non-Stationary Temporal Dependency [6.019104024723682]
This research studies two particular problems in traffic forecasting: (1) capturing the dynamic and non-local spatial correlation between traffic links and (2) modeling the dynamics of temporal dependency for accurate multi-step-ahead predictions.
We propose a deep learning framework named Spatial-Temporal Sequence to Sequence model (STSeq2Seq) to address these issues.
This model builds on sequence to sequence (seq2seq) architecture to capture temporal feature and relies on graph convolution for aggregating spatial information.
arXiv Detail & Related papers (2020-04-06T03:40:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.