Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction
- URL: http://arxiv.org/abs/2008.00777v1
- Date: Mon, 3 Aug 2020 11:03:57 GMT
- Title: Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction
- Authors: Chaofan Tao, Qinhong Jiang, Lixin Duan, Ping Luo
- Abstract summary: This paper designs a new mechanism, i.e., the Dynamic and Static Context-aware Motion Predictor (DSCMP).
It integrates rich contextual information into a long short-term memory (LSTM) network.
It models the dynamic interactions between agents by learning both their spatial positions and temporal coherence.
It captures the scene context by inferring a latent variable, which enables multimodal predictions with a meaningful semantic scene layout.
- Score: 40.20696709103593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent motion prediction is challenging because it aims to foresee the
future trajectories of multiple agents (e.g. pedestrians) simultaneously in a complicated scene.
Existing work addressed this challenge either by learning the social spatial interactions
represented by the positions of a group of pedestrians while ignoring their temporal coherence
(i.e. dependencies between different long trajectories), or by understanding the complicated
scene layout (e.g. scene segmentation) to ensure safe navigation. Unlike previous work that
isolated the spatial interaction, temporal coherence, and scene layout, this paper designs a new
mechanism, i.e., the Dynamic and Static Context-aware Motion Predictor (DSCMP), to integrate
this rich information into a long short-term memory (LSTM) network. It has three appealing
benefits. (1) DSCMP models the dynamic interactions between agents by learning both their
spatial positions and temporal coherence, as well as understanding the contextual scene layout.
(2) Different from previous LSTM models that predict motions by propagating hidden features
frame by frame, which limits the capacity to learn correlations between long trajectories, we
carefully design a differentiable queue mechanism in DSCMP that explicitly memorizes and learns
the correlations between long trajectories. (3) DSCMP captures the scene context by inferring a
latent variable, which enables multimodal predictions with a meaningful semantic scene layout.
Extensive experiments show that DSCMP outperforms state-of-the-art methods by large margins,
with 9.05% and 7.62% relative improvements on the ETH-UCY and SDD datasets respectively.
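The abstract describes the differentiable queue and the latent scene variable only at a high level. Below is a minimal, hypothetical sketch in PyTorch of how such a queue and a latent sample could be attached to an LSTM cell; the class name QueueLSTMCell, the attention-based read, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): an LSTM cell that keeps a
# fixed-length, differentiable queue of past hidden states and attends over
# it, so correlations between distant time steps are not lost to frame-by-
# frame propagation. A latent variable z (sampled per agent) is concatenated
# to the input to allow multimodal predictions.
import torch
import torch.nn as nn


class QueueLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim, queue_len=8):
        super().__init__()
        self.queue_len = queue_len
        self.cell = nn.LSTMCell(input_dim + latent_dim + hidden_dim, hidden_dim)
        self.query = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x, z, state, queue):
        # queue: (batch, queue_len, hidden_dim) holding the last K hidden states.
        h, c = state
        # Soft attention over the queue: each past hidden state is weighted by
        # its similarity to the current hidden state, keeping gradients intact.
        attn = torch.softmax(
            torch.einsum('bd,bkd->bk', self.query(h), queue), dim=-1)
        context = torch.einsum('bk,bkd->bd', attn, queue)
        h, c = self.cell(torch.cat([x, z, context], dim=-1), (h, c))
        # Push the new hidden state and drop the oldest one (FIFO).
        queue = torch.cat([queue[:, 1:], h.unsqueeze(1)], dim=1)
        return (h, c), queue


# Usage: one forward step for a batch of 4 agents with 2-D position offsets.
cell = QueueLSTMCell(input_dim=2, hidden_dim=32, latent_dim=16, queue_len=8)
x = torch.randn(4, 2)                        # current (x, y) offsets
z = torch.randn(4, 16)                       # latent scene/context sample
state = (torch.zeros(4, 32), torch.zeros(4, 32))
queue = torch.zeros(4, 8, 32)
state, queue = cell(x, z, state, queue)
```

Because the FIFO update uses only slicing and concatenation, gradients flow through every queued hidden state, which is one plausible reading of a "differentiable queue"; sampling a different z per forward pass yields different, i.e. multimodal, continuations.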
Related papers
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- Triplet Attention Transformer for Spatiotemporal Predictive Learning [9.059462850026216]
We propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features.
The model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions.
arXiv Detail & Related papers (2023-10-28T12:49:33Z)
- Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation [64.85974098314344]
Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer their relationships for a given video.
Inherently, object pairs and their relationships enjoy spatial co-occurrence correlations within each image and temporal consistency/transition correlations across different images.
We propose a spatial-temporal knowledge-embedded transformer (STKET) that incorporates the prior spatial-temporal knowledge into the multi-head cross-attention mechanism.
arXiv Detail & Related papers (2023-09-23T02:40:28Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- Multiple Object Tracking with Correlation Learning [16.959379957515974]
We propose to exploit the local correlation module to model the topological relationship between targets and their surrounding environment.
Specifically, we establish dense correspondences of each spatial location and its context, and explicitly constrain the correlation volumes through self-supervised learning.
Our approach demonstrates the effectiveness of correlation learning with the superior performance and obtains state-of-the-art MOTA of 76.5% and IDF1 of 73.6% on MOT17.
arXiv Detail & Related papers (2021-04-08T06:48:02Z)
- Exploring Dynamic Context for Multi-path Trajectory Prediction [33.66335553588001]
We propose a novel framework named Dynamic Context Network (DCENet).
In our framework, the spatial context between agents is explored using self-attention architectures (a minimal sketch of such an agent-wise attention layer appears after this list).
A set of future trajectories for each agent is predicted conditioned on the learned spatial-temporal context.
arXiv Detail & Related papers (2020-10-30T13:39:20Z)
- Graph2Kernel Grid-LSTM: A Multi-Cued Model for Pedestrian Trajectory Prediction by Learning Adaptive Neighborhoods [10.57164270098353]
We present a new perspective to interaction modeling by proposing that pedestrian neighborhoods can become adaptive in design.
Our model outperforms state-of-the-art approaches that collate resembling features over several publicly-tested surveillance videos.
arXiv Detail & Related papers (2020-07-03T19:05:48Z)
- A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, a joint feature sequence is built from the sequence and instant state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
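Several entries above (e.g. DCENet's spatial context via self-attention, and the dense correlation volumes in the correlation-learning tracker) model interactions with attention or correlation across agents and locations. The snippet below is a rough illustration only: a single-head self-attention layer over the agents present in one frame, written in PyTorch. The class name, feature size, and residual design are assumptions and do not reproduce any of the listed methods.

```python
# Illustrative sketch: single-head self-attention over agent features in one
# frame, producing a context-aware embedding per agent. All names and
# dimensions are assumptions, not taken from any of the papers listed above.
import torch
import torch.nn as nn


class AgentSelfAttention(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        self.qkv = nn.Linear(feat_dim, 3 * feat_dim)
        self.scale = feat_dim ** -0.5

    def forward(self, agents):
        # agents: (num_agents, feat_dim), one embedding per agent in the frame.
        q, k, v = self.qkv(agents).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)
        # Each agent's context is a weighted sum of all agents' features.
        return agents + attn @ v


# Usage: 6 agents embedded into 32-D features from their recent positions.
layer = AgentSelfAttention(feat_dim=32)
context = layer(torch.randn(6, 32))   # (6, 32) context-aware features
```

In practice such a layer would be applied per time step, with its output feeding the recurrent or transformer trajectory decoder of the respective method.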