MacFormer: Map-Agent Coupled Transformer for Real-time and Robust
Trajectory Prediction
- URL: http://arxiv.org/abs/2308.10280v2
- Date: Thu, 31 Aug 2023 07:23:56 GMT
- Authors: Chen Feng, Hangning Zhou, Huadong Lin, Zhigang Zhang, Ziyao Xu, Chi
Zhang, Boyu Zhou, Shaojie Shen
- Abstract summary: We propose Map-Agent Coupled Transformer (MacFormer) for real-time and robust trajectory prediction.
Our framework explicitly incorporates map constraints into the network via two carefully designed modules named coupled map and reference extractor.
We evaluate our approach on Argoverse 1, Argoverse 2, and nuScenes real-world benchmarks, where it achieves state-of-the-art performance on all three.
- Score: 26.231420111336565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the future behavior of agents is a fundamental task in autonomous
vehicle domains. Accurate prediction relies on comprehending the surrounding
map, which significantly regularizes agent behaviors. However, existing methods
have limitations in exploiting the map and exhibit a strong dependence on
historical trajectories, which yield unsatisfactory prediction performance and
robustness. Additionally, their heavy network architectures impede real-time
applications. To tackle these problems, we propose Map-Agent Coupled
Transformer (MacFormer) for real-time and robust trajectory prediction. Our
framework explicitly incorporates map constraints into the network via two
carefully designed modules named coupled map and reference extractor. A novel
multi-task optimization strategy (MTOS) is presented to enhance learning of
topology and rule constraints. We also devise a bilateral query scheme in context
fusion for a more efficient and lightweight network. We evaluated our approach
on Argoverse 1, Argoverse 2, and nuScenes real-world benchmarks, where it
achieved state-of-the-art performance on all three with the lowest inference latency and
smallest model size. Experiments also demonstrate that our framework is
resilient to imperfect tracklet inputs. Furthermore, we show that by combining
with our proposed strategies, classical models outperform their baselines,
further validating the versatility of our framework.
Related papers
- Self-Supervised State Space Model for Real-Time Traffic Accident Prediction Using eKAN Networks [18.385759762991896]
SSL-eKamba is an efficient self-supervised framework for traffic accident prediction.
To enhance generalization, we design two self-supervised auxiliary tasks that adaptively improve traffic pattern representation.
Experiments on two real-world datasets demonstrate that SSL-eKamba consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-09-09T14:25:51Z)
- StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction [22.29257945966914]
We propose a streaming and unified framework for joint 3D Multi-Object Tracking and trajectory Prediction (StreamMOTP).
We construct the model in a streaming manner and exploit a memory bank to preserve and leverage the long-term latent features for tracked objects more effectively.
We also improve the quality and consistency of predicted trajectories with a dual-stream predictor.
arXiv Detail & Related papers (2024-06-28T11:35:35Z)
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state-of-the-art on Argoverse 2 Sensor and Waymo Open Dataset.
arXiv Detail & Related papers (2024-06-06T18:12:04Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
- A Fast and Map-Free Model for Trajectory Prediction in Traffics [2.435517936694533]
This paper proposes an efficient trajectory prediction model that is not dependent on traffic maps.
By comprehensively utilizing attention mechanism, LSTM, graph convolution network and temporal transformer, our model is able to learn rich dynamic and interaction information of all agents.
Our model achieves the highest performance compared with existing map-free methods and also exceeds most map-based state-of-the-art methods on the Argoverse dataset.
arXiv Detail & Related papers (2023-07-19T08:36:31Z)
- GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting [121.42898228997538]
We propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization.
We leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph.
Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction.
arXiv Detail & Related papers (2022-11-04T16:10:50Z)
- Trajectory Prediction with Graph-based Dual-scale Context Fusion [43.51107329748957]
We present a graph-based trajectory prediction network named the Dual Scale Predictor.
It encodes both the static and dynamical driving context in a hierarchical manner.
Thanks to the proposed dual-scale context fusion network, our DSP is able to generate accurate and human-like multi-modal trajectories.
arXiv Detail & Related papers (2021-11-02T13:42:16Z)
- Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) has synchronous needs for both robustness and accuracy.
We propose a dual-modal framework for target localization, consisting of robust localization that suppresses distractors via ONR and accurate localization that attends precisely to the target center via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z)
- Exploiting latent representation of sparse semantic layers for improved short-term motion prediction with Capsule Networks [0.12183405753834559]
This paper explores the use of Capsule Networks (CapsNets) for learning a hierarchical representation of sparse semantic layers corresponding to small regions of the High-Definition (HD) map.
By using an architecture based on CapsNets the model is able to retain hierarchical relationships between detected features within images whilst also preventing loss of spatial data often caused by the pooling operation.
We show that our model achieves significant improvement over recently published works on prediction, whilst drastically reducing the overall size of the network.
arXiv Detail & Related papers (2021-03-02T11:13:43Z)
- Multi-Agent Routing Value Iteration Network [88.38796921838203]
We propose a graph neural network based model that is able to perform multi-agent routing based on learned value in a sparsely connected graph.
We show that our model trained with only two agents on graphs with a maximum of 25 nodes can easily generalize to situations with more agents and/or nodes.
arXiv Detail & Related papers (2020-07-09T22:16:45Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)