MGTR: Multi-Granular Transformer for Motion Prediction with LiDAR
- URL: http://arxiv.org/abs/2312.02409v2
- Date: Mon, 5 Feb 2024 19:56:54 GMT
- Title: MGTR: Multi-Granular Transformer for Motion Prediction with LiDAR
- Authors: Yiqian Gan, Hao Xiao, Yizhe Zhao, Ethan Zhang, Zhe Huang, Xin Ye,
Lingting Ge
- Abstract summary: We propose a Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that exploits context features in different granularities for different kinds of traffic agents.
We evaluate MGTR on the Waymo Open Dataset motion prediction benchmark and show that the proposed method achieves state-of-the-art performance, ranking 1st on its leaderboard.
- Score: 7.135065870025928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion prediction has been an essential component of autonomous driving
systems since it handles highly uncertain and complex scenarios involving
moving agents of different types. In this paper, we propose a Multi-Granular
TRansformer (MGTR) framework, an encoder-decoder network that exploits context
features in different granularities for different kinds of traffic agents. To
further enhance MGTR's capabilities, we leverage LiDAR point cloud data by
incorporating LiDAR semantic features from an off-the-shelf LiDAR feature
extractor. We evaluate MGTR on the Waymo Open Dataset motion prediction
benchmark and show that the proposed method achieves state-of-the-art
performance, ranking 1st on its leaderboard
ranking 1st on its leaderboard
(https://waymo.com/open/challenges/2023/motion-prediction/).
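For orientation, below is a minimal, hypothetical sketch of the kind of architecture the abstract describes: a transformer encoder-decoder that fuses agent, map, and LiDAR context tokens at more than one granularity and decodes several candidate trajectories. It is not the authors' implementation; it assumes PyTorch, placeholder feature dimensions, a simple coarse/fine map split, and a generic trajectory-plus-score head, whereas MGTR's actual granularity scheme, LiDAR feature extractor, and query design follow the paper.
```python
# Hypothetical sketch only: a transformer encoder-decoder that fuses agent,
# map (two granularities), and LiDAR semantic tokens, then decodes K candidate
# future trajectories with confidence scores. Dimensions are placeholders.
import torch
import torch.nn as nn


class MultiGranularPredictor(nn.Module):
    def __init__(self, agent_dim=64, map_dim=32, lidar_dim=64,
                 d_model=128, num_modes=6, horizon=80):
        super().__init__()
        # One projection per modality / granularity into a shared token space.
        self.agent_proj = nn.Linear(agent_dim, d_model)      # agent history tokens
        self.map_fine_proj = nn.Linear(map_dim, d_model)     # fine-grained map tokens
        self.map_coarse_proj = nn.Linear(map_dim, d_model)   # coarse-grained map tokens
        self.lidar_proj = nn.Linear(lidar_dim, d_model)      # LiDAR semantic tokens
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
        # Learnable queries, one per predicted motion mode.
        self.motion_queries = nn.Parameter(torch.randn(num_modes, d_model))
        self.traj_head = nn.Linear(d_model, horizon * 2)  # (x, y) per future step
        self.score_head = nn.Linear(d_model, 1)           # confidence per mode

    def forward(self, agents, map_fine, map_coarse, lidar):
        # Each input is (batch, num_tokens, feature_dim) for its modality.
        tokens = torch.cat([
            self.agent_proj(agents),
            self.map_fine_proj(map_fine),
            self.map_coarse_proj(map_coarse),
            self.lidar_proj(lidar),
        ], dim=1)
        memory = self.encoder(tokens)                      # fused scene context
        queries = self.motion_queries.unsqueeze(0).expand(agents.size(0), -1, -1)
        modes = self.decoder(queries, memory)              # (batch, num_modes, d_model)
        trajs = self.traj_head(modes)                      # (batch, num_modes, horizon * 2)
        scores = self.score_head(modes).squeeze(-1)        # (batch, num_modes)
        return trajs, scores


# Toy usage with random tokens standing in for real preprocessed inputs.
model = MultiGranularPredictor()
trajs, scores = model(torch.randn(2, 8, 64),    # agent history tokens
                      torch.randn(2, 256, 32),  # fine map tokens
                      torch.randn(2, 64, 32),   # coarse map tokens
                      torch.randn(2, 512, 64))  # LiDAR feature tokens
```
In a real pipeline the LiDAR tokens would come from an off-the-shelf LiDAR semantic feature extractor, as the abstract states, rather than random tensors.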
Related papers
- SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds [13.097858142421519]
We propose a framework to accommodate various types of LiDAR prevalent in the market by replacing window-attention with sparse focal point modulation.
Our SFPNet is capable of extracting multi-level contexts and dynamically aggregating them using a gate mechanism.
We also introduce a novel large-scale hybrid-solid LiDAR semantic segmentation dataset for robotic applications.
arXiv Detail & Related papers (2024-07-16T10:22:09Z) - iMotion-LLM: Motion Prediction Instruction Tuning [33.63656257401926]
We introduce iMotion-LLM: a Multimodal Large Language Model with trajectory prediction, tailored to guide interactive multi-agent scenarios.
iMotion-LLM capitalizes on textual instructions as key inputs for generating contextually relevant trajectories.
These findings act as milestones in empowering autonomous navigation systems to interpret and predict the dynamics of multi-agent environments.
arXiv Detail & Related papers (2024-06-10T12:22:06Z) - MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and
Guided Intention Querying [110.83590008788745]
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions.
In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges.
The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries.
We introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents.
arXiv Detail & Related papers (2023-06-30T16:23:04Z) - MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z) - WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting [38.95768804529958]
We augment the Waymo Open Motion Dataset with large-scale, high-quality, diverse LiDAR data for the motion forecasting task.
The augmented dataset, WOMD-LiDAR, consists of over 100,000 scenes, each spanning 20 seconds and containing well-synchronized, high-quality LiDAR point clouds captured across a range of urban and suburban geographies.
Experiments show that the LiDAR data brings improvements to the motion forecasting task.
arXiv Detail & Related papers (2023-04-07T20:23:15Z) - Traj-MAE: Masked Autoencoders for Trajectory Prediction [69.7885837428344]
Trajectory prediction has been a crucial task in building a reliable autonomous driving system by anticipating possible dangers.
We propose an efficient masked autoencoder for trajectory prediction (Traj-MAE) that better represents the complicated behaviors of agents in the driving environment.
Our experimental results in both multi-agent and single-agent settings demonstrate that Traj-MAE achieves competitive results with state-of-the-art methods.
arXiv Detail & Related papers (2023-03-12T16:23:27Z) - Motion Transformer with Global Intention Localization and Local Movement
Refinement [103.75625476231401]
Motion TRansformer (MTR) models motion prediction as the joint optimization of global intention localization and local movement refinement.
MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges.
arXiv Detail & Related papers (2022-09-27T16:23:14Z) - A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN [59.57221522897815]
We propose a neural network model based on trajectories information for driving behavior recognition.
We evaluate the proposed model on the public BLVD dataset, achieving satisfactory performance.
arXiv Detail & Related papers (2021-03-01T06:47:29Z) - LiDAR-based Panoptic Segmentation via Dynamic Shifting Network [56.71765153629892]
LiDAR-based panoptic segmentation aims to parse both objects and scenes in a unified manner.
We propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm.
Our proposed DS-Net achieves superior accuracies over current state-of-the-art methods.
arXiv Detail & Related papers (2020-11-24T08:44:46Z) - Shared Cross-Modal Trajectory Prediction for Autonomous Driving [24.07872495811019]
We propose a Cross-Modal Embedding framework that aims to benefit from the use of multiple input modalities.
An extensive evaluation is conducted to show the efficacy of the proposed framework using two benchmark driving datasets.
arXiv Detail & Related papers (2020-11-15T07:18:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.