STrajNet: Occupancy Flow Prediction via Multi-modal Swin Transformer
- URL: http://arxiv.org/abs/2208.00394v1
- Date: Sun, 31 Jul 2022 08:36:55 GMT
- Title: STrajNet: Occupancy Flow Prediction via Multi-modal Swin Transformer
- Authors: Haochen Liu, Zhiyu Huang, Chen Lv
- Abstract summary: This work proposes STrajNet: a multi-modal Swin Transformer-based framework for effective scene occupancy and flow predictions.
We employ Swin Transformer to encode the image and interaction-aware motion representations and propose a cross-attention module to inject motion awareness into grid cells.
Flow and occupancy predictions are then decoded through temporal-sharing Pyramid decoders.
- Score: 7.755385141347842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Making an accurate prediction of occupancy and flow is essential to enable
better safety and interaction for autonomous vehicles under complex traffic
scenarios. This work proposes STrajNet: a multi-modal Swin Transformer-based
framework for effective scene occupancy and flow predictions. We employ Swin
Transformer to encode the image and interaction-aware motion representations
and propose a cross-attention module to inject motion awareness into grid cells
across different time steps. Flow and occupancy predictions are then decoded
through temporal-sharing Pyramid decoders. The proposed method shows competitive
prediction accuracy and other evaluation metrics in the Waymo Open Dataset
benchmark.
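As a rough illustration of the cross-attention idea in the abstract, the sketch below lets each grid cell attend over per-agent motion features. It is a minimal NumPy sketch under assumptions, not the authors' implementation: the function name `cross_attention`, the residual connection, the tensor sizes, and the random projection weights are all hypothetical, and the Swin Transformer encoders and Pyramid decoders are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(grid_feats, motion_feats, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention.

    grid_feats:   (num_cells, dim)  flattened grid-cell features (queries)
    motion_feats: (num_agents, dim) per-agent motion features (keys/values)
    """
    q = grid_feats @ w_q        # (num_cells, dim)
    k = motion_feats @ w_k      # (num_agents, dim)
    v = motion_feats @ w_v      # (num_agents, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (num_cells, num_agents)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    # Residual connection (an assumption): keep the original cell feature
    # and add the attended motion context.
    return grid_feats + weights @ v, weights


# Toy sizes chosen for illustration only.
rng = np.random.default_rng(0)
height, width, dim, num_agents = 4, 4, 8, 3
grid_feats = rng.normal(size=(height * width, dim))
motion_feats = rng.normal(size=(num_agents, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

fused, weights = cross_attention(grid_feats, motion_feats, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this sketch every grid cell receives a weighted mix of agent motion features, which is one simple way to read "injecting motion awareness into grid cells"; the paper's module additionally operates across different time steps.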
Related papers
- AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce GPT-style next-token prediction into motion prediction.
Unlike language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatial-temporal and semantic relations.
We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
- MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying [110.83590008788745]
Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions.
In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges.
The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries.
We introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents.
arXiv Detail & Related papers (2023-06-30T16:23:04Z)
- Motion Transformer with Global Intention Localization and Local Movement Refinement [103.75625476231401]
Motion TRansformer (MTR) models motion prediction as the joint optimization of global intention localization and local movement refinement.
MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges.
arXiv Detail & Related papers (2022-09-27T16:23:14Z)
- VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction [18.277777620073685]
We propose a novel occupancy flow fields predictor to produce accurate occupancy and flow predictions.
Our model ranks 3rd in the Waymo Open Dataset Occupancy and Flow Prediction Challenge and achieves the best performance in the occluded occupancy and flow prediction task.
arXiv Detail & Related papers (2022-08-09T03:49:04Z)
- Multimodal Motion Prediction with Stacked Transformers [35.9674180611893]
We propose a novel transformer framework for multimodal motion prediction, termed mmTransformer.
A novel network architecture based on stacked transformers is designed to model the multimodality at feature level with a set of fixed independent proposals.
A region-based training strategy is then developed to induce the multimodality of the generated proposals.
arXiv Detail & Related papers (2021-03-22T07:25:54Z)
- Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [78.74510891099395]
In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene.
arXiv Detail & Related papers (2020-07-23T14:31:25Z)
- AMENet: Attentive Maps Encoder Network for Trajectory Prediction [35.22312783822563]
Trajectory prediction is critical for applications of planning safe future movements.
We propose an end-to-end generative model named Attentive Maps Encoder Network (AMENet).
AMENet encodes the agent's motion and interaction information for accurate and realistic multi-path trajectory prediction.
arXiv Detail & Related papers (2020-06-15T10:00:07Z)
- TPNet: Trajectory Proposal Network for Motion Prediction [81.28716372763128]
Trajectory Proposal Network (TPNet) is a novel two-stage motion prediction framework.
TPNet first generates a candidate set of future trajectories as hypothesis proposals, then makes the final predictions by classifying and refining the proposals.
Experiments on four large-scale trajectory prediction datasets show that TPNet achieves state-of-the-art results both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-04-26T00:01:49Z)
- MCENET: Multi-Context Encoder Network for Homogeneous Agent Trajectory Prediction in Mixed Traffic [35.22312783822563]
Trajectory prediction in urban mixed-traffic zones is critical for many intelligent transportation systems.
We propose an approach named Multi-Context Encoder Network (MCENET) that is trained by encoding both past and future scene context.
At inference time, we combine the past context and motion information of the target agent with samplings of the latent variables to predict multiple realistic trajectories.
arXiv Detail & Related papers (2020-02-14T11:04:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.