FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving
- URL: http://arxiv.org/abs/2404.12867v2
- Date: Wed, 24 Jul 2024 10:33:11 GMT
- Title: FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving
- Authors: Xingtai Gui, Tengteng Huang, Haonan Shao, Haotian Yao, Chi Zhang,
- Abstract summary: Future instance prediction from a Bird's Eye View (BEV) perspective is a vital component of autonomous driving.
We propose a simple yet effective fully end-to-end framework named Future Instance Prediction Transformer (FipTR).
- Score: 8.370230253558159
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Future instance prediction from a Bird's Eye View (BEV) perspective is a vital component of autonomous driving; it involves future instance segmentation and instance motion prediction. Existing methods usually rely on a redundant and complex pipeline that requires multiple auxiliary outputs and post-processing procedures. Moreover, estimation errors in each auxiliary prediction degrade the final prediction performance. In this paper, we propose a simple yet effective fully end-to-end framework named Future Instance Prediction Transformer (FipTR), which views the task as BEV instance segmentation and prediction for future frames. We adopt instance queries, each representing a specific traffic participant, to directly estimate the corresponding future occupied masks, thereby removing the need for complex post-processing. In addition, we devise a flow-aware BEV predictor for future BEV feature prediction, built on a flow-aware deformable attention that uses backward flow to guide offset sampling. A novel future instance matching strategy is also proposed to further improve temporal coherence. Extensive experiments demonstrate the superiority of FipTR and its effectiveness under different temporal BEV encoders. The code is available at https://github.com/TabGuigui/FipTR .
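The idea of instance queries directly estimating future occupied masks can be illustrated with a minimal sketch. It assumes, as in common mask-transformer designs, that a mask logit is the dot product between a query embedding and the BEV feature at each cell; the abstract does not state this, and the function name `decode_masks`, the threshold, and the toy shapes are all hypothetical and not taken from the FipTR code base.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_masks(queries, bev_frames, threshold=0.5):
    """Turn instance queries into per-frame binary masks.

    queries:    Q x C nested list, one embedding per traffic participant
                (assumed already decoded; hypothetical input).
    bev_frames: T x H x W x C nested list of future BEV features.
    Returns a T x Q x H x W nested list of 0/1 occupancy values.
    """
    masks = []
    for frame in bev_frames:
        frame_masks = []
        for query in queries:
            # Assumption: mask logit = dot(query, BEV feature at the cell).
            mask = [
                [
                    1 if sigmoid(sum(q * f for q, f in zip(query, cell))) > threshold else 0
                    for cell in row
                ]
                for row in frame
            ]
            frame_masks.append(mask)
        masks.append(frame_masks)
    return masks


# Toy example: 2 queries, 2 future frames, a 1 x 3 BEV grid, 2 channels.
queries = [[1.0, 0.0], [0.0, 1.0]]
bev_frames = [
    [[[2.0, -2.0], [-2.0, 2.0], [-2.0, -2.0]]],  # frame 0
    [[[-2.0, -2.0], [2.0, -2.0], [-2.0, 2.0]]],  # frame 1
]
masks = decode_masks(queries, bev_frames)
print(masks[0][0], masks[1][0])  # mask of query 0 in frame 0 and frame 1
```

Because the same query index is used in every frame, each mask keeps its instance identity over time without a separate association step, which is the kind of post-processing the abstract says the query-based formulation avoids.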
Related papers
- AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce GPT-style next-token prediction into motion prediction.
Unlike language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatial-temporal and semantic relations.
We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z)
- A Novel Deep Neural Network for Trajectory Prediction in Automated Vehicles Using Velocity Vector Field [12.067838086415833]
This paper proposes a novel technique for trajectory prediction that combines a data-driven learning-based method with a velocity vector field (VVF) generated from a nature-inspired concept.
The accuracy remains consistent as the observation window shrinks, which alleviates the requirement of a long history of past observations for accurate trajectory prediction.
arXiv Detail & Related papers (2023-09-19T22:14:52Z)
- PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird's-Eye View [14.113805629254191]
Bird's-eye view (BEV) representations are commonplace in perception for autonomous driving.
Existing approaches for BEV instance prediction rely on a multi-task auto-regressive setup coupled with post-processing to predict future instances.
We propose an efficient novel end-to-end framework named POWERBEV, which differs in several design choices aimed at reducing the inherent redundancy in previous methods.
arXiv Detail & Related papers (2023-06-19T08:11:05Z)
- StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation [15.441175735210791]
StreamingFlow is a novel BEV occupancy predictor that ingests asynchronous multi-sensor data streams for fusion.
It learns derivatives of BEV features over temporal horizons, updates the implicit sensor's BEV features as part of the fusion process, and propagates BEV states to the desired future time point.
It significantly outperforms previous vision-based and LiDAR-based methods, and shows superior performance compared to state-of-the-art fusion-based methods.
arXiv Detail & Related papers (2023-02-19T14:38:01Z)
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z)
- Temporally-Continuous Probabilistic Prediction using Polynomial Trajectory Parameterization [12.896275507449936]
A commonly-used representation for motion prediction of actors is a sequence of waypoints for each actor at discrete future time-points.
This approach is simple and flexible, but it can exhibit unrealistic higher-order derivatives and approximation errors at intermediate time steps.
We propose a simple and general representation for temporally continuous trajectory prediction that is based on trajectory parameterization.
arXiv Detail & Related papers (2020-11-01T01:51:44Z)
- Video Prediction via Example Guidance [156.08546987158616]
In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics.
In this work, we propose a simple yet effective framework that can efficiently predict plausible future states.
arXiv Detail & Related papers (2020-07-03T14:57:24Z)
- AutoCP: Automated Pipelines for Accurate Prediction Intervals [84.16181066107984]
This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP).
Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate.
We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.
arXiv Detail & Related papers (2020-06-24T23:13:11Z)
- TPNet: Trajectory Proposal Network for Motion Prediction [81.28716372763128]
Trajectory Proposal Network (TPNet) is a novel two-stage motion prediction framework.
TPNet first generates a candidate set of future trajectories as hypothesis proposals, then makes the final predictions by classifying and refining the proposals.
Experiments on four large-scale trajectory prediction datasets show that TPNet achieves state-of-the-art results both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-04-26T00:01:49Z)
- TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation [46.28067541184604]
Video action anticipation aims to predict future action categories from observed frames.
Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states.
This paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction framework.
arXiv Detail & Related papers (2020-03-07T07:59:42Z)