BEVTraj: Map-Free End-to-End Trajectory Prediction in Bird's-Eye View with Deformable Attention and Sparse Goal Proposals
- URL: http://arxiv.org/abs/2509.10080v1
- Date: Fri, 12 Sep 2025 09:17:54 GMT
- Title: BEVTraj: Map-Free End-to-End Trajectory Prediction in Bird's-Eye View with Deformable Attention and Sparse Goal Proposals
- Authors: Minsang Kong, Myeongjun Kim, Sang Gu Kang, Sang Hun Lee
- Abstract summary: We propose Bird's-Eye View Trajectory Prediction (BEVTraj) for autonomous driving. It operates directly in the bird's-eye view (BEV) space, utilizing real-time sensor data without relying on pre-built maps. It achieves performance comparable to state-of-the-art HD map-based models while offering greater flexibility.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In autonomous driving, trajectory prediction is essential for ensuring safe and efficient navigation. To improve prediction accuracy, recent approaches often rely on pre-built high-definition (HD) maps or real-time local map construction modules to incorporate static environmental information. However, pre-built HD maps are limited to specific regions and cannot adapt to transient changes. In addition, local map construction modules, which recognize only predefined elements, may fail to capture critical scene details or introduce errors that degrade prediction performance. To overcome these limitations, we propose Bird's-Eye View Trajectory Prediction (BEVTraj), a novel trajectory prediction framework that operates directly in the bird's-eye view (BEV) space, utilizing real-time sensor data without relying on any pre-built maps. BEVTraj leverages deformable attention to efficiently extract relevant context from dense BEV features. Furthermore, we introduce a Sparse Goal Candidate Proposal (SGCP) module, which enables full end-to-end prediction without requiring any post-processing steps. Extensive experiments demonstrate that BEVTraj achieves performance comparable to state-of-the-art HD map-based models while offering greater flexibility by eliminating the dependency on pre-built maps. The source code is available at https://github.com/Kongminsang/bevtraj.
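The core idea of deformable attention over dense BEV features, a query samples the feature map at a few learned offsets around a reference point and takes an attention-weighted sum, can be sketched as below. This is a minimal NumPy illustration: the grid size, offsets, and weights are random stand-ins for the learned quantities in BEVTraj, not the authors' implementation.

```python
import numpy as np

def bilinear_sample(bev, x, y):
    """Bilinearly sample an (H, W, C) BEV feature map at continuous (x, y)."""
    H, W, _ = bev.shape
    x, y = np.clip(x, 0, W - 1), np.clip(y, 0, H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * bev[y0, x0]
            + wx * (1 - wy) * bev[y0, x1]
            + (1 - wx) * wy * bev[y1, x0]
            + wx * wy * bev[y1, x1])

def deformable_attention(bev, ref_xy, offsets, attn_logits):
    """Aggregate K sampled BEV features around one reference point.

    bev:         (H, W, C) dense BEV feature map
    ref_xy:      (2,) reference point in grid coordinates
    offsets:     (K, 2) sampling offsets (learned in the real model)
    attn_logits: (K,) attention logits, softmax-normalized below
    """
    w = np.exp(attn_logits - attn_logits.max())
    w /= w.sum()
    samples = np.stack([
        bilinear_sample(bev, ref_xy[0] + dx, ref_xy[1] + dy)
        for dx, dy in offsets
    ])                                           # (K, C)
    return (w[:, None] * samples).sum(axis=0)    # (C,) context vector

# Illustrative usage with random features and offsets
rng = np.random.default_rng(0)
bev = rng.normal(size=(8, 8, 4))                 # toy 8x8 BEV grid, 4 channels
ctx = deformable_attention(bev, np.array([3.5, 3.5]),
                           rng.normal(size=(4, 2)), rng.normal(size=(4,)))
```

Because only K points are sampled rather than attending over the full H×W grid, the cost per query is independent of the BEV resolution, which is the efficiency argument the abstract makes.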
Related papers
- Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction [17.16231247910372]
UIGenMap is an uncertainty-instructed structure injection approach for generalizable HD map vectorization. We introduce the perspective-view (PV) detection branch to obtain explicit structural features. Experiments on challenging geographically disjoint (geo-based) data demonstrate that our UIGenMap achieves superior performance.
arXiv Detail & Related papers (2025-03-29T15:01:38Z) - Unified Human Localization and Trajectory Prediction with Monocular Vision [64.19384064365431]
MonoTransmotion is a Transformer-based framework that uses only a monocular camera to jointly solve localization and prediction tasks. We show that by jointly training both tasks with our unified framework, our method is more robust in real-world scenarios with noisy inputs.
arXiv Detail & Related papers (2025-03-05T14:18:39Z) - TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior [70.84644266024571]
We propose to train a perception model to "see" standard definition maps (SDMaps).
We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information.
Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology.
arXiv Detail & Related papers (2024-11-22T06:13:42Z) - Map-Free Trajectory Prediction with Map Distillation and Hierarchical Encoding [8.857237929151795]
MFTP is a Map-Free Trajectory Prediction method that offers several advantages.
First, it eliminates the need for HD maps during inference while still benefiting from map priors during training via knowledge distillation.
Second, we present a novel hierarchical encoder that effectively extracts spatial-temporal agent features and aggregates them into multiple trajectory queries.
arXiv Detail & Related papers (2024-11-17T04:50:44Z) - CASPFormer: Trajectory Prediction from BEV Images with Deformable Attention [4.9349065371630045]
We propose Context Aware Scene Prediction Transformer (CASPFormer), which can perform multi-modal motion prediction from spatialized Bird's-Eye-View (BEV) images.
Our system can be integrated with any upstream perception module that is capable of generating BEV images.
We evaluate our model on the nuScenes dataset and show that it reaches state-of-the-art across multiple metrics.
arXiv Detail & Related papers (2024-09-26T12:37:22Z) - PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction [9.32290307534907]
PrevPredMap is a pioneering temporal modeling framework that leverages previous predictions for constructing online vectorized HD maps.
The framework achieves state-of-the-art performance on the nuScenes and Argoverse2 datasets.
arXiv Detail & Related papers (2024-07-24T15:58:24Z) - Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps [51.24861159115138]
Standard Definition (SD) maps are more affordable and have worldwide coverage, offering a scalable alternative.
We propose a novel framework that integrates SD maps into online map prediction, built around a Transformer-based encoder, SD Map Representations from transFormers.
This enhancement consistently and significantly boosts (by up to 60%) lane detection and topology prediction on current state-of-the-art online map prediction methods.
arXiv Detail & Related papers (2023-11-07T15:42:22Z) - MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models [24.681557413829317]
MapPrior is a novel BEV perception framework that combines a traditional BEV perception model with a learned generative model for semantic map layouts.
At the time of submission, MapPrior outperforms the strongest competing method, with significantly improved MMD and ECE scores in camera- and LiDAR-based BEV perception.
arXiv Detail & Related papers (2023-08-24T17:58:30Z) - Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z) - LOPR: Latent Occupancy PRediction using Generative Models [49.15687400958916]
LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye-view scene representation.
We propose a framework that decouples occupancy prediction into: representation learning and prediction within the learned latent space.
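The two-stage decoupling described above, learning a compact representation first and then predicting within the learned latent space, can be illustrated with a toy linear stand-in. The random encoder/decoder pair and identity transition below are hypothetical placeholders for LOPR's trained generative model, chosen only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 16   # occupancy grid size (illustrative)
D = 8        # latent dimension (illustrative)

# Stage 1: representation learning. A trained autoencoder is stood in for
# by a random linear encoder and its pseudoinverse as the decoder.
enc = rng.normal(size=(H * W, D)) / np.sqrt(H * W)
dec = np.linalg.pinv(enc)            # (D, H*W)

def encode(ogm):                     # (H, W) -> (D,)
    return ogm.reshape(-1) @ enc

def decode(z):                       # (D,) -> (H, W)
    return (z @ dec).reshape(H, W)

# Stage 2: prediction in latent space. A learned sequence model is stood
# in for by a fixed linear transition applied autoregressively.
A = np.eye(D)

def predict_future(ogm, steps=3):
    z = encode(ogm)
    frames = []
    for _ in range(steps):
        z = z @ A                    # roll the latent state forward
        frames.append(decode(z))
    return np.stack(frames)          # (steps, H, W)
```

The point of the decoupling is that the predictor never touches the high-dimensional grid; it operates entirely on D-dimensional latents and only decodes back to occupancy maps at the end.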
arXiv Detail & Related papers (2022-10-03T22:04:00Z) - SLPC: a VRNN-based approach for stochastic lidar prediction and completion in autonomous driving [63.87272273293804]
We propose a new LiDAR prediction framework based on generative models, namely Variational Recurrent Neural Networks (VRNNs). Our algorithm addresses the limitations of previous video prediction frameworks when dealing with sparse data by spatially inpainting the depth maps in upcoming frames. We present a sparse version of VRNNs and an effective self-supervised training method that does not require any labels.
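The spatial-inpainting idea, filling the missing pixels of a sparse depth map before prediction, can be illustrated with a simple nearest-valid-pixel fill. This is a generic stand-in to show what "inpainting sparse depth" means, not the VRNN-based method itself.

```python
import numpy as np

def inpaint_sparse_depth(depth, valid):
    """Fill missing depth pixels with the value of the nearest valid pixel.

    depth: (H, W) depth map; values where valid is False are ignored
    valid: (H, W) boolean mask of observed pixels
    """
    H, W = depth.shape
    ys, xs = np.nonzero(valid)
    pts = np.stack([ys, xs], axis=1)                    # (N, 2) valid coords
    gy, gx = np.mgrid[0:H, 0:W]
    grid = np.stack([gy.ravel(), gx.ravel()], axis=1)   # (H*W, 2) all coords
    # Brute-force nearest valid pixel for every grid location
    d2 = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    nearest = pts[d2.argmin(axis=1)]
    return depth[nearest[:, 0], nearest[:, 1]].reshape(H, W)
```

A learned inpainting network would produce smoother fills, but even this crude version turns a sparse LiDAR depth map into a dense one that a standard video-prediction model can consume.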
arXiv Detail & Related papers (2021-02-19T11:56:44Z) - Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships among objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not crossing the street, is a crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.