XVTP3D: Cross-view Trajectory Prediction Using Shared 3D Queries for
Autonomous Driving
- URL: http://arxiv.org/abs/2308.08764v1
- Date: Thu, 17 Aug 2023 03:35:13 GMT
- Title: XVTP3D: Cross-view Trajectory Prediction Using Shared 3D Queries for
Autonomous Driving
- Authors: Zijian Song, Huikun Bi, Ruisi Zhang, Tianlu Mao, Zhaoqi Wang
- Abstract summary: Trajectory prediction with uncertainty is a critical and challenging task for autonomous driving.
We present a cross-view trajectory prediction method using shared 3D queries (XVTP3D).
The results of experiments on two publicly available datasets show that XVTP3D achieved state-of-the-art performance with consistent cross-view predictions.
- Score: 7.616422495497465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Trajectory prediction with uncertainty is a critical and challenging task for
autonomous driving. Nowadays, sensor data represented in multiple views is
easily accessible. However, existing models do not evaluate cross-view
consistency, which might lead to divergences between the multimodal
predictions from different views. A network that does not comprehend the 3D
scene is neither practical nor effective, and it could leave the downstream
module in a dilemma. Instead, we predict multimodal trajectories while
maintaining cross-view consistency. We present a cross-view trajectory
prediction method using shared 3D queries (XVTP3D). We employ a set of 3D
queries shared across views to generate multi-goals that are cross-view
consistent. We also propose a random mask method and coarse-to-fine
cross-attention to capture robust cross-view features. As far as we know, this
is the first work that introduces the outstanding top-down paradigm of the BEV
detection field to a trajectory prediction problem. The results of experiments
on two publicly available datasets show that XVTP3D achieves state-of-the-art
performance with consistent cross-view predictions.
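The idea of goals that stay consistent across views because they come from one shared set of 3D queries can be illustrated with a toy sketch. This is not the authors' implementation: the two views (a bird's-eye view and a pinhole front view), the coordinate convention, and every function name below are assumptions made for illustration only, and the random view mask is merely a loose analogue of the random mask method the abstract mentions.

```python
import random

# Assumed convention (hypothetical): x = forward depth, y = lateral, z = height.

def sample_queries(n, seed=0):
    """Draw n toy 3D query points in front of the ego vehicle."""
    rng = random.Random(seed)
    return [
        (rng.uniform(5.0, 50.0), rng.uniform(-10.0, 10.0), rng.uniform(0.0, 2.0))
        for _ in range(n)
    ]

def project_bev(point):
    """Bird's-eye view: drop the height coordinate."""
    x, y, _z = point
    return (x, y)

def project_front(point, focal=1.0):
    """Front view: simple pinhole projection onto the image plane."""
    x, y, z = point
    return (focal * y / x, focal * z / x)

def random_view_mask(views, rng, drop_prob=0.5):
    """Randomly hide views, always keeping at least one visible."""
    kept = [v for v in views if rng.random() >= drop_prob]
    return kept or [rng.choice(views)]

if __name__ == "__main__":
    rng = random.Random(1)
    for query in sample_queries(3):
        visible = random_view_mask(["bev", "front"], rng)
        goals = {
            "bev": project_bev(query),
            "front": project_front(query),
        }
        # Both goals derive from the same 3D point, so they cannot diverge.
        print(visible, goals)
```

Because each 2D goal is a projection of the same 3D point, scaling the front-view lateral coordinate by the shared depth recovers the BEV lateral coordinate, which is the sense in which the toy goals are cross-view consistent.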
Related papers
- PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion [80.79938369319152]
We design a new pipeline coined PCF-Lift based on our Probabilistic Contrastive Fusion (PCF).
Our PCF-Lift not only significantly outperforms the state-of-the-art methods on widely used benchmarks including the ScanNet dataset and the Messy Room dataset (4.4% improvement of scene-level PQ)
arXiv Detail & Related papers (2024-10-14T16:06:59Z) - Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving [4.628774934971078]
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle.
We introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models.
Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU.
arXiv Detail & Related papers (2024-08-01T08:32:03Z) - StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction [22.29257945966914]
We propose a streaming and unified framework for joint 3D Multi-Object Tracking and Trajectory Prediction (StreamMOTP).
We construct the model in a streaming manner and exploit a memory bank to preserve and leverage the long-term latent features for tracked objects more effectively.
We also improve the quality and consistency of predicted trajectories with a dual-stream predictor.
arXiv Detail & Related papers (2024-06-28T11:35:35Z) - Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation
Learning of Vision-based Autonomous Driving [73.3702076688159]
We propose a novel contrastive learning algorithm, Cohere3D, to learn coherent instance representations in a long-term input sequence.
We evaluate our algorithm by finetuning the pretrained model on various downstream perception, prediction, and planning tasks.
arXiv Detail & Related papers (2024-02-23T19:43:01Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D
Pose Estimation Tracking and Forecasting on a Video Snippet [24.852728097115744]
Multi-person pose understanding from RGB involves three complex tasks: pose estimation, tracking and motion forecasting.
Most existing works either focus on a single task or employ multi-stage approaches to solving multiple tasks separately.
We propose Snipper, a unified framework to perform multi-person 3D pose estimation, tracking, and motion forecasting simultaneously in a single stage.
arXiv Detail & Related papers (2022-07-09T18:42:14Z) - BEVerse: Unified Perception and Prediction in Birds-Eye-View for
Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z) - X-view: Non-egocentric Multi-View 3D Object Detector [40.25127812839952]
We propose a novel multi-view-based 3D detection method, named X-view, to overcome the drawbacks of the multi-view methods.
X-view removes the traditional constraint that the origin of the perspective view must coincide with the origin of the 3D Cartesian coordinate system.
We conduct experiments on KITTI and NuScenes datasets to demonstrate the robustness and effectiveness of our proposed X-view.
arXiv Detail & Related papers (2021-03-24T06:13:35Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.