End-to-End 3D Multi-Object Tracking and Trajectory Forecasting
- URL: http://arxiv.org/abs/2008.11598v1
- Date: Tue, 25 Aug 2020 16:54:46 GMT
- Title: End-to-End 3D Multi-Object Tracking and Trajectory Forecasting
- Authors: Xinshuo Weng, Ye Yuan, Kris Kitani
- Abstract summary: We propose a unified solution for 3D MOT and trajectory forecasting.
We employ a feature interaction technique by introducing Graph Neural Networks.
We also use a diversity sampling function to improve the quality and diversity of our forecasted trajectories.
- Score: 34.68114553744956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D multi-object tracking (MOT) and trajectory forecasting are two critical
components in modern 3D perception systems. We hypothesize that it is
beneficial to unify both tasks under one framework to learn a shared feature
representation of agent interaction. To evaluate this hypothesis, we propose a
unified solution for 3D MOT and trajectory forecasting which also incorporates
two additional novel computational units. First, we employ a feature
interaction technique by introducing Graph Neural Networks (GNNs) to capture
the way in which multiple agents interact with one another. The GNN is able to
model complex hierarchical interactions, improve the discriminative feature
learning for MOT association, and provide socially-aware context for trajectory
forecasting. Second, we use a diversity sampling function to improve the
quality and diversity of our forecasted trajectories. The learned sampling
function is trained to efficiently extract a variety of outcomes from a
generative trajectory distribution and helps avoid the problem of generating
many duplicate trajectory samples. We show that our method achieves
state-of-the-art performance on the KITTI dataset. Our project website is at
http://www.xinshuoweng.com/projects/GNNTrkForecast.
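The two computational units described above can be illustrated with a minimal sketch. This is not the paper's architecture: the message-passing update is a generic GNN layer standing in for the paper's feature interaction module, and the greedy farthest-point selection is a hand-written stand-in for the learned diversity sampling function (all names, shapes, and weights here are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

def interact(node_feats, adj, w_self, w_msg, steps=2):
    """Generic GNN message passing: each agent's feature is updated by
    aggregating its neighbors' features, giving socially-aware context."""
    h = node_feats
    for _ in range(steps):
        msgs = adj @ h @ w_msg           # sum messages from neighboring agents
        h = np.tanh(h @ w_self + msgs)   # fuse own feature with social context
    return h

def diverse_subset(samples, k):
    """Stand-in for the learned sampling function: greedy farthest-point
    selection over candidate trajectories, so the k kept samples cover
    distinct outcomes instead of duplicating the distribution's mode."""
    flat = samples.reshape(len(samples), -1)
    chosen = [0]
    while len(chosen) < k:
        # distance of every candidate to its nearest already-chosen sample
        d = np.min(np.linalg.norm(flat[:, None] - flat[chosen][None], axis=-1),
                   axis=1)
        chosen.append(int(np.argmax(d)))   # pick the most dissimilar candidate
    return samples[chosen]

# Toy usage: 4 agents with 8-dim features on a fully connected graph.
N, D = 4, 8
h = rng.standard_normal((N, D))
adj = np.ones((N, N)) - np.eye(N)
w_self = rng.standard_normal((D, D)) * 0.1
w_msg = rng.standard_normal((D, D)) * 0.1
social_feats = interact(h, adj, w_self, w_msg)

# 20 candidate 10-step 2D futures for one agent; keep 5 diverse ones.
samples = rng.standard_normal((20, 10, 2))
picked = diverse_subset(samples, k=5)
```

In the paper the sampling function is itself learned and the GNN is trained jointly with the MOT association head; the sketch only shows why message passing produces interaction-aware features and why a diversity objective avoids near-duplicate trajectory samples.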
Related papers
- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z)
- StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction [22.29257945966914]
We propose a streaming and unified framework for joint 3D Multi-Object Tracking and trajectory Prediction (StreamMOTP).
We construct the model in a streaming manner and exploit a memory bank to preserve and leverage the long-term latent features for tracked objects more effectively.
We also improve the quality and consistency of predicted trajectories with a dual-stream predictor.
arXiv Detail & Related papers (2024-06-28T11:35:35Z)
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial to enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Simultaneous Multiple Object Detection and Pose Estimation using 3D Model Infusion with Monocular Vision [21.710141497071373]
Multiple object detection and pose estimation are vital computer vision tasks.
We propose simultaneous neural modeling of both using monocular vision and 3D model infusion.
Our Simultaneous Multiple Object detection and Pose Estimation network (SMOPE-Net) is an end-to-end trainable multitasking network.
arXiv Detail & Related papers (2022-11-21T05:18:56Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Pareto-Optimal Bit Allocation for Collaborative Intelligence [39.11380888887304]
Collaborative intelligence (CI) has emerged as a promising framework for deployment of Artificial Intelligence (AI)-based services on mobile/edge devices.
In this paper, we study bit allocation for feature coding in multi-stream CI systems.
arXiv Detail & Related papers (2020-09-25T20:48:33Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Graph Neural Networks for 3D Multi-Object Tracking [28.121708602059048]
3D Multi-object tracking (MOT) is crucial to autonomous systems.
Recent work often uses a tracking-by-detection pipeline.
We propose a novel feature interaction mechanism by introducing Graph Neural Networks.
arXiv Detail & Related papers (2020-08-20T17:55:41Z)
- PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling [34.68114553744956]
Multi-object tracking (MOT) and trajectory prediction are two critical components in modern 3D perception systems.
We propose a parallelized framework to learn a shared feature representation of agent interaction.
Our method with socially-aware feature learning and diversity sampling achieves new state-of-the-art performance on 3D MOT and trajectory prediction.
arXiv Detail & Related papers (2020-03-17T17:53:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.