BEV-SUSHI: Multi-Target Multi-Camera 3D Detection and Tracking in Bird's-Eye View
- URL: http://arxiv.org/abs/2412.00692v2
- Date: Sat, 07 Dec 2024 22:46:42 GMT
- Title: BEV-SUSHI: Multi-Target Multi-Camera 3D Detection and Tracking in Bird's-Eye View
- Authors: Yizhou Wang, Tim Meinhardt, Orcun Cetintas, Cheng-Yen Yang, Sameer Satish Pusegaonkar, Benjamin Missaoui, Sujit Biswas, Zheng Tang, Laura Leal-Taixé, et al.
- Abstract summary: We propose a 3D object detection and tracking framework, named BEV-SUSHI.
It aggregates multi-view images with the necessary camera calibration parameters to obtain 3D object detections in bird's-eye view.
Unlike existing methods, BEV-SUSHI has impressive generalizability across different scenes and diverse camera settings.
- Score: 39.75692195719607
- Abstract: Object perception from multi-view cameras is crucial for intelligent systems, particularly in indoor environments, e.g., warehouses, retail stores, and hospitals. Most traditional multi-target multi-camera (MTMC) detection and tracking methods rely on 2D object detection, single-view multi-object tracking (MOT), and cross-view re-identification (ReID) techniques, without properly exploiting the 3D information available through multi-view image aggregation. In this paper, we propose a 3D object detection and tracking framework, named BEV-SUSHI, which first aggregates multi-view images with the necessary camera calibration parameters to obtain 3D object detections in bird's-eye view (BEV). Then, we introduce hierarchical graph neural networks (GNNs) to track these 3D detections in BEV for MTMC tracking results. Unlike existing methods, BEV-SUSHI generalizes well across different scenes and diverse camera settings, with exceptional capability for handling long-term associations. As a result, BEV-SUSHI establishes a new state of the art with 81.22 HOTA on the AICity'24 dataset and 95.6 IDF1 on the WildTrack dataset.
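The abstract describes a two-stage pipeline: calibrated multi-view aggregation into BEV, followed by hierarchical-GNN association. As a rough illustration of the first stage only, here is a minimal NumPy sketch that lifts BEV grid cells to a reference height, projects them into each calibrated camera, and averages the sampled features. The function names, the flat-ground assumption, and the nearest-neighbour sampling are ours for illustration, not details from the paper; a similarly simplified stand-in for the tracking stage follows the related-papers list below.

```python
# Hypothetical sketch (not the authors' code): aggregating per-camera
# feature maps into a BEV grid via calibrated pinhole projection.
import numpy as np

def project_points(pts_world, K, R, t):
    """Project Nx3 world points to pixel coordinates with a pinhole model."""
    cam = (R @ pts_world.T + t[:, None]).T              # world -> camera frame
    z = cam[:, 2:3]
    uv = (K @ cam.T).T[:, :2] / np.where(np.abs(z) < 1e-9, 1e-9, z)
    return uv, cam[:, 2]                                # pixel coords, depth

def aggregate_bev(features, calibs, grid_xy, ref_height=0.9):
    """Average per-camera features sampled at BEV grid cell centres.

    features:   list of (H, W, C) per-camera feature maps
    calibs:     list of (K, R, t) calibration tuples, one per camera
    grid_xy:    (N, 2) ground-plane coordinates of BEV cells, in metres
    ref_height: assumed height (m) at which grid cells are lifted to 3D
    """
    n, c = grid_xy.shape[0], features[0].shape[2]
    acc, cnt = np.zeros((n, c)), np.zeros((n, 1))
    pts = np.hstack([grid_xy, np.full((n, 1), ref_height)])  # lift to 3D
    for feat, (K, R, t) in zip(features, calibs):
        h, w = feat.shape[:2]
        uv, depth = project_points(pts, K, R, t)
        u = uv[:, 0].round().astype(int)
        v = uv[:, 1].round().astype(int)
        # keep cells in front of the camera and inside the image bounds
        ok = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        acc[ok] += feat[v[ok], u[ok]]
        cnt[ok] += 1
    return acc / np.maximum(cnt, 1)    # mean over cameras that see each cell
```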
Related papers
- SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras [3.648972014796591]
We present a single model termed SimPB, which simultaneously detects 2D objects in the perspective view and 3D objects in the BEV space from multiple cameras.
The hybrid decoder consists of several multi-view 2D decoder layers and several 3D decoder layers, each designed for its respective detection task.
arXiv Detail & Related papers (2024-03-15T14:39:39Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z)
- Unsupervised Multi-view Pedestrian Detection [12.882317991955228]
We propose an Unsupervised Multi-view Pedestrian Detection approach (UMPD) that eliminates the need for annotations when learning a multi-view pedestrian detector via 2D-3D mapping.
SIS is proposed to extract unsupervised representations of multi-view images, which are converted into 2D pedestrian masks as pseudo labels.
GVD encodes multi-view 2D images into a 3D volume to predict voxel-wise density and color via 2D-to-3D geometric projection, trained by 3D-to-2D mapping.
arXiv Detail & Related papers (2023-05-21T13:27:02Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- MVM3Det: A Novel Method for Multi-view Monocular 3D Detection [0.0]
MVM3Det simultaneously estimates the 3D position and orientation of objects from multi-view monocular information.
We present the first dataset for multi-view 3D object detection, named MVM3D.
arXiv Detail & Related papers (2021-09-22T01:31:00Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- BirdNet+: End-to-End 3D Object Detection in LiDAR Bird's Eye View [117.44028458220427]
On-board 3D object detection in autonomous vehicles often relies on geometry information captured by LiDAR devices.
We present a fully end-to-end 3D object detection framework that can infer oriented 3D boxes solely from BEV images.
arXiv Detail & Related papers (2020-03-09T15:08:40Z)
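For the second stage of BEV-SUSHI, the paper tracks the BEV detections with hierarchical GNNs. As a deliberately simplified stand-in (not the authors' method), the sketch below associates detections across frames by greedy nearest-neighbour matching on BEV centre distance, which conveys the data-association problem the GNNs solve:

```python
# Hypothetical, deliberately simplified stand-in for BEV tracking
# association; the paper's hierarchical GNNs replace this greedy rule.
import numpy as np

def associate(track_xy, det_xy, max_dist=1.0):
    """Greedily match track centres to detection centres in the BEV plane.

    track_xy: (T, 2) last known BEV positions of active tracks
    det_xy:   (D, 2) BEV positions detected in the current frame
    Returns a list of (track_index, detection_index) pairs.
    """
    if len(track_xy) == 0 or len(det_xy) == 0:
        return []
    dist = np.linalg.norm(track_xy[:, None, :] - det_xy[None, :, :], axis=-1)
    matches, used_t, used_d = [], set(), set()
    for flat in np.argsort(dist, axis=None):    # closest pairs first
        ti, di = np.unravel_index(flat, dist.shape)
        if dist[ti, di] > max_dist:
            break                               # all remaining pairs are farther
        if ti in used_t or di in used_d:
            continue
        matches.append((int(ti), int(di)))
        used_t.add(ti)
        used_d.add(di)
    return matches
```

In a full tracker, unmatched detections would start new tracks and unmatched tracks would age out; the hierarchical GNNs in the paper replace this per-frame greedy rule with learned, long-horizon association.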