MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries
- URL: http://arxiv.org/abs/2205.00613v1
- Date: Mon, 2 May 2022 01:45:41 GMT
- Title: MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries
- Authors: Tianyuan Zhang, Xuanyao Chen, Yue Wang, Yilun Wang, Hang Zhao
- Abstract summary: 3D tracking from multiple cameras is a key component in a vision-based autonomous driving system.
We propose an end-to-end MUlti-camera TRacking framework called MUTR3D.
MUTR3D does not explicitly rely on the spatial and appearance similarity of objects.
It outperforms state-of-the-art methods by 5.3 AMOTA on the nuScenes dataset.
- Score: 18.70932813595532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and consistent 3D tracking from multiple cameras is a key component
in a vision-based autonomous driving system. It involves modeling 3D dynamic
objects in complex scenes across multiple cameras. This problem is inherently
challenging due to depth estimation, visual occlusions, appearance ambiguity,
etc. Moreover, objects are not consistently associated across time and cameras.
To address that, we propose an end-to-end \textbf{MU}lti-camera
\textbf{TR}acking framework called MUTR3D. In contrast to prior works, MUTR3D
does not explicitly rely on the spatial and appearance similarity of objects.
Instead, our method introduces a \textit{3D track query} to model a spatially and
appearance-coherent track for each object that appears across multiple cameras and
multiple frames. We use camera transformations to link 3D trackers with their
observations in 2D images. Each tracker is further refined using features sampled
from the camera images. MUTR3D uses a set-to-set loss to
measure the difference between the predicted tracking results and the ground
truths. Therefore, it does not require any post-processing such as non-maximum
suppression and/or bounding box association. MUTR3D outperforms
state-of-the-art methods by 5.3 AMOTA on the nuScenes dataset. Code is
available at: \url{https://github.com/a1600012888/MUTR3D}.
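The core mechanism the abstract describes is to anchor each track query at a 3D reference point, project that point into every camera with the known camera transformations, and sample 2D image features to refine the query. Below is a minimal sketch of that 3D-to-2D step in PyTorch; the function name, tensor shapes, and the simple visibility-masked average are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def project_and_sample(ref_points, feat_maps, lidar2img, img_hw):
    """Project 3D reference points into each camera and sample 2D features.

    ref_points: (Q, 3) 3D reference points (one per track query), ego frame.
    feat_maps:  (N, C, H, W) image feature maps, one per camera.
    lidar2img:  (N, 4, 4) projection matrices (intrinsics @ extrinsics).
    img_hw:     (img_h, img_w) image size the projection matrices assume.
    Returns (Q, C) features averaged over the cameras that see each point.
    """
    Q = ref_points.shape[0]
    img_h, img_w = img_hw

    # Homogeneous coordinates: (Q, 4).
    pts = torch.cat([ref_points, ref_points.new_ones(Q, 1)], dim=-1)

    # Project into every camera at once: (N, Q, 4).
    cam_pts = torch.einsum('nij,qj->nqi', lidar2img, pts)
    eps = 1e-5
    depth = cam_pts[..., 2:3]
    valid = depth.squeeze(-1) > eps            # behind-camera test, (N, Q)
    uv = cam_pts[..., :2] / depth.clamp(min=eps)

    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    uv_norm = torch.stack([uv[..., 0] / img_w, uv[..., 1] / img_h], dim=-1)
    valid &= ((uv_norm > 0) & (uv_norm < 1)).all(-1)
    grid = uv_norm * 2.0 - 1.0                 # (N, Q, 2)

    # Bilinearly sample each camera's feature map at the projected points.
    sampled = F.grid_sample(
        feat_maps, grid.unsqueeze(2), align_corners=False
    ).squeeze(-1).permute(0, 2, 1)             # (N, Q, C)

    # Average over the cameras where each point is actually visible.
    mask = valid.unsqueeze(-1).float()
    return (sampled * mask).sum(0) / mask.sum(0).clamp(min=1.0)
```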
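The set-to-set loss follows the DETR-style recipe: predictions are matched one-to-one against the ground truth with the Hungarian algorithm, which is why no NMS or box association is needed afterwards. The sketch below uses simplified placeholder cost terms (negative class probability plus L1 box distance) and assumes the last class index means "no object"; the paper's actual cost and loss terms may differ.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def set_to_set_loss(pred_logits, pred_boxes, gt_labels, gt_boxes):
    """Toy DETR-style set loss: Hungarian match, then classify + regress.

    pred_logits: (Q, num_classes) classification logits per track query.
    pred_boxes:  (Q, D) predicted box parameters (e.g. center/size/yaw).
    gt_labels:   (G,)   ground-truth class indices.
    gt_boxes:    (G, D) ground-truth box parameters.
    """
    prob = pred_logits.softmax(-1)                      # (Q, num_classes)
    # Matching cost: low class probability and large L1 box error both
    # make a (prediction, ground truth) pair expensive to match.
    cost_cls = -prob[:, gt_labels]                      # (Q, G)
    cost_box = torch.cdist(pred_boxes, gt_boxes, p=1)   # (Q, G)
    cost = (cost_cls + cost_box).detach().cpu().numpy()

    row, col = linear_sum_assignment(cost)              # one-to-one match
    row = torch.as_tensor(row, dtype=torch.long, device=pred_logits.device)
    col = torch.as_tensor(col, dtype=torch.long, device=pred_logits.device)

    # Unmatched queries are supervised toward a "no object" class,
    # assumed here to be the last class index.
    num_classes = pred_logits.shape[-1]
    target = torch.full((pred_logits.shape[0],), num_classes - 1,
                        dtype=torch.long, device=pred_logits.device)
    target[row] = gt_labels[col]

    loss_cls = F.cross_entropy(pred_logits, target)
    loss_box = F.l1_loss(pred_boxes[row], gt_boxes[col])
    return loss_cls + loss_box
```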
Related papers
- Tracking Objects with 3D Representation from Videos [57.641129788552675]
With 3D object representation learning from pseudo-3D object labels in monocular videos, we propose a new 2D Multiple Object Tracking (MOT) paradigm, called P3DTrack.
arXiv Detail & Related papers (2023-06-08T17:58:45Z)
- Tracking by 3D Model Estimation of Unknown Objects in Videos [122.56499878291916]
We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
arXiv Detail & Related papers (2023-04-13T11:32:36Z)
- MMPTRACK: Large-scale Densely Annotated Multi-camera Multiple People Tracking Benchmark [40.363608495563305]
We provide a large-scale densely-labeled multi-camera tracking dataset in five different environments with the help of an auto-annotation system.
The 3D tracking results are projected to each RGB camera view using camera parameters to create 2D tracking results.
This dataset provides a more reliable benchmark of multi-camera, multi-object tracking systems in cluttered and crowded environments.
arXiv Detail & Related papers (2021-11-30T06:29:14Z)
- Tracking People with 3D Representations [78.97070307547283]
We present a novel approach for tracking multiple people in video.
Unlike past approaches which employ 2D representations, we employ 3D representations of people, located in three-dimensional space.
We find that 3D representations are more effective than 2D representations for tracking in these settings.
arXiv Detail & Related papers (2021-11-15T16:15:21Z)
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [43.02373021724797]
We introduce a framework for multi-camera 3D object detection.
Our method manipulates predictions directly in 3D space.
We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
arXiv Detail & Related papers (2021-10-13T17:59:35Z)
- MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation [55.96577490779591]
We show that more data does not automatically guarantee better performance; rather, methods need a degree of 'camera independence' in order to benefit from large and heterogeneous training data.
arXiv Detail & Related papers (2021-10-01T14:56:37Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping [23.456046776979903]
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic) to learn a neural 3D mapping module.
The neural 3D mapper consumes RGB-D data as input, and produces a 3D voxel grid of deep features as output.
We show that our unsupervised 3D object trackers outperform prior unsupervised 2D and 2.5D trackers, and approach the accuracy of supervised trackers.
arXiv Detail & Related papers (2020-08-04T02:59:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.