Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking
- URL: http://arxiv.org/abs/2208.10056v2
- Date: Tue, 23 Aug 2022 07:22:56 GMT
- Title: Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking
- Authors: JunYoung Gwak, Silvio Savarese, Jeannette Bohg
- Abstract summary: We present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves object detection and tracking.
Inspired by region-based CNN (R-CNN), we propose to solve tracking as a second stage of the object detector R-CNN.
We show in large-scale experiments that the overall performance gain of our method is due to four factors.
- Score: 53.64390261936975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research in multi-task learning reveals the benefit of solving related
problems in a single neural network. 3D object detection and multi-object
tracking (MOT) are two closely intertwined problems: predicting an object
instance's location and associating it across time. However, most previous works
in 3D MOT treat the detector as a separate, preceding pipeline whose output is
fed to the tracker as input. In this work, we present Minkowski Tracker, a sparse
spatio-temporal R-CNN that jointly solves object detection and tracking. Inspired
by region-based CNN (R-CNN), we propose to solve tracking as a second stage of the
object detector R-CNN that predicts assignment probabilities to tracks. First,
Minkowski Tracker takes 4D point clouds as input and generates a spatio-temporal
bird's-eye-view (BEV) feature map through a 4D sparse convolutional encoder
network. Then, our proposed TrackAlign aggregates track region-of-interest (ROI)
features from the BEV features. Finally, Minkowski Tracker updates each track and
its confidence score based on the detection-to-track match probability predicted
from the ROI features. We show in large-scale experiments that the overall
performance gain of our method is due to four factors:
1. The temporal reasoning of the 4D encoder improves detection performance.
2. Multi-task learning of object detection and MOT enhances both tasks jointly.
3. The detection-to-track match score learns an implicit motion model that improves track assignment.
4. The detection-to-track match score improves the quality of the track confidence score.
As a result, Minkowski Tracker achieves state-of-the-art performance on the
nuScenes tracking task without hand-designed motion models.
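The abstract describes the final stage only at a high level: a match probability between each detection and each existing track, predicted from the TrackAlign ROI features, drives the update of the track and its confidence score. The Python sketch below illustrates one way such probabilities could be turned into track updates. It is not the authors' implementation. The extra "new track" column in the probability matrix, the greedy one-to-one assignment, the rule confidence = detection score × match probability, and the fixed decay for unmatched tracks are all assumptions, and `Track` and `update_tracks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Track:
    track_id: int
    box: List[float]   # box parameters; the layout is an assumption
    confidence: float


def update_tracks(
    tracks: List[Track],
    det_boxes: List[List[float]],
    det_scores: List[float],
    match_prob: List[List[float]],
    next_id: int,
    decay: float = 0.5,
) -> Tuple[List[Track], int]:
    """Hypothetical track update from detection-to-track match probabilities.

    match_prob[i][j] is the probability that detection i belongs to track j for
    j < len(tracks); column len(tracks) is an assumed "start a new track" option.
    Existing Track objects are modified in place.
    """
    num_tracks = len(tracks)
    # Rank every (detection, option) pair by probability, highest first.
    candidates = sorted(
        ((match_prob[i][j], i, j)
         for i in range(len(det_boxes))
         for j in range(num_tracks + 1)),
        reverse=True,
    )
    used_dets, used_tracks = set(), set()
    new_tracks: List[Track] = []
    for prob, i, j in candidates:
        if i in used_dets:
            continue  # each detection is assigned exactly once
        if j < num_tracks:
            if j in used_tracks:
                continue  # each existing track absorbs at most one detection
            tracks[j].box = det_boxes[i]
            tracks[j].confidence = det_scores[i] * prob
            used_tracks.add(j)
        else:
            # The "new track" option is not exclusive: any number of detections may take it.
            new_tracks.append(Track(next_id, det_boxes[i], det_scores[i] * prob))
            next_id += 1
        used_dets.add(i)
    # Tracks with no matching detection this frame keep their box but lose confidence.
    for j in range(num_tracks):
        if j not in used_tracks:
            tracks[j].confidence *= decay
    return tracks + new_tracks, next_id
```

Greedy assignment in descending probability order is used here only for brevity; a Hungarian solver over the same matrix would be an equally reasonable way to enforce the one-to-one constraint between detections and existing tracks.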
Related papers
- ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames.
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes.
In 3D scenarios, it is much easier for the tracker to predict object velocities in world coordinates.
(2023-03-27T15:35:21Z)
- A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds [50.54083964183614]
It is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete.
We propose DMT, a Detector-free Motion-prediction-based 3D Tracking network that entirely removes the need for complicated 3D detectors.
(2022-03-08T17:49:07Z)
- Joint 3D Object Detection and Tracking Using Spatio-Temporal Representation of Camera Image and LiDAR Point Clouds [12.334725127696395]
We propose a new joint object detection and tracking (DT) framework for 3D object detection and tracking based on camera and LiDAR sensors.
The proposed method, referred to as 3D DetecJo, enables the detector and tracker to cooperate to generate a spatio-temporal representation of the camera and LiDAR data.
(2021-12-14T02:38:45Z)
- 3D-FCT: Simultaneous 3D Object Detection and Tracking Using Feature Correlation [0.0]
3D-FCT is a Siamese network architecture that utilizes temporal information to simultaneously perform the related tasks of 3D object detection and tracking.
Our proposed method is evaluated on the KITTI tracking dataset where it is shown to provide an improvement of 5.57% mAP over a state-of-the-art approach.
(2021-10-06T06:36:29Z)
- Exploring Simple 3D Multi-Object Tracking for Autonomous Driving [10.921208239968827]
3D multi-object tracking in LiDAR point clouds is a key ingredient for self-driving vehicles.
Existing methods are predominantly based on the tracking-by-detection pipeline and inevitably require a matching step for the detection association.
We present SimTrack to simplify the hand-crafted tracking paradigm by proposing an end-to-end trainable model for joint detection and tracking from raw point clouds.
(2021-08-23T17:59:22Z)
- On the detection-to-track association for online multi-object tracking [30.883165972525347]
We propose a hybrid track association (HTA) algorithm that models the historical appearance distances of a track with an incremental Gaussian mixture model (IGMM).
Experimental results on three MOT benchmarks confirm that HTA effectively improves the target identification performance with a small compromise to the tracking speed.
(2021-07-01T14:44:12Z)
- Track to Detect and Segment: An Online Multi-Object Tracker [81.15608245513208]
TraDeS is an online joint detection and tracking model, exploiting tracking clues to assist detection end-to-end.
TraDeS infers object tracking offset by a cost volume, which is used to propagate previous object features.
(2021-03-16T02:34:06Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
(2021-03-12T15:30:02Z)
- DEFT: Detection Embeddings for Tracking [3.326320568999945]
We propose an efficient joint detection and tracking model named DEFT.
Our approach relies on an appearance-based object matching network jointly learned with an underlying object detection network.
DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards.
(2021-02-03T20:00:44Z)
- ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking [80.02322563402758]
One of the core components in online multiple object tracking (MOT) frameworks is associating new detections with existing tracklets.
We introduce a probabilistic autoregressive generative model to score tracklet proposals by directly measuring the likelihood that a tracklet represents natural motion.
(2020-04-16T06:43:11Z)
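The ByteTrackV2 entry compresses its hierarchical association into a single sentence. The Python sketch below shows the general two-round idea; it is not ByteTrackV2's code. Detections are split by a score threshold, high-score detections are matched against all tracks first, and only the leftover tracks are then matched against low-score detections, so a low-confidence box can keep an existing track alive without spawning a spurious new track. The constant-velocity prediction in world coordinates, greedy center-distance matching (standing in for the Hungarian matching such trackers typically use), the 0.5 score threshold, the 2.0 gating distance, and all function names are assumptions.

```python
import math
from typing import Dict, List, Tuple

Pos = Tuple[float, float]


def predict(track: Dict, dt: float) -> Pos:
    # Assumed constant-velocity motion model in world coordinates.
    (x, y), (vx, vy) = track["pos"], track["vel"]
    return (x + vx * dt, y + vy * dt)


def greedy_match(track_ids, preds, dets, det_ids, max_dist):
    # Pair tracks and detections by ascending center distance, one-to-one, with gating.
    pairs = sorted(
        (math.dist(preds[t], dets[d]["pos"]), t, d) for t in track_ids for d in det_ids
    )
    matches, used_t, used_d = [], set(), set()
    for dist, t, d in pairs:
        if dist > max_dist:
            break  # all remaining pairs are farther than the gate
        if t in used_t or d in used_d:
            continue
        matches.append((t, d))
        used_t.add(t)
        used_d.add(d)
    rest_t = [t for t in track_ids if t not in used_t]
    rest_d = [d for d in det_ids if d not in used_d]
    return matches, rest_t, rest_d


def associate_two_stage(tracks: Dict[int, Dict], dets: List[Dict],
                        dt: float = 0.5, high_thresh: float = 0.5,
                        max_dist: float = 2.0):
    preds = {t: predict(tr, dt) for t, tr in tracks.items()}
    high = [i for i, d in enumerate(dets) if d["score"] >= high_thresh]
    low = [i for i, d in enumerate(dets) if d["score"] < high_thresh]
    # Round 1: high-score detections against every track.
    m1, rest_tracks, unmatched_high = greedy_match(list(tracks), preds, dets, high, max_dist)
    # Round 2: leftover tracks against low-score detections.
    m2, unmatched_tracks, _ = greedy_match(rest_tracks, preds, dets, low, max_dist)
    # Unmatched high-score detections would start new tracks; unmatched
    # low-score detections are discarded.
    return m1 + m2, unmatched_tracks, unmatched_high
```

When an object is partially occluded, its detection score may drop below the threshold; in this sketch the second round still lets the existing track claim that detection instead of losing the object.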