Joint Detection and Tracking in Videos with Identification Features
- URL: http://arxiv.org/abs/2005.10905v2
- Date: Mon, 25 May 2020 11:42:49 GMT
- Title: Joint Detection and Tracking in Videos with Identification Features
- Authors: Bharti Munjal, Abdul Rafey Aftab, Sikandar Amin, Meltem D.
Brandlmaier, Federico Tombari, Fabio Galasso
- Abstract summary: We propose the first joint optimization of detection, tracking and re-identification features for videos.
Our method reaches the state of the art on MOT, ranks 1st in the UA-DETRAC'18 tracking challenge among online trackers, and 3rd overall.
- Score: 36.55599286568541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have shown that combining the object detection and tracking tasks
on video data results in higher performance for both, but these methods treat a high
frame rate as a strict requirement. This assumption is often violated in real-world
applications, where models run on embedded devices, often at only a few frames per second.
Videos at a low frame rate suffer from large object displacements. Here,
re-identification features can help match detections of objects that have moved far
between frames, but current joint detection and re-identification formulations
degrade detector performance, as the two are contrasting tasks. In real-world
applications, keeping separate detector and re-id models is often not feasible,
as both memory and runtime effectively double.
Towards robust long-term tracking applicable to reduced-computational-power
devices, we propose the first joint optimization of detection, tracking and
re-identification features for videos. Notably, our joint optimization
maintains detector performance, which is a typical multi-task challenge. At
inference time, we leverage detections for tracking (tracking-by-detection)
when the objects are visible, detectable and slowly moving in the image. We
instead leverage re-identification features to match objects that disappeared
(e.g. due to occlusion) for several frames or were not tracked due to fast
motion (or low-frame-rate videos). Our proposed method reaches the state of
the art on MOT, ranks 1st in the UA-DETRAC'18 tracking challenge among online
trackers, and 3rd overall.
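To make the inference-time logic above concrete, here is a minimal sketch of the two matching modes it describes: IoU-based tracking-by-detection for visible, slowly moving objects, and re-identification matching for objects that reappear after an occlusion or a large displacement. This is not the authors' implementation; the data layout, helper names and thresholds are all assumptions made for illustration.

```python
# Minimal sketch (assumed names and thresholds, not the authors' code):
# fall back from IoU-based tracking-by-detection to re-identification matching.
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(detections, active_tracks, lost_tracks,
              iou_thresh=0.5, reid_thresh=0.7):
    """Greedily assign detections to tracks: spatial overlap for visible,
    slowly moving objects; re-id embedding similarity for objects that
    reappear after occlusion, fast motion, or a dropped frame."""
    assignments, unmatched = {}, []
    for det_id, det in detections.items():            # det: {'box': ..., 'emb': ...}
        # 1) tracking-by-detection: best-overlapping active track
        best = max(active_tracks.items(),
                   key=lambda t: iou(det['box'], t[1]['box']),
                   default=None)
        if best is not None and iou(det['box'], best[1]['box']) >= iou_thresh:
            assignments[det_id] = best[0]
            continue
        # 2) re-identification: cosine similarity against lost tracks
        best_sim, best_tid = -1.0, None
        for tid, trk in lost_tracks.items():
            num = float(np.dot(det['emb'], trk['emb']))
            den = float(np.linalg.norm(det['emb']) * np.linalg.norm(trk['emb'])) + 1e-9
            if num / den > best_sim:
                best_sim, best_tid = num / den, tid
        if best_sim >= reid_thresh:
            assignments[det_id] = best_tid             # identity recovered via re-id
        else:
            unmatched.append(det_id)                   # would start a new track
    return assignments, unmatched
```

A full tracker would additionally solve the assignment jointly (e.g. with the Hungarian algorithm), manage track creation and termination, and refresh the re-id embeddings over time; the sketch only shows how the two cues divide the work.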
Related papers
- Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) must contend with high across-frame variation in object appearance and diverse degradation in some frames.
Most contemporary aggregation methods are tailored to two-stage detectors and suffer from high computational costs.
This study presents a very simple yet potent feature selection and aggregation strategy that gains significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames.
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes; a sketch of this two-stage association appears after this list.
In 3D scenarios, it is much easier for the tracker to predict object velocities in world coordinates.
arXiv Detail & Related papers (2023-03-27T15:35:21Z)
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to tracking error accumulation and propagation, as the video chunking allows bypassing interrupted frames.
Second, multi-frame information is aggregated during clip-wise matching, resulting in more accurate long-range track association than current frame-wise matching.
arXiv Detail & Related papers (2022-12-20T10:33:17Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Finding a Needle in a Haystack: Tiny Flying Object Detection in 4K Videos using a Joint Detection-and-Tracking Approach [19.59528430884104]
We present a neural network model called the Recurrent Correlational Network, where detection and tracking are jointly performed.
In experiments with datasets containing images of scenes with small flying objects, such as birds and unmanned aerial vehicles, the proposed method yielded consistent improvements.
Our network performs as well as state-of-the-art generic object trackers when evaluated as a tracker on a bird image dataset.
arXiv Detail & Related papers (2021-05-18T03:22:03Z)
- Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy [63.91005999481061]
A practical long-term tracker typically has three key properties, i.e. an efficient model design, an effective global re-detection strategy and a robust distractor awareness mechanism.
We propose a two-task tracking framework (named DMTrack) to achieve distractor-aware fast tracking via dynamic convolutions (d-convs) and the multiple object tracking (MOT) philosophy.
Our tracker achieves state-of-the-art performance on the LaSOT, OxUvA, TLP, VOT2018LT and VOT2019LT benchmarks and runs in real-time (3x faster ...)
arXiv Detail & Related papers (2021-04-25T00:59:53Z)
- DEFT: Detection Embeddings for Tracking [3.326320568999945]
We propose an efficient joint detection and tracking model named DEFT.
Our approach relies on an appearance-based object matching network jointly-learned with an underlying object detection network.
DEFT has comparable accuracy and speed to the top methods on 2D online tracking leaderboards.
arXiv Detail & Related papers (2021-02-03T20:00:44Z)
- Detecting Invisible People [58.49425715635312]
We re-purpose tracking benchmarks and propose new metrics for the task of detecting invisible objects.
We demonstrate that current detection and tracking systems perform dramatically worse on this task.
Second, we build dynamic models that explicitly reason in 3D, making use of observations produced by state-of-the-art monocular depth estimation networks.
arXiv Detail & Related papers (2020-12-15T16:54:45Z)
- Robust and efficient post-processing for video object detection [9.669942356088377]
This work introduces a novel post-processing pipeline that overcomes some of the limitations of previous post-processing methods.
Our method improves the results of state-of-the-art video-specific detectors, especially for fast-moving objects.
Applied to efficient still-image detectors such as YOLO, it provides results comparable to much more computationally intensive detectors.
arXiv Detail & Related papers (2020-09-23T10:47:24Z)
- IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency [40.354708148590696]
"instance-aware MOT" (IA-MOT) can track multiple objects in either static or moving cameras.
Our proposed method won the first place in Track 3 of the BMTT Challenge in CVPR 2020 workshops.
arXiv Detail & Related papers (2020-06-24T03:53:36Z)
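The hierarchical data association mentioned in the ByteTrackV2 entry above can be sketched as a two-stage matching: associate tracks with high-confidence boxes first, then try to recover the still-unmatched tracks from low-confidence boxes, which often correspond to occluded or blurred objects. The following is an illustration of that general idea only; the function names, score thresholds and IoU-based cost are assumptions, not the authors' code.

```python
# Hedged sketch of a two-stage ("hierarchical") data association in the spirit
# of ByteTrackV2's strategy. Thresholds, data layout and helper names are
# illustrative assumptions, not the authors' implementation.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(tracks, dets, iou_fn):
    """Pairwise IoU between track boxes and detection boxes."""
    m = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            m[i, j] = iou_fn(t['box'], d['box'])
    return m

def match(scores, thresh):
    """Hungarian matching on an IoU score matrix; returns matched index pairs
    plus unmatched row (track) and column (detection) indices."""
    if scores.size == 0:
        return [], list(range(scores.shape[0])), list(range(scores.shape[1]))
    rows, cols = linear_sum_assignment(-scores)          # maximize total IoU
    pairs = [(r, c) for r, c in zip(rows, cols) if scores[r, c] >= thresh]
    kept_r, kept_c = {r for r, _ in pairs}, {c for _, c in pairs}
    return (pairs,
            [r for r in range(scores.shape[0]) if r not in kept_r],
            [c for c in range(scores.shape[1]) if c not in kept_c])

def hierarchical_associate(tracks, detections, iou_fn,
                           high_score=0.6, low_score=0.1, iou_thresh=0.3):
    """Stage 1: associate tracks with high-confidence detections.
    Stage 2: try to recover still-unmatched tracks from low-confidence
    detections, which are often genuine but occluded or blurred objects."""
    high = [d for d in detections if d['score'] >= high_score]
    low = [d for d in detections if low_score <= d['score'] < high_score]

    pairs1, um_tracks, um_high = match(iou_matrix(tracks, high, iou_fn), iou_thresh)
    remaining = [tracks[i] for i in um_tracks]
    pairs2, um_remaining, _ = match(iou_matrix(remaining, low, iou_fn), iou_thresh)

    matched = ([(tracks[i], high[j]) for i, j in pairs1] +
               [(remaining[i], low[j]) for i, j in pairs2])
    new_track_boxes = [high[j] for j in um_high]         # unmatched high-score boxes start new tracks
    lost_tracks = [remaining[i] for i in um_remaining]   # unmatched low-score boxes are discarded
    return matched, new_track_boxes, lost_tracks
```

In a 3D setting, the same scheme can be applied to 3D boxes, with tracks propagated by their predicted velocities in world coordinates before matching.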