TDIOT: Target-driven Inference for Deep Video Object Tracking
- URL: http://arxiv.org/abs/2103.11017v2
- Date: Tue, 23 Mar 2021 08:51:19 GMT
- Title: TDIOT: Target-driven Inference for Deep Video Object Tracking
- Authors: Filiz Gurkan, Llukman Cerkezi, Ozgun Cirakman, Bilge Gunsel
- Abstract summary: In this work, we adopt the pre-trained Mask R-CNN deep object detector as the baseline.
We introduce a novel inference architecture placed on top of the FPN-ResNet101 backbone of Mask R-CNN to jointly perform detection and tracking.
The proposed single object tracker, TDIOT, applies an appearance similarity-based temporal matching for data association.
- Score: 0.2457872341625575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent tracking-by-detection approaches use deep object detectors as target
detection baseline, because of their high performance on still images. For
effective video object tracking, object detection is integrated with a data
association step performed by either a custom-designed inference architecture or
end-to-end joint training for tracking. In this work, we adopt the
former approach and use the pre-trained Mask R-CNN deep object detector as the
baseline. We introduce a novel inference architecture placed on top of
the FPN-ResNet101 backbone of Mask R-CNN to jointly perform detection and tracking,
without requiring additional training for tracking. The proposed single
object tracker, TDIOT, applies an appearance similarity-based temporal matching
for data association. In order to tackle tracking discontinuities, we
incorporate a local search and matching module into the inference head layer
that exploits SiamFC for short term tracking. Moreover, in order to improve
robustness to scale changes, we introduce a scale-adaptive region proposal
network that searches for the target in an adaptively enlarged spatial
neighborhood specified by the trace of the target. In order to meet long-term
tracking requirements, a low cost verification layer is incorporated into the
inference architecture to monitor presence of the target based on its LBP
histogram model. Performance evaluation on videos from the VOT2016, VOT2018 and
VOT-LT2018 datasets demonstrates that TDIOT achieves higher accuracy than
the state-of-the-art short-term trackers while it provides comparable
performance in long term tracking.
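
The verification layer described in the abstract can be illustrated with a small sketch. The abstract states only that target presence is monitored with an LBP histogram model; the basic 3x3 LBP operator, the histogram-intersection score, the 0.5 threshold, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence

# A grayscale patch given as rows of pixel intensities (assumed representation).
Image = Sequence[Sequence[int]]

# Offsets of the 8 neighbours, clockwise starting from the top-left pixel.
_NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(patch: Image) -> List[float]:
    """Return the normalised 256-bin histogram of basic 3x3 LBP codes."""
    height, width = len(patch), len(patch[0])
    hist = [0.0] * 256
    # Border pixels are skipped because they lack a full 3x3 neighbourhood.
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            centre = patch[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(_NEIGHBOURS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if patch[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(p: Sequence[float], q: Sequence[float]) -> float:
    """Similarity in [0, 1] between two normalised histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


def target_present(model_hist: Sequence[float], candidate: Image,
                   threshold: float = 0.5) -> bool:
    """Hypothetical presence check: accept the candidate patch if its LBP
    histogram is similar enough to the stored target model."""
    score = histogram_intersection(model_hist, lbp_histogram(candidate))
    return score >= threshold
```

In this sketch, a tracker would compute `lbp_histogram` once on the initial target patch and then call `target_present` on each new candidate region. Because the check involves only integer comparisons and a 256-bin histogram, it is cheap enough to run on every frame, which is consistent with the "low cost" description in the abstract.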
Related papers
- STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT).
We use historical embedding features to model the representation of ReID and detection features in a sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z)
- RTracker: Recoverable Tracking via PN Tree Structured Memory [71.05904715104411]
We propose a recoverable tracking framework, RTracker, that uses a tree-structured memory to dynamically associate a tracker and a detector to enable self-recovery.
Specifically, we propose a Positive-Negative Tree-structured memory to chronologically store and maintain positive and negative target samples.
Our core idea is to use the support samples of positive and negative target categories to establish a relative distance-based criterion for a reliable assessment of target loss.
arXiv Detail & Related papers (2024-03-28T08:54:40Z)
- SeMoLi: What Moves Together Belongs Together [51.72754014130369]
We tackle semi-supervised object detection based on motion cues.
Recent results suggest that motion-based clustering methods can be used to pseudo-label instances of moving objects.
We rethink this approach and suggest that both object detection and motion-inspired pseudo-labeling can be tackled in a data-driven manner.
arXiv Detail & Related papers (2024-02-29T18:54:53Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- Multi-Object Tracking by Iteratively Associating Detections with Uniform Appearance for Trawl-Based Fishing Bycatch Monitoring [22.228127377617028]
The aim of in-trawl catch monitoring for use in fishing operations is to detect, track and classify fish targets in real-time from video footage.
We propose a novel MOT method, built upon an existing observation-centric tracking algorithm, by adopting a new iterative association step.
Our method offers improved performance in tracking targets with uniform appearance and outperforms state-of-the-art techniques on our underwater fish datasets as well as the MOT17 dataset.
arXiv Detail & Related papers (2023-04-10T18:55:10Z)
- Target-Aware Tracking with Long-term Context Attention [8.20858704675519]
The long-term context attention (LCA) module performs extensive information fusion on the target and its context from long-term frames.
LCA uses the target state from the previous frame to exclude the interference of similar objects and complex backgrounds.
Our tracker achieves state-of-the-art performance on multiple benchmarks, with 71.1% AUC, 89.3% NP, and 73.0% AO on LaSOT, TrackingNet, and GOT-10k, respectively.
arXiv Detail & Related papers (2023-02-27T14:40:58Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset [5.962184741057505]
We introduce the SOMPT22 dataset, a new set for multi-person tracking with annotated short videos captured by static cameras mounted on poles 6-8 meters high for city surveillance.
On this new dataset, we analyze MOT trackers classified as one-shot or two-stage according to how they use detection and reID networks.
The experimental results on our new dataset indicate that SOTA is still far from high efficiency, and single-shot trackers are good candidates to unify fast execution and accuracy with competitive performance.
arXiv Detail & Related papers (2022-08-04T11:09:19Z)
- Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking [82.34356879078955]
We propose a compact memory embedding to enhance the discrimination of the segmentation-based deformable visual tracking method.
Our method outperforms strong segmentation-based trackers, namely D3S and SiamMask, on the DAVIS 2017 benchmark.
arXiv Detail & Related papers (2021-11-23T03:07:12Z)
- Multi-Object Tracking and Segmentation with a Space-Time Memory Network [12.043574473965318]
We propose a method for multi-object tracking and segmentation based on a novel memory-based mechanism to associate tracklets.
The proposed tracker, MeNToS, particularly addresses the long-term data association problem.
arXiv Detail & Related papers (2021-10-21T17:13:17Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.