Learning Target Candidate Association to Keep Track of What Not to Track
- URL: http://arxiv.org/abs/2103.16556v1
- Date: Tue, 30 Mar 2021 17:58:02 GMT
- Title: Learning Target Candidate Association to Keep Track of What Not to Track
- Authors: Christoph Mayer, Martin Danelljan, Danda Pani Paudel, Luc Van Gool
- Abstract summary: We propose to keep track of distractor objects in order to continue tracking the target.
To tackle the problem of lacking ground-truth correspondences between distractor objects in visual tracking, we propose a training strategy that combines partial annotations with self-supervision.
Our tracker sets a new state-of-the-art on six benchmarks, achieving an AUC score of 67.2% on LaSOT and a +6.1% absolute gain on the OxUvA long-term dataset.
- Score: 100.80610986625693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The presence of objects that are confusingly similar to the tracked
target poses a fundamental challenge in appearance-based visual tracking. Such
distractor objects are easily misclassified as the target itself, leading to
eventual tracking failure. While most methods strive to suppress distractors
through more powerful appearance models, we take an alternative approach.
We propose to keep track of distractor objects in order to continue tracking
the target. To this end, we introduce a learned association network, allowing
us to propagate the identities of all target candidates from frame to frame. To
tackle the problem of lacking ground-truth correspondences between distractor
objects in visual tracking, we propose a training strategy that combines
partial annotations with self-supervision. We conduct comprehensive
experimental validation and analysis of our approach on several challenging
datasets. Our tracker sets a new state-of-the-art on six benchmarks, achieving
an AUC score of 67.2% on LaSOT and a +6.1% absolute gain on the OxUvA long-term
dataset.
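As an illustration of that identity-propagation step, here is a minimal sketch of frame-to-frame candidate association: candidates in consecutive frames are matched by cosine similarity between appearance embeddings, solved as an assignment problem. This is an assumption-level sketch, not the paper's learned association network; the function name, the embedding dimensionality, and the `min_sim` threshold are all illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_emb, curr_emb, min_sim=0.5):
    """Match candidates across consecutive frames by embedding similarity.

    prev_emb: (N, D) L2-normalized embeddings of frame t-1 candidates.
    curr_emb: (M, D) L2-normalized embeddings of frame t candidates.
    Returns (prev_idx, curr_idx) pairs; unmatched current candidates
    would be assigned fresh identities (e.g., newly appearing distractors).
    """
    sim = prev_emb @ curr_emb.T                # (N, M) cosine similarities
    rows, cols = linear_sum_assignment(-sim)   # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= min_sim]

# Usage: identities of the target *and* its distractors survive the frame
# change, so a distractor is never silently relabeled as the target.
rng = np.random.default_rng(0)
prev = rng.normal(size=(3, 16)); prev /= np.linalg.norm(prev, axis=1, keepdims=True)
curr = rng.normal(size=(4, 16)); curr /= np.linalg.norm(curr, axis=1, keepdims=True)
print(associate(prev, curr))
```

In the paper the association scores are learned, and training copes with the missing ground-truth correspondences via partial annotations plus self-supervision; the Hungarian assignment above merely stands in for that learned matching.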
Related papers
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
Open-vocabulary multi-object tracking (OVMOT) combines the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT).
Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z) - Collecting Consistently High Quality Object Tracks with Minimal Human Involvement by Using Self-Supervised Learning to Detect Tracker Errors [16.84474849409625]
We propose a framework for consistently producing high-quality object tracks.
The key idea is to tailor a module for each dataset to intelligently decide when an object tracker is failing.
Our approach leverages self-supervised learning on unlabeled videos to learn a tailored representation for a target object.
arXiv Detail & Related papers (2024-05-06T17:06:32Z) - RTrack: Accelerating Convergence for Visual Object Tracking via Pseudo-Boxes Exploration [3.29854706649876]
Single object tracking (SOT) heavily relies on the representation of the target object as a bounding box.
This paper proposes RTrack, a novel object representation baseline tracker.
RTrack automatically arranges points to define the spatial extents and highlight local areas.
arXiv Detail & Related papers (2023-09-23T04:41:59Z) - Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z) - Temporally-Transferable Perturbations: Efficient, One-Shot Adversarial Attacks for Online Visual Object Trackers [81.90113217334424]
We propose a framework to generate a single temporally transferable adversarial perturbation from the object template image only.
This perturbation can then be added to every search image at virtually no cost and still successfully fools the tracker (see the first sketch after this list).
arXiv Detail & Related papers (2020-12-30T15:05:53Z) - Detecting Invisible People [58.49425715635312]
We re-purpose tracking benchmarks and propose new metrics for the task of detecting invisible objects.
We demonstrate that current detection and tracking systems perform dramatically worse on this task.
We also build dynamic models that explicitly reason in 3D, making use of observations produced by state-of-the-art monocular depth estimation networks.
arXiv Detail & Related papers (2020-12-15T16:54:45Z) - Blending of Learning-based Tracking and Object Detection for Monocular Camera-based Target Following [2.578242050187029]
We present a real-time approach which fuses a generic target tracker and object detection module with a target re-identification module.
Our work focuses on improving the performance of Convolutional Recurrent Neural Network-based object trackers in cases where the object of interest belongs to the category of familiar objects.
arXiv Detail & Related papers (2020-08-21T18:44:35Z) - Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching (see the second sketch after this list); (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drift caused by spatio-temporal discontinuity; (iii) we demonstrate state-of-the-art results among self-supervised approaches on DAVIS-2017 and YouTube-VOS.
arXiv Detail & Related papers (2020-06-22T17:55:59Z) - COMET: Context-Aware IoU-Guided Network for Small Object Tracking [17.387332692494084]
We introduce a context-aware IoU-guided tracker (COMET) that exploits a multitask two-stream network and an offline reference proposal generation strategy.
The proposed network fully exploits target-related information by multi-scale feature learning and attention modules.
Empirically, COMET outperforms the state of the art on a range of aerial-view datasets that focus on tracking small objects.
arXiv Detail & Related papers (2020-06-04T00:28:45Z)
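First sketch (for the Temporally-Transferable Perturbations entry): once a single perturbation has been generated from the template image, the per-frame attack reduces to one addition and a clip. The sketch below shows only that application step; how `delta` is generated is the cited paper's contribution and is not reproduced here, and the value ranges are assumptions.

```python
import numpy as np

def attack_stream(search_frames, delta):
    """Apply one precomputed perturbation to every search image.

    search_frames: iterable of (H, W, 3) float arrays in [0, 1].
    delta: (H, W, 3) perturbation, assumed generated once from the template.
    """
    for frame in search_frames:
        # Per-frame cost is a single add and clip -- "virtually no cost".
        yield np.clip(frame + delta, 0.0, 1.0)
```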
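Second sketch (for the Self-supervised Video Object Segmentation entry): memory-based correspondence matching in self-supervised dense tracking commonly boils down to affinity-weighted label propagation from a feature memory to the current frame. The sketch below shows that generic mechanism under assumed shapes and a hypothetical temperature `tau`; it is not the cited paper's specific memory design.

```python
import numpy as np

def propagate_labels(mem_feat, mem_labels, query_feat, num_classes, tau=0.07):
    """Propagate segmentation labels from a memory bank to query pixels.

    mem_feat:   (N, D) L2-normalized features of memorized pixels.
    mem_labels: (N,)   integer class index per memorized pixel.
    query_feat: (M, D) L2-normalized features of current-frame pixels.
    Returns an (M, num_classes) soft label distribution per query pixel.
    """
    logits = query_feat @ mem_feat.T / tau            # (M, N) similarity logits
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over the memory
    one_hot = np.eye(num_classes)[mem_labels]         # (N, num_classes)
    return weights @ one_hot                          # weighted label vote
```

Keeping memory entries from well before the previous frame is what gives long-term matching its robustness: a pixel occluded for many frames can still recover its label from an older memory entry.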