Hard Occlusions in Visual Object Tracking
- URL: http://arxiv.org/abs/2009.04787v1
- Date: Thu, 10 Sep 2020 11:42:21 GMT
- Title: Hard Occlusions in Visual Object Tracking
- Authors: Thijs P. Kuipers, Devanshu Arya, Deepak K. Gupta
- Abstract summary: We benchmark the performance of recent state-of-the-art (SOTA) trackers on hard occlusion cases.
Results show that hard occlusions remain a very challenging problem for SOTA trackers.
Tracker performance varies widely across occlusion categories, suggesting that common tracker rankings based on a single averaged performance score are inadequate for gauging real-world performance.
- Score: 12.502821224144151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual object tracking is among the hardest problems in computer
vision, as trackers have to deal with many challenging circumstances such as
illumination changes, fast motion, and occlusion. A tracker is judged to be
good or bad based on its performance on recent tracking datasets, e.g.,
VOT2019 and LaSOT. We argue that while the recent datasets contain large sets
of annotated videos that, to some extent, provide ample training data, hard
scenarios such as occlusion and in-plane rotation are still underrepresented.
For trackers to be brought closer to real-world scenarios and deployed in
safety-critical devices, even the rarest hard scenarios must be properly
addressed. In this paper, we focus in particular on hard occlusion cases and
benchmark the performance of recent state-of-the-art (SOTA) trackers on them.
We created a small-scale dataset containing different categories of hard
occlusions, on which the selected trackers are evaluated. Results show that
hard occlusions remain a very challenging problem for SOTA trackers.
Furthermore, we observe that tracker performance varies widely across
categories of hard occlusions: a tracker that performs best on one category
can perform significantly worse on another. This category-dependent variation
suggests that the common practice of ranking trackers by a single averaged
performance score is inadequate to gauge tracker performance in real-world
scenarios.
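The point about averaged rankings is easy to illustrate with a small worked example. The following minimal Python sketch uses hypothetical per-category success scores (the tracker names, category labels, and numbers are invented for illustration and are not taken from the paper): two trackers with identical averaged scores can behave very differently on specific occlusion categories.

```python
# A minimal sketch with hypothetical numbers (not taken from the paper)
# showing how a single averaged score can mask category-level differences.

scores = {
    "TrackerA": {"partial occlusion": 0.62, "full occlusion": 0.18, "occluded exit": 0.55},
    "TrackerB": {"partial occlusion": 0.48, "full occlusion": 0.44, "occluded exit": 0.43},
}

# Both trackers average to exactly the same score ...
for name, per_cat in scores.items():
    mean = sum(per_cat.values()) / len(per_cat)
    print(f"{name}: averaged score = {mean:.3f}")  # 0.450 for both

# ... yet the per-category winner changes from category to category.
for cat in next(iter(scores.values())):
    best = max(scores, key=lambda t: scores[t][cat])
    print(f"best on '{cat}': {best} ({scores[best][cat]:.2f})")
```

A single ranking would call these trackers equivalent, while the per-category breakdown shows TrackerA collapsing under full occlusion; this is why category-wise reporting gives a more faithful picture than one averaged score.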
Related papers
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
- Tracking Reflected Objects: A Benchmark [12.770787846444406]
We introduce TRO, a benchmark specifically for Tracking Reflected Objects.
TRO includes 200 sequences with around 70,000 frames, each carefully annotated with bounding boxes.
To provide a stronger baseline, we propose a new tracker, HiP-HaTrack, which uses hierarchical features to improve performance.
arXiv Detail & Related papers (2024-07-07T02:22:45Z) - Large Scale Real-World Multi-Person Tracking [68.27438015329807]
This paper presents a new large-scale multi-person tracking dataset -- PersonPath22.
It is over an order of magnitude larger than currently available high-quality multi-object tracking datasets such as MOT17, HiEve, and MOT20.
arXiv Detail & Related papers (2022-11-03T23:03:13Z) - AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility [125.77396380698639]
AVisT is a benchmark for visual tracking in diverse scenarios with adverse visibility.
AVisT comprises 120 challenging sequences with 80k annotated frames, spanning 18 diverse scenarios.
We benchmark 17 popular and recent trackers on AVisT with detailed analysis of their tracking performance across attributes.
arXiv Detail & Related papers (2022-08-14T17:49:37Z) - Tracking Every Thing in the Wild [61.917043381836656]
We introduce a new metric, Track Every Thing Accuracy (TETA), breaking tracking measurement into three sub-factors: localization, association, and classification.
Our experiments show that TETA evaluates trackers more comprehensively, and TETer achieves significant improvements on the challenging large-scale datasets BDD100K and TAO.
arXiv Detail & Related papers (2022-07-26T15:37:19Z)
- DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion [56.1428110894411]
We propose a large-scale dataset for multi-human tracking, where humans have similar appearance, diverse motion and extreme articulation.
As the dataset contains mostly group dancing videos, we name it "DanceTrack".
We benchmark several state-of-the-art trackers on our dataset and observe a significant performance drop on DanceTrack when compared against existing benchmarks.
arXiv Detail & Related papers (2021-11-29T16:49:06Z)
- Track without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking [45.524183249765244]
Vehicle tracking is an essential task in the multi-object tracking (MOT) field.
In this paper, we explore the significance of motion patterns for vehicle tracking without appearance information.
We propose a novel approach that tackles the association problem in long-term tracking using motion information exclusively.
arXiv Detail & Related papers (2021-08-13T02:27:09Z)
- Benchmarking Deep Trackers on Aerial Videos [5.414308305392762]
In this paper, we compare ten trackers based on deep learning techniques on four aerial datasets.
We choose top performing trackers utilizing different approaches, specifically tracking by detection, discriminative correlation filters, Siamese networks and reinforcement learning.
Our findings indicate that the trackers perform significantly worse on aerial datasets compared to standard ground-level videos.
arXiv Detail & Related papers (2021-03-24T01:45:19Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
- Rethinking Convolutional Features in Correlation Filter Based Tracking [0.0]
We revisit a hierarchical deep feature-based visual tracker and find that both the performance and efficiency of the deep tracker are limited by the poor feature quality.
After removing redundant features, our proposed tracker achieves significant improvements in both performance and efficiency.
arXiv Detail & Related papers (2019-12-30T04:39:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.