LaSOT: A High-quality Large-scale Single Object Tracking Benchmark
- URL: http://arxiv.org/abs/2009.03465v3
- Date: Sat, 12 Sep 2020 03:53:45 GMT
- Title: LaSOT: A High-quality Large-scale Single Object Tracking Benchmark
- Authors: Heng Fan, Hexin Bai, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia
Yu, Harshit, Mingzhen Huang, Juehuan Liu, Yong Xu, Chunyuan Liao, Lin Yuan,
Haibin Ling
- Abstract summary: We present LaSOT, a high-quality Large-scale Single Object Tracking benchmark.
LaSOT contains a diverse selection of 85 object classes and offers 1,550 videos totaling more than 3.87 million frames.
Each video frame is carefully and manually annotated with a bounding box. This makes LaSOT, to our knowledge, the largest densely annotated tracking benchmark.
- Score: 67.96196486540497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite great recent advances in visual tracking, its further development,
including both algorithm design and evaluation, is limited due to a lack of
dedicated large-scale benchmarks. To address this problem, we present LaSOT, a
high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a
diverse selection of 85 object classes and offers 1,550 videos totaling more than
3.87 million frames. Each video frame is carefully and manually annotated with
a bounding box. This makes LaSOT, to our knowledge, the largest densely
annotated tracking benchmark. Our goal in releasing LaSOT is to provide a
dedicated, high-quality platform for both the training and evaluation of trackers.
The average video length of LaSOT is around 2,500 frames, and each video contains various challenge factors found in real-world footage, such as targets disappearing and re-appearing. These longer videos allow
for the assessment of long-term trackers. To take advantage of the close
connection between visual appearance and natural language, we provide a natural-language specification for each video in LaSOT. We believe this addition will allow
future research to use linguistic features to improve tracking. Two protocols,
full-overlap and one-shot, are designated for flexible assessment of trackers.
We extensively evaluate 48 baseline trackers on LaSOT with in-depth analysis,
and results reveal that there still exists significant room for improvement.
The complete benchmark, tracking results, and analysis are available at
http://vision.cs.stonybrook.edu/~lasot/.
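Since every frame carries a bounding-box annotation, trackers on benchmarks like LaSOT are typically scored by computing the intersection-over-union (IoU) between predicted and ground-truth boxes on each frame and reporting the success rate across IoU thresholds (the area under the success curve). The sketch below illustrates this metric only; the file names, the one-"x,y,w,h"-line-per-frame format, and the threshold grid are assumptions for illustration, not the official LaSOT evaluation toolkit, which additionally handles frames where the target is absent.

```python
import numpy as np

def iou(pred, gt):
    """IoU between corresponding rows of two (N, 4) arrays of [x, y, w, h] boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def success_auc(pred, gt, thresholds=np.linspace(0.0, 1.0, 21)):
    """Mean fraction of frames whose IoU exceeds each threshold (success AUC)."""
    overlaps = iou(pred, gt)
    return np.mean([(overlaps > t).mean() for t in thresholds])

# Hypothetical file layout: one comma-separated "x,y,w,h" line per frame.
gt = np.loadtxt("groundtruth.txt", delimiter=",")       # (num_frames, 4)
pred = np.loadtxt("tracker_output.txt", delimiter=",")  # (num_frames, 4)
print(f"Success (AUC): {success_auc(pred, gt):.3f}")
```

The precision metric that usually accompanies success follows the same pattern, but thresholds the pixel distance between predicted and ground-truth box centers instead of the IoU.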
Related papers
- DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM [23.551036494221222]
We propose a new visual language tracking benchmark with diverse texts, named DTVLT, based on five prominent VLT and SOT benchmarks.
We offer four texts in our benchmark, considering the extent and density of semantic information.
We conduct comprehensive experimental analyses on DTVLT, evaluating the impact of diverse text on tracking performance.
arXiv Detail & Related papers (2024-10-03T13:57:07Z)
- VastTrack: Vast Category Visual Object Tracking [39.61339408722333]
We introduce a novel benchmark, dubbed VastTrack, towards facilitating the development of more general visual tracking.
VastTrack covers target objects from 2,115 classes, largely surpassing object categories of existing popular benchmarks.
VastTrack offers 50,610 sequences with 4.2 million frames, making it to date the largest benchmark in terms of the number of videos.
arXiv Detail & Related papers (2024-03-06T06:39:43Z)
- Tracking Anything in High Quality [63.63653185865726]
HQTrack is a framework for high-quality tracking of anything in videos.
It consists of a video multi-object segmenter (VMOS) and a mask refiner (MR).
arXiv Detail & Related papers (2023-07-26T06:19:46Z)
- Perception Test: A Diagnostic Benchmark for Multimodal Video Models [78.64546291816117]
We propose a novel multimodal video benchmark to evaluate the perception and reasoning skills of pre-trained multimodal models.
The Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities.
The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime.
arXiv Detail & Related papers (2023-05-23T07:54:37Z)
- OmniTracker: Unifying Object Tracking by Tracking-with-Detection [119.51012668709502]
OmniTracker is presented to resolve all the tracking tasks with a fully shared network architecture, model weights, and inference pipeline.
Experiments on 7 tracking datasets, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19, demonstrate that OmniTracker achieves on-par or even better results than both task-specific and unified tracking models.
arXiv Detail & Related papers (2023-03-21T17:59:57Z)
- Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos [36.28269135795851]
We present a set classifier that improves the accuracy of classifying tracklets by aggregating information from the multiple viewpoints contained in a tracklet.
By simply attaching our method to QDTrack on top of ResNet-101, we achieve a new state of the art: 19.9% and 15.7% TrackAP_50 on the TAO validation and test sets, respectively.
arXiv Detail & Related papers (2022-06-05T07:51:58Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark [46.691218019908746]
Tracking by natural language specification is an emerging research topic that aims to locate the target object in a video sequence based on its language description.
We propose a new benchmark specifically dedicated to tracking-by-language, including a large-scale dataset.
We also introduce two new challenges into TNL2K for the object tracking task, i.e., adversarial samples and modality switch.
arXiv Detail & Related papers (2021-03-31T00:57:32Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos captured in diverse environments, averaging half a minute in length.
We ask annotators to label objects that move at any point in the video, and to name them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)