Do Different Tracking Tasks Require Different Appearance Models?
- URL: http://arxiv.org/abs/2107.02156v1
- Date: Mon, 5 Jul 2021 17:40:17 GMT
- Title: Do Different Tracking Tasks Require Different Appearance Models?
- Authors: Zhongdao Wang, Hengshuang Zhao, Ya-Li Li, Shengjin Wang, Philip H.S. Torr, Luca Bertinetto
- Abstract summary: We present UniTrack, a unified tracking solution to address five different tasks within the same framework.
UniTrack consists of a single and task-agnostic appearance model, which can be learned in a supervised or self-supervised fashion.
We show how most tracking tasks can be solved within this framework, and that the same appearance model can be used to obtain performance competitive with specialised methods on all five tasks considered.
- Score: 118.02175542476367
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Tracking objects of interest in a video is one of the most popular
and widely applicable problems in computer vision. Over the years, however, a
Cambrian explosion of use cases and benchmarks has fragmented the problem into
a multitude of different experimental setups. As a consequence, the literature
has fragmented too, and the novel approaches proposed by the community are now
usually specialised to fit only one specific setup. To understand to what
extent this specialisation is actually necessary, in this work we present
UniTrack, a unified tracking solution that addresses five different tasks
within the same framework. UniTrack consists of a single, task-agnostic
appearance model, which can be learned in a supervised or self-supervised
fashion, and multiple "heads" that address the individual tasks and do not
require training. We show how most tracking tasks can be solved within this
framework, and that the same appearance model achieves performance competitive
with specialised methods on all five tasks considered. The framework also
allows us to analyse appearance models obtained with the most recent
self-supervised methods, significantly extending their evaluation and
comparison to a larger variety of important problems. Code is available at
https://github.com/Zhongdao/UniTrack.
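Since the abstract describes the architecture only at a high level, a minimal sketch may help make the design concrete. The snippet below is an illustration, not the authors' implementation: the function names and signatures are hypothetical, the embeddings stand in for the output of any pretrained (supervised or self-supervised) appearance model, and the two heads are simplified versions of the association-style and propagation-style heads the abstract alludes to.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale feature vectors to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def association_head(track_embs, det_embs):
    """Parameter-free association head (MOT-style tasks): cosine
    similarity between existing track embeddings (N, D) and new
    detection embeddings (M, D), returned as an (N, M) cost matrix
    for an assignment solver such as the Hungarian algorithm."""
    return 1.0 - l2_normalize(track_embs) @ l2_normalize(det_embs).T


def propagation_head(template, search):
    """Parameter-free propagation head (SOT-style tasks): dense
    cross-correlation of a template feature patch (th, tw, C) over a
    larger search-region feature map (sh, sw, C); the peak of the
    response map gives the new target location."""
    th, tw, _ = template.shape
    sh, sw, _ = search.shape
    resp = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = np.sum(search[i:i + th, j:j + tw] * template)
    return resp


# Toy usage: features would normally come from the shared backbone.
tracks = np.random.randn(3, 128)   # 3 existing tracks
dets = np.random.randn(4, 128)     # 4 new detections
cost = association_head(tracks, dets)              # (3, 4) assignment costs
resp = propagation_head(np.random.randn(4, 4, 16),
                        np.random.randn(16, 16, 16))  # (13, 13) response map
```

Because neither head contains learnable weights, swapping in a different appearance model requires no retraining, which is what makes a like-for-like comparison of self-supervised representations across tracking tasks possible.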
Related papers
- Towards a Generalist and Blind RGB-X Tracker [91.36268768952755]
We develop a single-model tracker that can remain blind to any modality X at inference time.
Our training process is extremely simple, integrating a multi-label classification loss with a routing function.
Our generalist and blind tracker achieves performance competitive with well-established modality-specific models.
arXiv Detail & Related papers (2024-05-28T03:00:58Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks [129.49630356651454]
We propose a novel FAshion-focused Multi-task Efficient learning method for Vision-and-Language tasks (FAME-ViL).
Our FAME-ViL can save 61.5% of parameters over alternatives, while significantly outperforming the conventional independently trained single-task models.
arXiv Detail & Related papers (2023-03-04T19:07:48Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
We optimize this matching between experts and tasks during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Multi-target tracking for video surveillance using deep affinity network: a brief review [0.0]
Multi-target tracking (MTT) for video surveillance is an important and challenging task.
Deep learning models, loosely inspired by the human brain, have been widely applied to this problem.
arXiv Detail & Related papers (2021-10-29T10:44:26Z)
- Discriminative Appearance Modeling with Multi-track Pooling for Real-time Multi-object Tracking [20.66906781151]
In multi-object tracking, the tracker maintains in its memory the appearance and motion information for each object in the scene.
Many approaches model each target in isolation and lack the ability to use all the targets in the scene to jointly update the memory.
We propose a training strategy adapted to multi-track pooling which generates hard tracking episodes online (see the sketch after this entry).
arXiv Detail & Related papers (2021-01-28T18:12:39Z)
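A heavily simplified sketch of the multi-track pooling idea from the entry above (hypothetical code, not the paper's architecture): a scene-level context vector is pooled over all tracks and appended to each track's own embedding, so that appearance memories are informed by every target in the scene rather than updated in isolation.

```python
import numpy as np


def multi_track_pooling(track_embs):
    """Hypothetical sketch: pool a scene-level context over ALL track
    embeddings (N, D) and concatenate it to each track's own embedding,
    yielding (N, 2 * D) jointly informed features."""
    context = track_embs.max(axis=0)                    # (D,) shared context
    tiled = np.tile(context, (track_embs.shape[0], 1))  # (N, D)
    return np.concatenate([track_embs, tiled], axis=1)


pooled = multi_track_pooling(np.random.randn(5, 64))  # (5, 128) features
```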
- Assisting Scene Graph Generation with Self-Supervision [21.89909688056478]
We propose a set of three novel yet simple self-supervision tasks and train them as auxiliary multi-tasks to the main model.
When the base model is trained from scratch with these self-supervision tasks, we achieve state-of-the-art results across all metrics and recall settings.
arXiv Detail & Related papers (2020-08-08T16:38:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all of its information) and is not responsible for any consequences of its use.