Tracking Anything in High Quality
- URL: http://arxiv.org/abs/2307.13974v1
- Date: Wed, 26 Jul 2023 06:19:46 GMT
- Title: Tracking Anything in High Quality
- Authors: Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang,
Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li
- Abstract summary: HQTrack is a framework for High Quality Tracking anything in videos.
It consists of a video multi-object segmenter (VMOS) and a mask refiner (MR).
- Score: 63.63653185865726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual object tracking is a fundamental video task in computer vision.
Recently, the notably increasing power of perception algorithms allows the
unification of single/multi-object and box/mask-based tracking. Among these, the
Segment Anything Model (SAM) attracts much attention. In this report, we
propose HQTrack, a framework for High Quality Tracking anything in videos.
HQTrack mainly consists of a video multi-object segmenter (VMOS) and a mask
refiner (MR). Given the object to be tracked in the initial frame of a video,
VMOS propagates the object masks to the current frame. The mask results at this
stage are not accurate enough, since VMOS is trained on several close-set video
object segmentation (VOS) datasets and therefore has limited ability to generalize to
complex and corner-case scenes. To further improve the quality of the tracking masks, a
pretrained MR model is employed to refine the tracking results. As a compelling
testament to the effectiveness of our paradigm, without employing any tricks
such as test-time data augmentation and model ensembling, HQTrack obtains 2nd
place in the Visual Object Tracking and Segmentation (VOTS2023) challenge. Code
and models are available at https://github.com/jiawen-zhu/HQTrack.
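The abstract describes a two-stage pipeline: VMOS propagates the first-frame object masks to each subsequent frame, and a pretrained mask refiner (MR) then sharpens those masks. The sketch below illustrates that control flow only; the `VMOS` and `MaskRefiner` classes, their methods, and `track_video` are hypothetical placeholders, not the repository's actual API (see https://github.com/jiawen-zhu/HQTrack for the real code).

```python
# Minimal sketch of the two-stage HQTrack pipeline described in the abstract.
# All class and function names here are illustrative stand-ins, not the real API.

import numpy as np


class VMOS:
    """Placeholder video multi-object segmenter: propagates reference masks."""

    def init_reference(self, frame: np.ndarray, masks: list[np.ndarray]) -> None:
        # Store the annotated first frame as the reference memory.
        self.memory = [(frame, masks)]

    def propagate(self, frame: np.ndarray) -> list[np.ndarray]:
        # The real model matches the new frame against its memory bank;
        # this stub simply returns the most recent masks.
        return self.memory[-1][1]


class MaskRefiner:
    """Placeholder pretrained mask refiner (a SAM-style model in spirit)."""

    def refine(self, frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # Stub: the real refiner sharpens mask boundaries.
        return mask


def track_video(frames: list[np.ndarray],
                init_masks: list[np.ndarray]) -> list[list[np.ndarray]]:
    """Propagate first-frame masks through the video, then refine each result."""
    vmos, refiner = VMOS(), MaskRefiner()
    vmos.init_reference(frames[0], init_masks)

    results = [init_masks]
    for frame in frames[1:]:
        coarse = vmos.propagate(frame)                         # stage 1: VMOS propagation
        refined = [refiner.refine(frame, m) for m in coarse]   # stage 2: MR refinement
        results.append(refined)
    return results
```

Keeping the segmenter and the refiner decoupled, as the abstract suggests, means the refinement model can be swapped or upgraded without retraining the propagation stage.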
Related papers
- Tracking Reflected Objects: A Benchmark [12.770787846444406]
We introduce TRO, a benchmark specifically for Tracking Reflected Objects.
TRO includes 200 sequences with around 70,000 frames, each carefully annotated with bounding boxes.
To provide a stronger baseline, we propose a new tracker, HiP-HaTrack, which uses hierarchical features to improve performance.
arXiv Detail & Related papers (2024-07-07T02:22:45Z)
- TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos [11.35998213546475]
Multi-object tracking (MOT) is a critical and challenging task in computer vision.
We introduce TeamTrack, a pioneering benchmark dataset specifically designed for MOT in sports.
TeamTrack is an extensive collection of full-pitch video data from various sports, including soccer, basketball, and handball.
arXiv Detail & Related papers (2024-04-22T04:33:40Z)
- Tracking with Human-Intent Reasoning [64.69229729784008]
This work proposes a new tracking task -- Instruction Tracking.
It involves providing implicit tracking instructions that require the trackers to perform tracking automatically in video frames.
TrackGPT is capable of performing complex reasoning-based tracking.
arXiv Detail & Related papers (2023-12-29T03:22:18Z)
- DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes [74.64897845999677]
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians.
Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available.
Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT.
arXiv Detail & Related papers (2023-02-15T14:10:42Z)
- Simple Cues Lead to a Strong Multi-Object Tracker [3.7189423451031356]
We propose a new type of tracking-by-detection (TbD) for Multi-Object Tracking.
We show that a combination of our appearance features with a simple motion model leads to strong tracking results.
Our tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and DanceTrack, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-06-09T17:55:51Z)
- Robust Visual Tracking by Segmentation [103.87369380021441]
Estimating the target extent poses a fundamental challenge in visual object tracking.
We propose a segmentation-centric tracking pipeline that produces a highly accurate segmentation mask.
Our tracker is able to better learn a target representation that clearly differentiates the target in the scene from background content.
arXiv Detail & Related papers (2022-03-21T17:59:19Z)
- Is First Person Vision Challenging for Object Tracking? [32.64792520537041]
We present the first systematic study of object tracking in First Person Vision (FPV).
Our study extensively analyses the performance of recent visual trackers and baseline FPV trackers across different aspects, including a new performance measure.
Our results show that object tracking in FPV is challenging, which suggests that more research efforts should be devoted to this problem.
arXiv Detail & Related papers (2021-08-31T08:06:01Z)
- TrackFormer: Multi-Object Tracking with Transformers [92.25832593088421]
TrackFormer is an end-to-end multi-object tracking and segmentation model based on an encoder-decoder Transformer architecture.
New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time.
TrackFormer achieves a seamless data association between frames in a new tracking-by-attention paradigm.
arXiv Detail & Related papers (2021-01-07T18:59:29Z)
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization performance leading to state-of-the-art results in both the VOS and more challenging tracking domain.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.