Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework
- URL: http://arxiv.org/abs/2203.11991v2
- Date: Thu, 24 Mar 2022 11:39:35 GMT
- Title: Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework
- Authors: Botao Ye, Hong Chang, Bingpeng Ma, and Shiguang Shan
- Abstract summary: We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks; in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
- Score: 76.70603443624012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current popular two-stream, two-stage tracking framework extracts the
template and the search region features separately and then performs relation
modeling, so the extracted features lack awareness of the target and have
limited target-background discriminability. To tackle the above issue, we
propose a novel one-stream tracking (OSTrack) framework that unifies feature
learning and relation modeling by bridging the template-search image pairs with
bidirectional information flows. In this way, discriminative target-oriented
features can be dynamically extracted by mutual guidance. Since no extra heavy
relation modeling module is needed and the implementation is highly
parallelized, the proposed tracker runs at a fast speed. To further improve the
inference efficiency, an in-network candidate early elimination module is
proposed based on the strong similarity prior calculated in the one-stream
framework. As a unified framework, OSTrack achieves state-of-the-art
performance on multiple benchmarks; in particular, it shows impressive results
on the one-shot tracking benchmark GOT-10k, i.e., achieving 73.7% AO, improving
the existing best result (SwinTrack) by 4.3%. Besides, our method maintains a
good performance-speed trade-off and shows faster convergence. The code and
models will be available at https://github.com/botaoye/OSTrack.
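The abstract describes two mechanisms: template and search tokens are processed jointly so information flows in both directions, and the similarity computed along the way is reused to drop unlikely search candidates early. The NumPy sketch below illustrates both ideas under heavy simplifying assumptions. A single attention head and a single layer with random weights stand in for the real backbone. The names `joint_attention`, `early_eliminate`, and `keep_ratio` are hypothetical and are not taken from the OSTrack code. Scoring each search token by its mean attention from all template tokens is also an assumption about how the "similarity prior" is formed, not the paper's exact rule.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(z, x, wq, wk, wv):
    """Single-head self-attention over concatenated template (z) and search (x) tokens.

    Because both sets share one attention matrix, template tokens attend to
    search tokens and vice versa (the bidirectional information flow).
    """
    tokens = np.concatenate([z, x], axis=0)          # (Nz + Nx, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N, N), each row sums to 1
    return attn @ v, attn

def early_eliminate(out, attn, num_template, keep_ratio):
    """Hypothetical candidate elimination: keep the search tokens that receive
    the most attention from template tokens (assumed scoring rule)."""
    t2s = attn[:num_template, num_template:]         # template queries -> search keys
    scores = t2s.mean(axis=0)                         # one score per search token
    num_keep = max(1, int(round(keep_ratio * scores.shape[0])))
    keep = np.sort(np.argsort(-scores)[:num_keep])    # retained indices, in original order
    return out[:num_template], out[num_template:][keep], keep

# Toy usage with random features and weights (illustrative only).
rng = np.random.default_rng(0)
C = 8
z = rng.standard_normal((4, C))    # 4 template tokens
x = rng.standard_normal((16, C))   # 16 search-region tokens
wq, wk, wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out, attn = joint_attention(z, x, wq, wk, wv)
z_out, x_out, keep = early_eliminate(out, attn, num_template=4, keep_ratio=0.7)
print(out.shape, attn.shape, x_out.shape)  # (20, 8) (20, 20) (11, 8)
```

The design point the sketch tries to capture is that the template-to-search block of the joint attention matrix comes for free from feature extraction, so using it as a similarity signal for pruning adds almost no cost. In the tracker as described, elimination would happen inside the network so that later layers process fewer search tokens; here it is shown once on a single layer's output.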
Related papers
- Multi-object Tracking by Detection and Query: an efficient end-to-end manner [23.926668750263488]
Multi-object tracking is advancing through two dominant paradigms: traditional tracking by detection and newly emerging tracking by query.
We propose the tracking-by-detection-and-query paradigm, which is achieved by a Learnable Associator.
Compared to tracking-by-query models, LAID achieves competitive tracking accuracy with notably higher training efficiency.
(2024-11-09T14:38:08Z)
- Hierarchical IoU Tracking based on Interval [21.555469501789577]
Multi-Object Tracking (MOT) aims to detect and associate all targets of given classes across frames.
We propose the Hierarchical IoU Tracking framework, dubbed HIT, which achieves unified hierarchical tracking by utilizing tracklet intervals as priors.
Our method achieves promising performance on four datasets, i.e., MOT17, KITTI, DanceTrack and VisDrone.
(2024-06-19T07:03:18Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on the MOT17 and MOT20 datasets while reaching state-of-the-art performance on the DanceTrack dataset.
(2023-11-17T08:17:49Z)
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
(2023-06-14T17:07:51Z)
- You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking [9.20064374262956]
The proposed framework can achieve robust tracking by using only a 2D detector and a 3D detector.
It is shown to be more accurate than many state-of-the-art tracking-by-detection (TBD) multi-modal tracking methods.
(2023-04-18T02:45:18Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
(2023-03-14T02:58:27Z)
- DSRRTracker: Dynamic Search Region Refinement for Attention-based Siamese Multi-Object Tracking [13.104037155691644]
We propose an end-to-end MOT method with a Gaussian filter-inspired dynamic search region refinement module.
Our method achieves state-of-the-art performance at a reasonable speed.
(2022-03-21T04:14:06Z)
- Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking [82.34356879078955]
We propose a compact memory embedding to enhance the discrimination of the segmentation-based deformable visual tracking method.
Our method outperforms strong segmentation-based trackers, i.e., D3S and SiamMask, on the DAVIS 2017 benchmark.
(2021-11-23T03:07:12Z)
- Online Multiple Object Tracking with Cross-Task Synergy [120.70085565030628]
We propose a novel unified model with synergy between position prediction and embedding association.
The two tasks are linked by temporal-aware target attention and distractor attention, as well as an identity-aware memory aggregation model.
(2021-04-01T10:19:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.