Multi-attention Associate Prediction Network for Visual Tracking
- URL: http://arxiv.org/abs/2403.16395v1
- Date: Mon, 25 Mar 2024 03:18:58 GMT
- Title: Multi-attention Associate Prediction Network for Visual Tracking
- Authors: Xinglong Sun, Haijiang Sun, Shan Jiang, Jiacheng Wang, Xilai Wei, Zhonghe Hu,
- Abstract summary: classification-regression prediction networks have realized impressive success in several modern deep trackers.
There is an inherent difference between classification and regression tasks, so they have diverse even opposite demands for feature matching.
We propose a multi-attention associate prediction network (MAPNet) to tackle the above problems.
- Score: 3.9628431811908533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification-regression prediction networks have realized impressive success in several modern deep trackers. However, there is an inherent difference between classification and regression tasks, so they have diverse even opposite demands for feature matching. Existed models always ignore the key issue and only employ a unified matching block in two task branches, decaying the decision quality. Besides, these models also struggle with decision misalignment situation. In this paper, we propose a multi-attention associate prediction network (MAPNet) to tackle the above problems. Concretely, two novel matchers, i.e., category-aware matcher and spatial-aware matcher, are first designed for feature comparison by integrating self, cross, channel or spatial attentions organically. They are capable of fully capturing the category-related semantics for classification and the local spatial contexts for regression, respectively. Then, we present a dual alignment module to enhance the correspondences between two branches, which is useful to find the optimal tracking solution. Finally, we describe a Siamese tracker built upon the proposed prediction network, which achieves the leading performance on five tracking benchmarks, consisting of LaSOT, TrackingNet, GOT-10k, TNL2k and UAV123, and surpasses other state-of-the-art approaches.
Related papers
- SiamTHN: Siamese Target Highlight Network for Visual Tracking [11.111738354621595]
Siamese network based trackers treat each channel in the feature maps generated by the backbone network equally.
No structural links between the classification and regression branches in these trackers, and the two branches are optimized separately during training.
A Target Highlight Module is proposed to help the generated similarity response maps to be more focused on the target region.
arXiv Detail & Related papers (2023-03-22T04:33:02Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - Online Multiple Object Tracking with Cross-Task Synergy [120.70085565030628]
We propose a novel unified model with synergy between position prediction and embedding association.
The two tasks are linked by temporal-aware target attention and distractor attention, as well as identity-aware memory aggregation model.
arXiv Detail & Related papers (2021-04-01T10:19:40Z) - Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) has synchronous needs for both robustness and accuracy.
We propose a dual-modal framework for target localization, consisting of robust localization suppressingors via ONR and the accurate localization attending to the target center precisely via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z) - Multiple Convolutional Features in Siamese Networks for Object Tracking [13.850110645060116]
Multiple Features-Siamese Tracker (MFST) is a novel tracking algorithm exploiting several hierarchical feature maps for robust tracking.
MFST achieves high tracking accuracy, while outperforming the standard siamese tracker on object tracking benchmarks.
arXiv Detail & Related papers (2021-03-01T08:02:27Z) - CRACT: Cascaded Regression-Align-Classification for Robust Visual
Tracking [97.84109669027225]
We introduce an improved proposal refinement module, Cascaded Regression-Align- Classification (CRAC)
CRAC yields new state-of-the-art performances on many benchmarks.
In experiments on seven benchmarks including OTB-2015, UAV123, NfS, VOT-2018, TrackingNet, GOT-10k and LaSOT, our CRACT exhibits very promising results in comparison with state-of-the-art competitors.
arXiv Detail & Related papers (2020-11-25T02:18:33Z) - Graph Attention Tracking [76.19829750144564]
We propose a simple target-aware Siamese graph attention network for general object tracking.
Experiments on challenging benchmarks including GOT-10k, UAV123, OTB-100 and LaSOT demonstrate that the proposed SiamGAT outperforms many state-of-the-art trackers.
arXiv Detail & Related papers (2020-11-23T04:26:45Z) - Rethinking the competition between detection and ReID in Multi-Object
Tracking [44.59367033562385]
One-shot models which jointly learn detection and identification embeddings, have drawn great attention in multi-object tracking (MOT)
In this paper, we propose a novel reciprocal network (REN) with a self-relation and cross-relation design to better learn task-dependent representations.
We also introduce a scale-aware attention network (SAAN) that prevents semantic level misalignment to improve the association capability of ID embeddings.
arXiv Detail & Related papers (2020-10-23T02:44:59Z) - MOTS: Multiple Object Tracking for General Categories Based On Few-Shot
Method [38.009864162410615]
A new multi-target tracking system, named MOTS, is based on metrics but not limited to track specific category.
With a newly built TRACK-REID data-set, the Fine-match Network can perform matching of 31 category targets, even generalizes to unseen categories.
arXiv Detail & Related papers (2020-05-19T02:18:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.