Spatio-Temporal Matching for Siamese Visual Tracking
- URL: http://arxiv.org/abs/2105.02408v1
- Date: Thu, 6 May 2021 02:55:58 GMT
- Title: Spatio-Temporal Matching for Siamese Visual Tracking
- Authors: Jinpu Zhang and Yuehuan Wang
- Abstract summary: Similarity matching is a core operation in Siamese trackers.
Unlike 2-D image matching, the matching network in object tracking requires 4-D information (height, width, channel and time).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Similarity matching is a core operation in Siamese trackers. Most Siamese
trackers carry out similarity learning via cross correlation that originates
from the image matching field. However, unlike 2-D image matching, the matching
network in object tracking requires 4-D information (height, width, channel and
time). Cross correlation neglects the information from channel and time
dimensions, and thus produces ambiguous matching. This paper proposes a
spatio-temporal matching process to thoroughly explore the capability of 4-D
matching in space (height, width and channel) and time. In spatial matching, we
introduce a space-variant channel-guided correlation (SVC-Corr) to recalibrate
channel-wise feature responses for each spatial location, which can guide the
generation of the target-aware matching features. In temporal matching, we
investigate the time-domain context relations of the target and the background
and develop an aberrance repressed module (ARM). By restricting the abrupt
alteration in the interframe response maps, our ARM can clearly suppress
aberrances and thus enables more robust and accurate object tracking.
Furthermore, a novel anchor-free tracking framework is presented to accommodate
these innovations. Experiments on challenging benchmarks including OTB100,
VOT2018, VOT2020, GOT-10k, and LaSOT demonstrate the state-of-the-art
performance of the proposed method.
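To make the two matching steps concrete, below is a minimal PyTorch-style sketch of how a space-variant channel-guided correlation and an aberrance-repression step could look. It assumes a standard depthwise cross correlation backbone; the function names, the softmax-based channel recalibration, and the penalty weight `lam` are illustrative simplifications, not the authors' released implementation.

```python
# Illustrative sketch only: function names and details are assumptions,
# not the authors' implementation.
import torch
import torch.nn.functional as F


def depthwise_xcorr(search, kernel):
    """Per-channel cross correlation between search features and the template.

    search: (B, C, H, W) search-region features
    kernel: (B, C, h, w) template (exemplar) features
    returns: (B, C, H-h+1, W-w+1) channel-wise response maps
    """
    b, c, h, w = kernel.shape
    x = search.reshape(1, b * c, search.size(2), search.size(3))
    k = kernel.reshape(b * c, 1, h, w)
    out = F.conv2d(x, k, groups=b * c)
    return out.reshape(b, c, out.size(2), out.size(3))


def channel_guided_corr(search, kernel):
    """Simplified stand-in for SVC-Corr: recalibrate channel-wise responses with
    a spatially varying (space-variant) weighting before fusing them."""
    resp = depthwise_xcorr(search, kernel)            # (B, C, H', W')
    weights = torch.softmax(resp, dim=1)              # channel weights per location
    return (weights * resp).sum(dim=1, keepdim=True)  # (B, 1, H', W') fused map


def aberrance_repressed(resp_t, resp_prev, lam=0.5):
    """Simplified stand-in for ARM: penalize abrupt inter-frame changes in the
    response map so aberrant peaks are suppressed."""
    if resp_prev is None:
        return resp_t
    return resp_t - lam * (resp_t - resp_prev).abs()
```

In a tracker loop, `channel_guided_corr` would stand in for the plain cross correlation at each frame, and `aberrance_repressed` would be applied to consecutive response maps before the anchor-free head localizes the target.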
Related papers
- STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT).
We use historical embedding features to model the representation of ReID and detection features in a sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z)
- Local All-Pair Correspondence for Point Tracking [59.76186266230608]
We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences.
LocoTrack achieves unmatched accuracy on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than the current state-of-the-art.
arXiv Detail & Related papers (2024-07-22T06:49:56Z)
- A CNN-LSTM Architecture for Marine Vessel Track Association Using Automatic Identification System (AIS) Data [2.094022863940315]
This study introduces a 1D CNN-LSTM architecture-based framework for track association.
The proposed framework takes the marine vessel's location and motion data collected through the Automatic Identification System (AIS) as input and returns the most likely vessel track as output in real-time.
arXiv Detail & Related papers (2023-03-24T15:26:49Z)
- Learning Appearance-motion Normality for Video Anomaly Detection [11.658792932975652]
We propose a two-stream auto-encoder framework augmented with spatial-temporal memories.
It learns the appearance normality and motion normality independently and explores the correlations via adversarial learning.
Our framework outperforms the state-of-the-art methods, achieving AUCs of 98.1% and 89.8% on UCSD Ped2 and CUHK Avenue datasets.
arXiv Detail & Related papers (2022-07-27T08:30:19Z)
- Adaptive Affinity for Associations in Multi-Target Multi-Camera Tracking [53.668757725179056]
We propose a simple yet effective approach to adapt affinity estimations to corresponding matching scopes in MTMCT.
Instead of trying to deal with all appearance changes, we tailor the affinity metric to specialize in ones that might emerge during data associations.
By minimizing the mismatch, the adaptive affinity module brings significant improvements over the global re-ID distance.
arXiv Detail & Related papers (2021-12-14T18:59:11Z)
- Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z)
- Transformer Tracking [76.96796612225295]
Correlation plays a critical role in the tracking field, especially in popular Siamese-based trackers.
This work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention.
Experiments show that our TransT achieves very promising results on six challenging datasets. (A hedged cross-attention fusion sketch illustrating this attention-only matching appears after this list.)
arXiv Detail & Related papers (2021-03-29T09:06:55Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance compared with state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking [79.80401607146987]
Existing object trackers usually learn a bounding-box based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture, which is equipped with a spatio-appearance memory network to learn accurate spatio-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z)
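For contrast with correlation-based matching, the attention-only fusion described in the Transformer Tracking entry above can be sketched generically as follows. This is a plain cross-attention block under assumed token shapes and layer sizes, not the actual TransT architecture.

```python
# Generic cross-attention fusion sketch; the module name and hyper-parameters
# are assumptions, not the TransT implementation.
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuse template and search-region features purely with attention."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, search_tokens, template_tokens):
        # search_tokens:   (B, Ns, dim) flattened search-region features
        # template_tokens: (B, Nt, dim) flattened template features
        x, _ = self.self_attn(search_tokens, search_tokens, search_tokens)
        x = self.norm1(search_tokens + x)
        y, _ = self.cross_attn(x, template_tokens, template_tokens)
        return self.norm2(x + y)  # (B, Ns, dim) fused features for the head
```

The fused tokens would then feed a prediction head in place of a correlation response map.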
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.