Towards Real-World Visual Tracking with Temporal Contexts
- URL: http://arxiv.org/abs/2308.10330v1
- Date: Sun, 20 Aug 2023 17:59:40 GMT
- Title: Towards Real-World Visual Tracking with Temporal Contexts
- Authors: Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu
- Abstract summary: We propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently.
Based on it, we propose a stronger version for real-world visual tracking, i.e., TCTrack++.
For feature extraction, we propose an attention-based temporally adaptive convolution to enhance the spatial features.
For similarity map refinement, we introduce an adaptive temporal transformer to encode the temporal knowledge efficiently.
- Score: 64.7981374129495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual tracking has made significant improvements in the past few decades.
Most existing state-of-the-art trackers 1) merely aim for performance in ideal
conditions while overlooking real-world conditions; 2) adopt the
tracking-by-detection paradigm, neglecting rich temporal contexts; 3) only
integrate the temporal information into the template, where temporal contexts
among consecutive frames are far from being fully utilized. To handle those
problems, we propose a two-level framework (TCTrack) that can exploit temporal
contexts efficiently. Based on it, we propose a stronger version for real-world
visual tracking, i.e., TCTrack++. It boils down to two levels: features and
similarity maps. Specifically, for feature extraction, we propose an
attention-based temporally adaptive convolution to enhance the spatial features
using temporal information, which is achieved by dynamically calibrating the
convolution weights. For similarity map refinement, we introduce an adaptive
temporal transformer to encode the temporal knowledge efficiently and decode it
for the accurate refinement of the similarity map. To further improve the
performance, we additionally introduce a curriculum learning strategy. Also, we
adopt online evaluation to measure performance in real-world conditions.
Exhaustive experiments on 8 well-known benchmarks demonstrate the superiority of
TCTrack++. Real-world tests directly verify that TCTrack++ can be readily used
in real-world applications.
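The abstract's core mechanism, dynamically calibrating convolution weights with temporal information, can be sketched in a few lines. The sketch below is illustrative only and is not the authors' implementation: it assumes a hypothetical per-channel statistic pooled from previous frames and uses a sigmoid gate to rescale a static kernel.

```python
import numpy as np

def calibrate_weights(base_weight, temporal_context):
    """Rescale convolution weights per output channel using temporal context.

    Illustrative sketch of weight calibration, not TCTrack++'s actual
    attention-based module.

    base_weight:      (out_ch, in_ch, k, k) static convolution kernel
    temporal_context: (frames, out_ch) hypothetical per-channel statistics
                      pooled from previous frames
    """
    # Aggregate statistics across past frames, then squash to (0, 1).
    pooled = temporal_context.mean(axis=0)        # (out_ch,)
    gate = 1.0 / (1.0 + np.exp(-pooled))          # sigmoid gate per channel
    # Calibrate: each output channel's kernel is rescaled dynamically,
    # so the effective weights change frame by frame.
    return base_weight * gate[:, None, None, None]

# Tiny usage example with random data.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3))
ctx = rng.standard_normal((5, 8))                 # stats from 5 past frames
w_cal = calibrate_weights(w, ctx)
print(w_cal.shape)
```

Because the gate lies in (0, 1), calibration here only attenuates each channel; the paper's attention-based variant is richer, but the shape of the idea — temporal context modulating a shared static kernel — is the same.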
Related papers
- Local All-Pair Correspondence for Point Tracking [59.76186266230608]
We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences.
LocoTrack achieves unmatched accuracy on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than the current state-of-the-art.
arXiv Detail & Related papers (2024-07-22T06:49:56Z)
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
- ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking [0.5371337604556311]
Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT).
Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked.
In this paper we present ACTrack, a new tracking framework with an additive spatio-temporal condition. It preserves the quality and capabilities of the pre-trained backbone by freezing its parameters, and learns a lightweight additive net to model temporal relations in tracking.
We design an additive siamese convolutional network to ensure the integrity of spatial features and the temporal sequence.
arXiv Detail & Related papers (2024-02-27T07:34:08Z)
- Temporal Adaptive RGBT Tracking with Modality Prompt [10.431364270734331]
RGBT tracking has been widely used in various fields such as robotics, processing, surveillance, and autonomous driving.
Existing RGBT trackers fully explore the spatial information between the template and the search region and locate the target based on the appearance matching results.
These RGBT trackers have very limited exploitation of temporal information, either ignoring temporal information or exploiting it through online sampling and training.
arXiv Detail & Related papers (2024-01-02T15:20:50Z)
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
arXiv Detail & Related papers (2023-06-14T17:07:51Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z)
- TCTrack: Temporal Contexts for Aerial Tracking [38.87248176223548]
TCTrack is a comprehensive framework to fully exploit temporal contexts for aerial tracking.
For feature extraction, an online temporally adaptive convolution is proposed to enhance the spatial features.
For similarity map refinement, we propose an adaptive temporal transformer, which first effectively encodes temporal knowledge in a memory-efficient way.
arXiv Detail & Related papers (2022-03-03T18:04:20Z)
- Predictive Visual Tracking: A New Benchmark and Baseline Approach [27.87099869398515]
In the real-world scenarios, the onboard processing time of the image streams inevitably leads to a discrepancy between the tracking results and the real-world states.
Existing visual tracking benchmarks commonly run the trackers offline and ignore such latency in the evaluation.
In this work, we aim to deal with a more realistic problem of latency-aware tracking.
arXiv Detail & Related papers (2021-03-08T01:50:05Z)
- Deep Learning based Virtual Point Tracking for Real-Time Target-less Dynamic Displacement Measurement in Railway Applications [0.0]
We propose virtual point tracking for real-time target-less dynamic displacement measurement, incorporating deep learning techniques and domain knowledge.
We demonstrate our approach for a railway application, where the lateral displacement of the wheel on the rail is measured during operation.
arXiv Detail & Related papers (2021-01-17T16:19:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.