Temporal Adaptive RGBT Tracking with Modality Prompt
- URL: http://arxiv.org/abs/2401.01244v1
- Date: Tue, 2 Jan 2024 15:20:50 GMT
- Title: Temporal Adaptive RGBT Tracking with Modality Prompt
- Authors: Hongyu Wang, Xiaotao Liu, Yifan Li, Meng Sun, Dian Yuan, Jing Liu
- Abstract summary: RGBT tracking has been widely used in various fields such as robotics, surveillance, and autonomous driving.
Existing RGBT trackers fully explore the spatial information between the template and the search region and locate the target based on the appearance matching results.
These RGBT trackers have very limited exploitation of temporal information, either ignoring temporal information or exploiting it through online sampling and training.
- Score: 10.431364270734331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGBT tracking has been widely used in various fields such as robotics,
surveillance, and autonomous driving. Existing RGBT trackers fully
explore the spatial information between the template and the search region and
locate the target based on the appearance matching results. However, these RGBT
trackers have very limited exploitation of temporal information, either
ignoring temporal information or exploiting it through online sampling and
training. The former struggles to cope with object state changes, while the
latter neglects the correlation between spatial and temporal information. To
alleviate these limitations, we propose a novel Temporal Adaptive RGBT Tracking
framework, named TATrack. TATrack has a spatio-temporal two-stream structure
and captures temporal information by an online updated template, where the
two-stream structure refers to the multi-modal feature extraction and
cross-modal interaction for the initial template and the online update template
respectively. TATrack thus comprehensively exploits both spatio-temporal
information and multi-modal information for target localization. In addition,
we design a spatio-temporal interaction (STI) mechanism that bridges two
branches and enables cross-modal interaction to span longer time scales.
Extensive experiments on three popular RGBT tracking benchmarks show that our
method achieves state-of-the-art performance, while running at real-time speed.
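The abstract's core idea can be illustrated with a toy sketch: two templates (the fixed initial one and an online-updated one) are each matched against the multi-modal search region, and the online template is refreshed only when the match is confident. All names here, the cosine-similarity matching, and the fusion-by-averaging are illustrative assumptions for exposition, not the paper's actual architecture.

```python
# Toy sketch of a spatio-temporal two-stream tracker with an
# online-updated template. Hypothetical simplification, not TATrack's
# real feature extraction or cross-modal interaction.
import numpy as np

def fuse_modalities(rgb_feat, tir_feat):
    """Toy cross-modal fusion: average RGB and thermal feature vectors."""
    return (rgb_feat + tir_feat) / 2.0

def match_score(template, search):
    """Cosine similarity as a stand-in for appearance matching."""
    num = float(np.dot(template, search))
    den = float(np.linalg.norm(template) * np.linalg.norm(search)) + 1e-8
    return num / den

class TwoStreamTracker:
    def __init__(self, init_rgb, init_tir, update_thresh=0.9):
        # Initial-template branch (fixed) and online-template branch (updated).
        self.initial = fuse_modalities(init_rgb, init_tir)
        self.online = self.initial.copy()
        self.update_thresh = update_thresh

    def step(self, search_rgb, search_tir):
        search = fuse_modalities(search_rgb, search_tir)
        # Combine appearance evidence from both branches.
        score = (match_score(self.initial, search)
                 + match_score(self.online, search)) / 2.0
        # Refresh the online template only on confident matches.
        if score > self.update_thresh:
            self.online = 0.5 * self.online + 0.5 * search
        return score
```

A confident frame (high similarity) blends into the online template, while a low-confidence frame leaves it untouched, which is one common way to keep an online template from drifting.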
Related papers
- Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers [55.46413719810273]
Rich temporal information is crucial for coping with complicated target appearance changes in visual tracking.
Our method improves the tracker's performance on six popular tracking benchmarks.
arXiv Detail & Related papers (2024-03-15T02:39:26Z) - Multi-step Temporal Modeling for UAV Tracking [14.687636301587045]
We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking.
We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features.
We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
arXiv Detail & Related papers (2024-03-07T09:48:13Z) - Transformer RGBT Tracking with Spatio-Temporal Multimodal Tokens [13.608089918718797]
We propose a novel Transformer RGBT tracking approach, which mixes multimodal tokens from static templates and the multimodal search region in a Transformer to handle target appearance changes.
Our module is inserted into the transformer network and performs joint feature extraction, search-template matching, and cross-temporal interaction.
Experiments on three RGBT benchmark datasets show that the proposed approach maintains competitive performance compared to other state-of-the-art tracking algorithms.
arXiv Detail & Related papers (2024-01-03T11:16:38Z) - Towards Real-World Visual Tracking with Temporal Contexts [64.7981374129495]
We propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently.
Based on it, we propose a stronger version for real-world visual tracking, i.e., TCTrack++.
For feature extraction, we propose an attention-based temporally adaptive convolution to enhance the spatial features.
For similarity map refinement, we introduce an adaptive temporal transformer to encode the temporal knowledge efficiently.
arXiv Detail & Related papers (2023-08-20T17:59:40Z) - Tracking Objects and Activities with Attention for Temporal Sentence
Grounding [51.416914256782505]
Temporal sentence grounding (TSG) aims to localize the temporal segment which is semantically aligned with a natural language query in an untrimmed video.
We propose a novel Temporal Sentence Tracking Network (TSTNet), which contains (A) a Cross-modal Targets Generator to generate multi-modal targets and a search space, and (B) a Temporal Sentence Tracker to track the multi-modal targets' behavior and predict the query-related segment.
arXiv Detail & Related papers (2023-02-21T16:42:52Z) - Automated Dilated Spatio-Temporal Synchronous Graph Modeling for Traffic
Prediction [1.6449390849183363]
We propose an automated dilated spatio-temporal synchronous graph network, named Auto-DSTS, for traffic prediction.
Specifically, we propose an automated dilated spatio-temporal synchronous graph (Auto-DSTS) module to capture short-term and long-term spatio-temporal correlations.
Our model can achieve about 10% improvement compared with state-of-the-art methods.
arXiv Detail & Related papers (2022-07-22T00:50:39Z) - Temporal Aggregation for Adaptive RGBT Tracking [14.00078027541162]
We propose an RGBT tracker which takes temporal clues into account for robust appearance model learning.
Unlike most existing RGBT trackers that perform object tracking using only spatial information, this method further considers temporal information.
arXiv Detail & Related papers (2022-01-22T02:31:56Z) - Continuity-Discrimination Convolutional Neural Network for Visual Object
Tracking [150.51667609413312]
This paper proposes a novel model, named Continuity-Discrimination Convolutional Neural Network (CD-CNN) for visual object tracking.
To address this problem, CD-CNN models temporal appearance continuity based on the idea of temporal slowness.
In order to alleviate inaccurate target localization and drifting, we propose a novel notion, object-centroid.
arXiv Detail & Related papers (2021-04-18T06:35:03Z) - Spatial-Temporal Correlation and Topology Learning for Person
Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z) - A Spatial-Temporal Attentive Network with Spatial Continuity for
Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, a joint feature sequence is built from the sequence and instant state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.