Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation
- URL: http://arxiv.org/abs/2407.08265v2
- Date: Thu, 18 Jul 2024 08:53:00 GMT
- Title: Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation
- Authors: Miao Yan, Ping Zhang, Haofei Zhang, Ruqian Hao, Juanxiu Liu, Xiaoyang Wang, Lin Liu
- Abstract summary: We propose a novel model called NLMTrack, which enhances the utilization of coordinate and temporal information.
Experiments show that NLMTrack achieves state-of-the-art performance on multiple benchmarks.
- Score: 16.873697155916997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thermal infrared (TIR) tracking is an essential topic in computer vision because of its advantage of all-weather imaging. However, most conventional methods utilize only hand-crafted features, while deep learning-based correlation filtering methods are limited by simple correlation operations. Transformer-based methods ignore temporal and coordinate information, which is critical for TIR tracking, where texture and color information are lacking. In this paper, to address these issues, we apply natural language modeling to TIR tracking and propose a novel model called NLMTrack, which enhances the utilization of coordinate and temporal information. NLMTrack applies an encoder that unifies feature extraction and feature fusion, which simplifies the TIR tracking pipeline. To address the challenge of low detail and low contrast in TIR images, on the one hand, we design a multi-level progressive fusion module that enhances the semantic representation and incorporates multi-scale features. On the other hand, the decoder combines the TIR features and the coordinate sequence features using a causal transformer to generate the target sequence step by step. Moreover, we explore an adaptive loss aimed at elevating tracking accuracy and a simple template update strategy to accommodate the target's appearance variations. Experiments show that NLMTrack achieves state-of-the-art performance on multiple benchmarks. The code is publicly available at https://github.com/ELOESZHANG/NLMTrack.
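To make the coordinate-sequence idea concrete, the following is a minimal PyTorch sketch of step-by-step box generation with a causal transformer decoder, in the spirit of the decoder described in the abstract. The bin count, embedding width, START token, and all module names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the released NLMTrack code): greedy, step-by-step
# decoding of a quantized [x, y, w, h] coordinate sequence with a causal
# transformer decoder. Bin count, width, and the START token are assumptions.
import torch
import torch.nn as nn

N_BINS, D_MODEL = 1000, 256      # assumed coordinate vocabulary and width
START = N_BINS                   # assumed sequence-start token

token_embed = nn.Embedding(N_BINS + 1, D_MODEL)
pos_embed = nn.Parameter(torch.zeros(5, D_MODEL))      # START + 4 coordinates
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=D_MODEL, nhead=8, batch_first=True),
    num_layers=4,
)
to_vocab = nn.Linear(D_MODEL, N_BINS)

@torch.no_grad()
def generate_box(memory: torch.Tensor) -> torch.Tensor:
    """memory: fused TIR search-region features, shape (B, HW, D_MODEL).
    Returns the quantized [x, y, w, h] tokens, shape (B, 4)."""
    tokens = torch.full((memory.size(0), 1), START, dtype=torch.long)
    for _ in range(4):                                  # one coordinate per step
        t = tokens.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        emb = token_embed(tokens) + pos_embed[:t]
        hidden = decoder(emb, memory, tgt_mask=causal)  # causal self-attention
        next_tok = to_vocab(hidden[:, -1]).argmax(-1, keepdim=True)
        tokens = torch.cat([tokens, next_tok], dim=1)   # feed back the prediction
    return tokens[:, 1:]                                # drop the START token

box_tokens = generate_box(torch.randn(1, 256, D_MODEL))  # e.g. 16x16 feature tokens
```

Greedy decoding is shown for brevity; the predicted tokens would be mapped back to pixel coordinates by inverting whatever quantization scheme defines the vocabulary.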
Related papers
- Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking [64.28025685503376]
SeqTrack casts visual tracking as a sequence generation task, forecasting object bounding boxes in an autoregressive manner.
SeqTrackv2 integrates a unified interface for auxiliary modalities and a set of task-prompt tokens to specify the task.
This sequence learning paradigm not only simplifies the tracking framework, but also showcases superior performance across 14 challenging benchmarks.
arXiv Detail & Related papers (2023-04-27T17:56:29Z)
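As a small complement, the sketch below shows how a sequence-generation tracker of this kind can represent a bounding box as four discrete tokens via uniform quantization; the vocabulary size and function names are assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the SeqTrack code): represent a bounding
# box as four discrete tokens by uniform quantization, the form that a
# sequence-generation tracker predicts autoregressively.
N_BINS = 4000  # assumed vocabulary size per coordinate

def box_to_tokens(box, img_w, img_h, n_bins=N_BINS):
    """box = (x, y, w, h) in pixels -> four integer tokens in [0, n_bins)."""
    x, y, w, h = box
    norm = (x / img_w, y / img_h, w / img_w, h / img_h)
    return [min(int(v * n_bins), n_bins - 1) for v in norm]

def tokens_to_box(tokens, img_w, img_h, n_bins=N_BINS):
    """Inverse mapping: tokens -> (x, y, w, h) in pixels, using bin centers."""
    scale = (img_w, img_h, img_w, img_h)
    return tuple((t + 0.5) / n_bins * s for t, s in zip(tokens, scale))

tokens = box_to_tokens((120.0, 80.0, 64.0, 48.0), img_w=640, img_h=480)
print(tokens, tokens_to_box(tokens, 640, 480))
```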
- RGB-T Tracking Based on Mixed Attention [5.151994214135177]
RGB-T tracking involves the use of images from both visible and thermal modalities.
The paper proposes an RGB-T tracker based on a mixed attention mechanism that achieves complementary fusion of the two modalities.
arXiv Detail & Related papers (2023-04-09T15:59:41Z)
- OST: Efficient One-stream Network for 3D Single Object Tracking in Point Clouds [6.661881950861012]
We propose a novel one-stream network with the strength of instance-level encoding, which avoids the correlation operations used in previous Siamese networks.
The proposed method achieves strong performance for both class-specific and class-agnostic tracking, with less computation and higher efficiency.
arXiv Detail & Related papers (2022-10-16T12:31:59Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto-encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervised manner, MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We achieve state-of-the-art performance on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
- Temporal Aggregation for Adaptive RGBT Tracking [14.00078027541162]
We propose an RGBT tracker that takes temporal clues into account for robust appearance model learning.
Unlike most existing RGBT trackers, which perform tracking using only spatial information, this method additionally exploits temporal information.
arXiv Detail & Related papers (2022-01-22T02:31:56Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks; the visible and thermal filters are then used to perform a dynamic convolution on their corresponding input feature maps.
To handle heavy occlusion, fast motion, and out-of-view targets, we propose a joint local and global search that exploits a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
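The sketch below illustrates the dynamic, modality-aware filtering idea from the MFGNet summary above: a small network predicts per-sample depthwise kernels from each modality's features and applies them through a dynamic convolution. The channel count, kernel size, and the additive fusion at the end are assumptions, not the published design.

```python
# Minimal sketch (assumptions, not the MFGNet code): generate per-sample
# convolution filters from one modality's features and apply them to that
# modality's feature map with a dynamic (per-sample, depthwise) convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterBranch(nn.Module):
    def __init__(self, channels: int = 64, k: int = 3):
        super().__init__()
        self.channels, self.k = channels, k
        # small filter-generation network: global context -> depthwise kernels
        self.gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels * k * k),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        kernels = self.gen(feat).view(b * c, 1, self.k, self.k)
        # depthwise dynamic convolution: fold the batch into conv groups
        out = F.conv2d(feat.reshape(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)

# one branch per modality; additive fusion here is an assumption
visible_branch, thermal_branch = DynamicFilterBranch(), DynamicFilterBranch()
vis, tir = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
fused = visible_branch(vis) + thermal_branch(tir)
```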
- TrTr: Visual Tracking with Transformer [29.415900191169587]
We propose a novel tracker network based on a powerful attention architecture, the Transformer encoder-decoder.
We design classification and regression heads on top of the Transformer output to localize the target with shape-agnostic anchors.
Our method performs favorably against state-of-the-art algorithms.
arXiv Detail & Related papers (2021-05-09T02:32:28Z)
- TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z)
- Transformer Tracking [76.96796612225295]
Correlation plays a critical role in the tracking field, especially in popular Siamese-based trackers.
This work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention.
Experiments show that our TransT achieves very promising results on six challenging datasets.
arXiv Detail & Related papers (2021-03-29T09:06:55Z)
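For illustration, here is a minimal sketch of attention-only fusion of template and search-region features, in the spirit of the TransT summary above; the layer layout, dimensions, and names are assumptions rather than the published architecture.

```python
# Minimal sketch (illustrative, not the TransT code): fuse template and
# search-region features purely with attention -- self-attention within the
# search tokens plus cross-attention from search tokens to template tokens.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
        # search: (B, Ns, dim) tokens; template: (B, Nt, dim) tokens
        x, _ = self.self_attn(search, search, search)
        search = self.norm1(search + x)
        x, _ = self.cross_attn(search, template, template)  # query = search tokens
        return self.norm2(search + x)

fusion = AttentionFusion()
fused = fusion(torch.randn(1, 1024, 256), torch.randn(1, 64, 256))  # -> (1, 1024, 256)
```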
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.