Related papers: CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking

URL: http://arxiv.org/abs/2511.17967v1
Date: Sat, 22 Nov 2025 08:10:02 GMT
Title: CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking
Authors: Hao Li, Yuhao Wang, Xiantao Hu, Wenning Hao, Pingping Zhang, Dong Wang, Huchuan Lu
Abstract summary: RGB-Thermal (RGBT) tracking aims to exploit visible and thermal infrared modalities for robust all-weather object tracking. Existing RGBT trackers struggle to resolve modality discrepancies, which poses great challenges for robust feature representation. We propose a novel Contextual Aggregation with Deformable Alignment framework called CADTrack for RGBT Tracking.
Score: 68.71826342377004
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: RGB-Thermal (RGBT) tracking aims to exploit visible and thermal infrared modalities for robust all-weather object tracking. However, existing RGBT trackers struggle to resolve modality discrepancies, which poses great challenges for robust feature representation. This limitation hinders effective cross-modal information propagation and fusion, which significantly reduces the tracking accuracy. To address this limitation, we propose a novel Contextual Aggregation with Deformable Alignment framework called CADTrack for RGBT Tracking. To be specific, we first deploy the Mamba-based Feature Interaction (MFI) that establishes efficient feature interaction via state space models. This interaction module can operate with linear complexity, reducing computational cost and improving feature discrimination. Then, we propose the Contextual Aggregation Module (CAM) that dynamically activates backbone layers through sparse gating based on the Mixture-of-Experts (MoE). This module can encode complementary contextual information from cross-layer features. Finally, we propose the Deformable Alignment Module (DAM) to integrate deformable sampling and temporal propagation, mitigating spatial misalignment and localization drift. With the above components, our CADTrack achieves robust and accurate tracking in complex scenarios. Extensive experiments on five RGBT tracking benchmarks verify the effectiveness of our proposed method. The source code is released at https://github.com/IdolLab/CADTrack.
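
The abstract describes the Contextual Aggregation Module as activating backbone layers through sparse, Mixture-of-Experts-style gating. As a rough illustration of that general idea only, the sketch below applies top-k gating to a handful of per-layer feature vectors. It is not the CADTrack implementation: the function names, the fixed gate scores, and the toy features are all hypothetical, and in a real model the scores would come from a learned router.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_gate(scores, k):
    """Keep the k highest-scoring layers and renormalize their weights.

    All other layers receive a gate weight of exactly 0.0, so they
    contribute nothing to the aggregated feature.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    gate = [0.0] * len(scores)
    for i, w in zip(top, weights):
        gate[i] = w
    return gate

def aggregate(layer_features, gate):
    """Weighted sum of per-layer feature vectors, skipping gated-off layers."""
    dim = len(layer_features[0])
    out = [0.0] * dim
    for feat, g in zip(layer_features, gate):
        if g == 0.0:
            continue  # inactive layer: no computation needed
        for d in range(dim):
            out[d] += g * feat[d]
    return out

# Hypothetical toy inputs: four backbone layers, two-dimensional features.
layer_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
gate_scores = [0.1, 2.0, 2.0, -1.0]  # assumed; a router network would predict these

gate = sparse_gate(gate_scores, k=2)
fused = aggregate(layer_features, gate)
print(gate, fused)
```

With these toy scores only the second and third layers are active, each with weight 0.5, so the first and last layers are skipped entirely. That skipping is what makes the gating sparse rather than a plain softmax over all layers.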

Related papers

RAGTrack: Language-aware RGBT Tracking with Retrieval-Augmented Generation [71.2136732268131]
RGB-Thermal (RGBT) tracking aims to achieve robust object localization across diverse environmental conditions. Existing RGBT trackers rely solely on initial-frame visual information for target modeling. We propose RAGTrack, a novel Retrieval-Augmented Generation framework for robust RGBT tracking.
arXiv Detail & Related papers (2026-03-04T01:02:04Z)
Breaking Alignment Barriers: TPS-Driven Semantic Correlation Learning for Alignment-Free RGB-T Salient Object Detection [34.62005077259452]
Existing RGB-T salient object detection methods rely on manually aligned and annotated datasets. We propose an efficient RGB-T SOD method for real-world unaligned image pairs, termed Thin-Plate Spline-driven Semantic Correlation Learning Network (TPS-SCL). TPS-SCL attains state-of-the-art (SOTA) performance among existing lightweight SOD methods and outperforms mainstream RGB-T SOD approaches.
arXiv Detail & Related papers (2025-12-26T04:37:49Z)
SwiTrack: Tri-State Switch for Cross-Modal Object Tracking [74.15663758681849]
Cross-modal object tracking (CMOT) is an emerging task that maintains target consistency while the video stream switches between different modalities. We propose SwiTrack, a novel state-switching framework that redefines CMOT through the deployment of three specialized streams.
arXiv Detail & Related papers (2025-11-20T10:52:54Z)
Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking [9.353589376846902]
We propose an efficient RGB-Event object tracking framework based on the linear-complexity Vision Mamba network. The source code and pre-trained models will be released at https://github.com/Event-AHU/Mamba_FETrack.
arXiv Detail & Related papers (2025-06-30T12:24:01Z)
CAMELTrack: Context-Aware Multi-cue ExpLoitation for Online Multi-Object Tracking [68.24998698508344]
We introduce CAMEL, a novel association module for Context-Aware Multi-Cue ExpLoitation. Unlike end-to-end detection-by-tracking approaches, our method remains lightweight and fast to train while being able to leverage external off-the-shelf models. Our proposed online tracking pipeline, CAMELTrack, achieves state-of-the-art performance on multiple tracking benchmarks.
arXiv Detail & Related papers (2025-05-02T13:26:23Z)
CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras [43.699819213559515]
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, and their resolution ($346 \times 260$) is low for practical applications. We build the first unaligned frame-event dataset CRSOT collected with a specially built data acquisition system. We propose a novel unaligned object tracking framework that can realize robust tracking even using the loosely aligned RGB-Event data.
arXiv Detail & Related papers (2024-01-05T14:20:22Z)
iKUN: Speak to Trackers without Retraining [21.555469501789577]
We propose an insertable Knowledge Unification Network, termed iKUN, to enable communication with off-the-shelf trackers. To improve the localization accuracy, we present a neural version of Kalman filter (NKF) to dynamically adjust process noise. We also contribute a more challenging dataset, Refer-Dance, by extending public DanceTrack dataset with motion and dressing descriptions.
arXiv Detail & Related papers (2023-12-25T11:48:55Z)
Modality-missing RGBT Tracking: Invertible Prompt Learning and High-quality Benchmarks [21.139161163767884]
Modality information may be missing due to factors such as thermal sensor self-calibration and data transmission errors. We propose a novel invertible prompt learning approach, which integrates the content-preserving prompts into a well-trained tracking model. Our method achieves significant performance improvements compared with state-of-the-art methods.
arXiv Detail & Related papers (2023-12-25T11:39:00Z)
Transparent Object Tracking with Enhanced Fusion Module [56.403878717170784]
We propose a new tracker architecture that uses our fusion techniques to achieve superior results for transparent object tracking. Our results and the implementation of code will be made publicly available at https://github.com/kalyan05TOTEM.
arXiv Detail & Related papers (2023-09-13T03:52:09Z)
Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with traditional RGB object tracking, the addition of the depth modality can effectively resolve interference between the target and the background. Some existing RGBD trackers use the two modalities separately, and thus some particularly useful shared information between them is ignored. We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
arXiv Detail & Related papers (2022-11-06T07:59:07Z)
Transformer Tracking [76.96796612225295]
Correlation plays a critical role in the tracking field, especially in popular Siamese-based trackers. This work presents a novel attention-based feature fusion network, which effectively combines the template and search region features solely using attention. Experiments show that our TransT achieves very promising results on six challenging datasets.
arXiv Detail & Related papers (2021-03-29T09:06:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.