Related papers: DPTrack:Directional Kernel-Guided Prompt Learning for Robust Nighttime Aerial Tracking

DPTrack:Directional Kernel-Guided Prompt Learning for Robust Nighttime Aerial Tracking

URL: http://arxiv.org/abs/2510.15449v1
Date: Fri, 17 Oct 2025 09:07:19 GMT
Title: DPTrack:Directional Kernel-Guided Prompt Learning for Robust Nighttime Aerial Tracking
Authors: Zhiqiang Zhu, Xinbo Gao, Wen Lu, Jie Li, Zhaoyang Wang, Mingqian Ge,
Abstract summary: DPTrack is a prompt-based aerial tracker for nighttime scenarios.<n>It encodes the given object's attribute features into the directional kernel.<n>A kernel-guided prompt module propagates the kernel across the features of the search region.
Score: 51.02908607542803
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing nighttime aerial trackers based on prompt learning rely solely on spatial localization supervision, which fails to provide fine-grained cues that point to target features and inevitably produces vague prompts. This limitation impairs the tracker's ability to accurately focus on the object features and results in trackers still performing poorly. To address this issue, we propose DPTrack, a prompt-based aerial tracker designed for nighttime scenarios by encoding the given object's attribute features into the directional kernel enriched with fine-grained cues to generate precise prompts. Specifically, drawing inspiration from visual bionics, DPTrack first hierarchically captures the object's topological structure, leveraging topological attributes to enrich the feature representation. Subsequently, an encoder condenses these topology-aware features into the directional kernel, which serves as the core guidance signal that explicitly encapsulates the object's fine-grained attribute cues. Finally, a kernel-guided prompt module built on channel-category correspondence attributes propagates the kernel across the features of the search region to pinpoint the positions of target features and convert them into precise prompts, integrating spatial gating for robust nighttime tracking. Extensive evaluations on established benchmarks demonstrate DPTrack's superior performance. Our code will be available at https://github.com/zzq-vipsl/DPTrack.

Related papers

Exploring Temporally-Aware Features for Point Tracking [58.63091479730935]
Chrono is a feature backbone specifically designed for point tracking with built-in temporal awareness.<n>Chrono achieves state-of-the-art performance in a refiner-free setting on the TAP-Vid-DAVIS and TAP-Vid-Kinetics datasets.
arXiv Detail & Related papers (2025-01-21T15:39:40Z)
STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT) We use historical embedding features to model the representation of ReID and detection features in a sequential order. Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z)
Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection [50.959453059206446]
This paper aims for high-performance offline LiDAR-based 3D object detection. We first observe that experienced human annotators annotate objects from a track-centric perspective. We propose a high-performance offline detector in a track-centric perspective instead of the conventional object-centric perspective.
arXiv Detail & Related papers (2023-04-24T17:59:05Z)
Tracking by Joint Local and Global Search: A Target-aware Attention based Approach [63.50045332644818]
We propose a novel target-aware attention mechanism (termed TANet) to conduct joint local and global search for robust tracking. Specifically, we extract the features of target object patch and continuous video frames, then we track and feed them into a decoder network to generate target-aware global attention maps. In the tracking procedure, we integrate the target-aware attention with multiple trackers by exploring candidate search regions for robust tracking.
arXiv Detail & Related papers (2021-06-09T06:54:15Z)
Learning Spatio-Temporal Transformer for Visual Tracking [108.11680070733598]
We present a new tracking architecture with an encoder-decoder transformer as the key component. The whole method is end-to-end, does not need any postprocessing steps such as cosine window and bounding box smoothing. The proposed tracker achieves state-of-the-art performance on five challenging short-term and long-term benchmarks, while running real-time speed, being 6x faster than Siam R-CNN.
arXiv Detail & Related papers (2021-03-31T15:19:19Z)
Coarse-to-Fine Object Tracking Using Deep Features and Correlation Filters [2.3526458707956643]
This paper presents a novel deep learning tracking algorithm. We exploit the generalization ability of deep features to coarsely estimate target translation. Then, we capitalize on the discriminative power of correlation filters to precisely localize the tracked object.
arXiv Detail & Related papers (2020-12-23T16:43:21Z)
DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic temporal-temporal network (DSNet) for more effective fusion of temporal and spatial information. We show that the proposed method achieves superior performance than state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
TRAT: Tracking by Attention Using Spatio-Temporal Features [14.520067060603209]
We propose a two-stream deep neural network tracker that uses both spatial and temporal features. Our architecture is developed over ATOM tracker and contains two backbones: (i) 2D-CNN network to capture appearance features and (ii) 3D-CNN network to capture motion features.
arXiv Detail & Related papers (2020-11-18T20:11:12Z)
Unsupervised Multiple Person Tracking using AutoEncoder-Based Lifted Multicuts [11.72025865314187]
We present an unsupervised multiple object tracking approach based on minimum visual features and lifted multicuts. We show that, despite being trained without using the provided annotations, our model provides competitive results on the challenging MOT Benchmark for pedestrian tracking.
arXiv Detail & Related papers (2020-02-04T09:42:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.