Open-World Drone Active Tracking with Goal-Centered Rewards
- URL: http://arxiv.org/abs/2412.00744v2
- Date: Wed, 22 Oct 2025 07:43:03 GMT
- Title: Open-World Drone Active Tracking with Goal-Centered Rewards
- Authors: Haowei Sun, Jinwu Hu, Zhirui Zhang, Haoyuan Tian, Xinze Xie, Yufeng Wang, Xiaohua Xie, Yun Lin, Zhuliang Yu, Mingkui Tan
- Abstract summary: Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations. We propose DAT, the first open-world drone active air-to-ground tracking benchmark. We also propose GC-VAT, a reinforcement learning method that aims to improve the drone's target-tracking performance in complex scenarios.
- Score: 62.21394499788672
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations, providing a more practical solution for effective tracking in dynamic environments. However, accurate Drone Visual Active Tracking using reinforcement learning remains challenging due to the absence of a unified benchmark and the complexity of open-world environments with frequent interference. To address these issues, we pioneer a systematic solution. First, we propose DAT, the first open-world drone active air-to-ground tracking benchmark. It encompasses 24 city-scale scenes, featuring targets with human-like behaviors and high-fidelity dynamics simulation. DAT also provides a digital twin tool for unlimited scene generation. Additionally, we propose a novel reinforcement learning method called GC-VAT, which aims to improve the drone's target-tracking performance in complex scenarios. Specifically, we design a Goal-Centered Reward to provide precise feedback across viewpoints to the agent, enabling it to expand its perception and movement range through unrestricted perspectives. Inspired by curriculum learning, we introduce a Curriculum-Based Training strategy that progressively enhances tracking performance in complex environments. Moreover, experiments in the simulator and on real-world images demonstrate the superior performance of GC-VAT, which achieves a Tracking Success Rate of approximately 72% in the simulator. The benchmark and code are available at https://github.com/SHWplus/DAT_Benchmark.
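A note on the Goal-Centered Reward: the abstract only states that it gives precise, viewpoint-independent feedback, so the Python sketch below is an illustrative assumption of how such a reward could look, not the paper's exact formulation (the function name, the linear decay, and the max_range parameter are invented for illustration).

```python
import numpy as np

def goal_centered_reward(target_pos, drone_pos, goal_offset, max_range=50.0):
    """Hedged sketch of a goal-centered reward (illustrative only).

    target_pos, drone_pos, goal_offset: 3D vectors in the world frame.
    Defining the goal point relative to the drone keeps the feedback
    well-defined for any viewpoint the drone chooses.
    """
    # Goal point the target should occupy, expressed in the world frame.
    goal_pos = np.asarray(drone_pos) + np.asarray(goal_offset)
    # Distance between the target and that goal point.
    error = np.linalg.norm(np.asarray(target_pos) - goal_pos)
    # Dense reward in [0, 1]: 1 when the target sits exactly at the goal,
    # decaying linearly to 0 at max_range (assumed shaping).
    return max(0.0, 1.0 - error / max_range)
```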
Related papers
- AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios [64.51320327698231]
We introduce AerialMind, the first large-scale RMOT benchmark in UAV scenarios.
We develop an innovative semi-automated collaborative agent-based labeling assistant framework.
We also propose HawkEyeTrack, a novel method that collaboratively enhances vision-language representation learning.
arXiv Detail & Related papers (2025-11-26T04:44:27Z)
- NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation.
Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame.
We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z)
- TrackVLA: Embodied Visual Tracking in the Wild [34.03604806748204]
Embodied visual tracking is a fundamental skill in Embodied AI, enabling an agent to follow a specific target in dynamic environments using only egocentric vision.
Existing approaches typically address this challenge through a modular separation of recognition and planning.
We propose TrackVLA, a Vision-Language-Action model that learns the synergy between object recognition and trajectory planning.
arXiv Detail & Related papers (2025-05-29T07:28:09Z)
- A Simple Detector with Frame Dynamics is a Strong Tracker [43.912410355089634]
Infrared object tracking plays a crucial role in Anti-Unmanned Aerial Vehicle (Anti-UAV) applications.
Existing trackers often depend on cropped template regions and have limited motion modeling capabilities.
We propose a simple yet effective infrared tiny-object tracker that enhances tracking performance by integrating global detection and motion-aware learning.
arXiv Detail & Related papers (2025-05-08T03:16:03Z)
- Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking [11.146155422858824]
Single-stream architectures using Vision Transformer (ViT) backbones show great potential for real-time UAV tracking.
We propose to learn Occlusion-Robust Representations (ORR) based on ViTs for UAV tracking.
We also propose an Adaptive Feature-Based Knowledge Distillation (AFKD) method to create a more compact tracker.
arXiv Detail & Related papers (2025-04-12T14:06:50Z)
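As a rough illustration of the feature-based distillation idea in AFKD above (the paper's exact loss and adaptive weighting scheme are not given here, so the form below is an assumption), a compact student can be trained to match a larger teacher's features with per-sample weights:

```python
import torch

def feature_distillation_loss(student_feat, teacher_feat, weight):
    """Illustrative feature-based distillation loss (not AFKD's exact form).

    student_feat, teacher_feat: feature maps of shape (B, C, H, W).
    weight: per-sample weights of shape (B,), standing in for the
    "adaptive" part of AFKD (assumed mechanism).
    """
    # Per-sample mean squared distance to the (frozen) teacher features.
    per_sample = (student_feat - teacher_feat.detach()).pow(2).mean(dim=(1, 2, 3))
    # Weighted average over the batch.
    return (weight * per_sample).mean()
```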
- Leveraging Event Streams with Deep Reinforcement Learning for End-to-End UAV Tracking [1.8297494098768172]
We present an approach to active tracking that increases the autonomy of Unmanned Aerial Vehicles (UAVs) using event cameras, low-energy imaging sensors.
The proposed tracking controller is designed to respond to visual feedback from the mounted event sensor, adjusting the drone movements to follow the target.
arXiv Detail & Related papers (2024-10-03T07:56:40Z)
- Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking [14.382072224997074]
Single-stream architectures utilizing pre-trained ViT backbones offer improved performance, efficiency, and robustness.
We boost the efficiency of this framework by tailoring it into an adaptive framework that dynamically exits Transformer blocks for real-time UAV tracking.
We also improve the effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movements of either the UAV, the tracked objects, or both.
arXiv Detail & Related papers (2024-07-07T14:10:04Z)
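The dynamic early-exit idea in the entry above can be sketched as follows; the per-block confidence heads, the threshold, and the mean-token scoring are illustrative assumptions, not the paper's actual exit criterion:

```python
import torch
import torch.nn as nn

class EarlyExitViT(nn.Module):
    """Minimal sketch of dynamic early exit over Transformer blocks."""

    def __init__(self, blocks, dim, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        # One tiny confidence head per block (assumed design).
        self.exit_heads = nn.ModuleList(nn.Linear(dim, 1) for _ in blocks)
        self.threshold = threshold

    def forward(self, x):
        # x: token sequence of shape (B, N, dim).
        for block, head in zip(self.blocks, self.exit_heads):
            x = block(x)
            # Confidence from the mean token; assumes batch size 1 at
            # inference, which is typical for tracking.
            conf = torch.sigmoid(head(x.mean(dim=1)))
            if not self.training and conf.item() > self.threshold:
                break  # skip the remaining blocks for speed
        return x
```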
- Track Anything Rapter(TAR) [0.0]
Track Anything Rapter (TAR) is designed to detect, segment, and track objects of interest based on user-provided multimodal queries.
TAR utilizes cutting-edge pre-trained models like DINO, CLIP, and SAM to estimate the relative pose of the queried object.
We showcase how the integration of these foundational models with a custom high-level control algorithm results in a highly stable and precise tracking system.
arXiv Detail & Related papers (2024-05-19T19:51:41Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds [50.19288542498838]
3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial role in autonomous driving.
Current approaches all follow the Siamese paradigm based on appearance matching.
We introduce a motion-centric paradigm to handle LiDAR SOT from a new perspective.
arXiv Detail & Related papers (2023-03-21T17:28:44Z)
- Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking [84.38335117043907]
We propose a purely passive method to track a person walking in an invisible room by only observing a relay wall.
To uncover imperceptible changes in videos of the relay wall, we introduce difference frames as an essential carrier of temporal-local motion messages.
To evaluate the proposed method, we build and publish the first dynamic passive NLOS tracking dataset, NLOS-Track.
arXiv Detail & Related papers (2023-03-21T12:18:57Z)
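Difference frames, as used in the NLOS tracking entry above, amount to subtracting consecutive relay-wall frames; this minimal sketch (names and array shapes assumed) exposes the faint intensity changes caused by the hidden person's motion:

```python
import numpy as np

def difference_frames(video):
    """Sketch of difference frames as carriers of temporal-local motion.

    video: grayscale relay-wall frames of shape (T, H, W).
    Returns (T-1, H, W) frame-to-frame differences, which amplify
    imperceptible changes that static appearance hides.
    """
    video = video.astype(np.float32)
    return video[1:] - video[:-1]
```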
- SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking [12.447854608181833]
This work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking.
The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation.
A lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information.
arXiv Detail & Related papers (2023-03-08T05:01:00Z)
- Space Non-cooperative Object Active Tracking with Deep Reinforcement Learning [1.212848031108815]
We propose an end-to-end active visual tracking method based on the DQN algorithm, named DRLAVT.
It can guide a chaser spacecraft to approach an arbitrary non-cooperative space target relying solely on color or RGB-D images.
It significantly outperforms a position-based visual servoing baseline that adopts the state-of-the-art 2D monocular tracker SiamRPN.
arXiv Detail & Related papers (2021-12-18T06:12:24Z)
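A minimal sketch of a DQN-style active-tracking policy in the spirit of DRLAVT above; the network layout, action count, and greedy selection are illustrative assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class DQNTracker(nn.Module):
    """Illustrative DQN head mapping images to discrete motion commands."""

    def __init__(self, n_actions=11):
        super().__init__()
        # Standard DQN-style convolutional encoder (assumed layout).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.q_head = nn.Linear(64, n_actions)

    def act(self, obs):
        # obs: (1, 3, H, W) color observation, e.g. H = W = 84.
        with torch.no_grad():
            q = self.q_head(self.encoder(obs))
        # Greedy action with respect to the predicted Q-values.
        return int(q.argmax(dim=1))
```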
- Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z)
- Learning Perceptual Locomotion on Uneven Terrains using Sparse Visual Observations [75.60524561611008]
This work aims to exploit the use of sparse visual observations to achieve perceptual locomotion over a range of commonly seen bumps, ramps, and stairs in human-centred environments.
We first formulate the selection of minimal visual input that can represent the uneven surfaces of interest, and propose a learning framework that integrates such exteroceptive and proprioceptive data.
We validate the learned policy in tasks that require omnidirectional walking over flat ground and forward locomotion over terrains with obstacles, showing a high success rate.
arXiv Detail & Related papers (2021-09-28T20:25:10Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Dynamic Attention guided Multi-Trajectory Analysis for Single Object Tracking [62.13213518417047]
We propose to introduce more dynamics by devising a dynamic attention-guided multi-trajectory tracking strategy.
In particular, we construct a dynamic appearance model that contains multiple target templates, each of which provides its own attention for locating the target in the new frame.
After spanning the whole sequence, we introduce a multi-trajectory selection network to find the best trajectory that delivers improved tracking performance.
arXiv Detail & Related papers (2021-03-30T05:36:31Z)
- Decentralized Reinforcement Learning for Multi-Target Search and Detection by a Team of Drones [12.055303570215335]
Target search and detection encompasses a variety of decision problems such as coverage, surveillance, search, observation, and pursuit-evasion.
We develop a multi-agent deep reinforcement learning (MADRL) method to coordinate a group of aerial vehicles (drones) for the purpose of locating a set of static targets in an unknown area.
arXiv Detail & Related papers (2021-03-17T09:04:47Z)
- Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks [62.836429958476735]
We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking.
Our TS-RCN can be integrated with existing deep learning based visual trackers.
To further improve the tracking performance, we adopt a "wider" residual network ResNeXt as its feature extraction backbone.
arXiv Detail & Related papers (2020-05-13T19:05:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.