AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation
- URL: http://arxiv.org/abs/2508.15232v1
- Date: Thu, 21 Aug 2025 04:43:35 GMT
- Title: AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation
- Authors: Ruipu Wu, Yige Zhang, Jinyu Chen, Linjiang Huang, Shifeng Zhang, Xu Zhou, Liang Wang, Si Liu,
- Abstract summary: Aerial Vision-and-Language Navigation (VLN) is an emerging task that enables Unmanned Aerial Vehicles (UAVs) to navigate outdoor environments using natural language instructions and visual cues.<n>We introduce a novel task called Dual-Altitude UAV Collaborative VLN (DuAl-VLN)<n>In this task, two UAVs operate at distinct altitudes: a high-altitude UAV responsible for broad environmental reasoning, and a low-altitude UAV tasked with precise navigation.
- Score: 34.63571674882289
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aerial Vision-and-Language Navigation (VLN) is an emerging task that enables Unmanned Aerial Vehicles (UAVs) to navigate outdoor environments using natural language instructions and visual cues. However, due to the extended trajectories and complex maneuverability of UAVs, achieving reliable UAV-VLN performance is challenging and often requires human intervention or overly detailed instructions. To harness the advantages of UAVs' high mobility, which could provide multi-grained perspectives, while maintaining a manageable motion space for learning, we introduce a novel task called Dual-Altitude UAV Collaborative VLN (DuAl-VLN). In this task, two UAVs operate at distinct altitudes: a high-altitude UAV responsible for broad environmental reasoning, and a low-altitude UAV tasked with precise navigation. To support the training and evaluation of the DuAl-VLN, we construct the HaL-13k, a dataset comprising 13,838 collaborative high-low UAV demonstration trajectories, each paired with target-oriented language instructions. This dataset includes both unseen maps and an unseen object validation set to systematically evaluate the model's generalization capabilities across novel environments and unfamiliar targets. To consolidate their complementary strengths, we propose a dual-UAV collaborative VLN framework, AeroDuo, where the high-altitude UAV integrates a multimodal large language model (Pilot-LLM) for target reasoning, while the low-altitude UAV employs a lightweight multi-stage policy for navigation and target grounding. The two UAVs work collaboratively and only exchange minimal coordinate information to ensure efficiency.
Related papers
- How Far are Modern Trackers from UAV-Anti-UAV? A Million-Scale Benchmark and New Baseline [74.4054700050366]
Unmanned Aerial Vehicles (UAVs) offer wide-ranging applications but also pose significant safety and privacy violation risks.<n>Current Anti-UAV research primarily focuses on RGB, infrared (IR), or RGB-IR videos captured by fixed ground cameras.<n>We propose a new multi-modal visual tracking task termed UAV-Anti-UAV, which involves a pursuer UAV tracking a target adversarial UAV in the video stream.
arXiv Detail & Related papers (2025-12-08T10:19:54Z) - AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios [64.51320327698231]
We introduce AerialMind, the first large-scale RMOT benchmark in UAV scenarios.<n>We develop an innovative semi-automated collaborative agent-based labeling assistant framework.<n>We also propose HawkEyeTrack, a novel method that collaboratively enhances vision-language representation learning.
arXiv Detail & Related papers (2025-11-26T04:44:27Z) - SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments [7.251041314934871]
Unmanned Aerial Vehicles (UAVs) have emerged as versatile tools across various sectors, driven by their mobility and adaptability.<n>This paper introduces SkyVLN, a novel framework integrating verbalize vision-and-language navigation (VLN) with Model Predictive Control (NMPC) to enhance UAV autonomy in complex urban environments.
arXiv Detail & Related papers (2025-07-09T05:38:32Z) - UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning [39.07541452390107]
Unmanned Aerial Vehicles (UAVs) are evolving into language-interactive platforms, enabling more intuitive forms of human-drone interaction.<n>We formalize this problem as the Flying-on-a-Word (Flow) task and introduce UAV imitation learning as an effective approach.<n>We present UAV-Flow, the first real-world benchmark for language-conditioned, fine-grained UAV control.
arXiv Detail & Related papers (2025-05-21T16:31:28Z) - UAV-VLN: End-to-End Vision Language guided Navigation for UAVs [0.0]
A core challenge in AI-guided autonomy is enabling agents to navigate realistically and effectively in previously unseen environments.<n>We propose UAV-VLN, a novel end-to-end Vision-Language Navigation framework for Unmanned Aerial Vehicles (UAVs)<n>Our system interprets free-form natural language instructions, grounds them into visual observations, and plans feasible aerial trajectories in diverse environments.
arXiv Detail & Related papers (2025-04-30T08:40:47Z) - More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV [58.89234732689013]
CODrone is a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions.<n>It also serves as a new benchmark designed to align with downstream task requirements.<n>We conduct a series of experiments based on 22 classical or SOTA methods to rigorously evaluate CODrone.
arXiv Detail & Related papers (2025-04-28T17:56:02Z) - Autonomous Decision Making for UAV Cooperative Pursuit-Evasion Game with Reinforcement Learning [50.33447711072726]
This paper proposes a deep reinforcement learning-based model for decision-making in multi-role UAV cooperative pursuit-evasion game.
The proposed method enables autonomous decision-making of the UAVs in pursuit-evasion game scenarios.
arXiv Detail & Related papers (2024-11-05T10:45:30Z) - Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology [38.2096731046639]
Recent efforts in UAV vision-language navigation predominantly adopt ground-based VLN settings.
We propose solutions from three perspectives: platform, benchmark, and methodology.
arXiv Detail & Related papers (2024-10-09T17:29:01Z) - Evidential Detection and Tracking Collaboration: New Problem, Benchmark
and Algorithm for Robust Anti-UAV System [56.51247807483176]
Unmanned Aerial Vehicles (UAVs) have been widely used in many areas, including transportation, surveillance, and military.
Previous works have simplified such an anti-UAV task as a tracking problem, where prior information of UAVs is always provided.
In this paper, we first formulate a new and practical anti-UAV problem featuring the UAVs perception in complex scenes without prior UAVs information.
arXiv Detail & Related papers (2023-06-27T19:30:23Z) - A Multi-UAV System for Exploration and Target Finding in Cluttered and
GPS-Denied Environments [68.31522961125589]
We propose a framework for a team of UAVs to cooperatively explore and find a target in complex GPS-denied environments with obstacles.
The team of UAVs autonomously navigates, explores, detects, and finds the target in a cluttered environment with a known map.
Results indicate that the proposed multi-UAV system has improvements in terms of time-cost, the proportion of search area surveyed, as well as successful rates for search and rescue missions.
arXiv Detail & Related papers (2021-07-19T12:54:04Z) - 3D UAV Trajectory and Data Collection Optimisation via Deep
Reinforcement Learning [75.78929539923749]
Unmanned aerial vehicles (UAVs) are now beginning to be deployed for enhancing the network performance and coverage in wireless communication.
It is challenging to obtain an optimal resource allocation scheme for the UAV-assisted Internet of Things (IoT)
In this paper, we design a new UAV-assisted IoT systems relying on the shortest flight path of the UAVs while maximising the amount of data collected from IoT devices.
arXiv Detail & Related papers (2021-06-06T14:08:41Z) - Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking [59.06167734555191]
Unmanned Aerial Vehicle (UAV) offers lots of applications in both commerce and recreation.
We consider the task of tracking UAVs, providing rich information such as location and trajectory.
We propose a dataset, Anti-UAV, with more than 300 video pairs containing over 580k manually annotated bounding boxes.
arXiv Detail & Related papers (2021-01-21T07:00:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.