Related papers: UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning

UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning

URL: http://arxiv.org/abs/2505.15725v2
Date: Mon, 26 May 2025 11:15:18 GMT
Title: UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning
Authors: Xiangyu Wang, Donglin Yang, Yue Liao, Wenhao Zheng, wenjun wu, Bin Dai, Hongsheng Li, Si Liu,
Abstract summary: Unmanned Aerial Vehicles (UAVs) are evolving into language-interactive platforms, enabling more intuitive forms of human-drone interaction.<n>We formalize this problem as the Flying-on-a-Word (Flow) task and introduce UAV imitation learning as an effective approach.<n>We present UAV-Flow, the first real-world benchmark for language-conditioned, fine-grained UAV control.
Score: 39.07541452390107
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unmanned Aerial Vehicles (UAVs) are evolving into language-interactive platforms, enabling more intuitive forms of human-drone interaction. While prior works have primarily focused on high-level planning and long-horizon navigation, we shift attention to language-guided fine-grained trajectory control, where UAVs execute short-range, reactive flight behaviors in response to language instructions. We formalize this problem as the Flying-on-a-Word (Flow) task and introduce UAV imitation learning as an effective approach. In this framework, UAVs learn fine-grained control policies by mimicking expert pilot trajectories paired with atomic language instructions. To support this paradigm, we present UAV-Flow, the first real-world benchmark for language-conditioned, fine-grained UAV control. It includes a task formulation, a large-scale dataset collected in diverse environments, a deployable control framework, and a simulation suite for systematic evaluation. Our design enables UAVs to closely imitate the precise, expert-level flight trajectories of human pilots and supports direct deployment without sim-to-real gap. We conduct extensive experiments on UAV-Flow, benchmarking VLN and VLA paradigms. Results show that VLA models are superior to VLN baselines and highlight the critical role of spatial grounding in the fine-grained Flow setting.

Related papers

LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV Networks [57.27815890269697]
This work focuses on maximizing the secrecy rate in heterogeneous UAV networks (HetUAVNs) under energy constraints.<n>We introduce a Large Language Model (LLM)-guided multi-agent learning approach.<n>Results show that our method outperforms existing baselines in secrecy and energy efficiency.
arXiv Detail & Related papers (2025-07-23T04:22:57Z)
VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning [77.34267241692706]
Vision-Language Navigation (VLN) is a core challenge in embodied AI, requiring agents to navigate real-world environments using natural language instructions.<n>We propose VLN-R1, an end-to-end framework that leverages Large Vision-Language Models (LVLM) to directly translate egocentric video streams into continuous navigation actions.
arXiv Detail & Related papers (2025-06-20T17:59:59Z)
Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding [1.280979348722635]
Vision-and-language navigation (VLN) is a long-standing challenge in autonomous robotics, aiming to empower agents with the ability to follow human instructions while navigating complex environments.<n>We propose Vision-Language Fly (VLFly), a framework tailored for Unmanned Aerial Vehicles (UAVs) to execute language-guided flight.
arXiv Detail & Related papers (2025-06-12T14:40:50Z)
UAV-VLN: End-to-End Vision Language guided Navigation for UAVs [0.0]
A core challenge in AI-guided autonomy is enabling agents to navigate realistically and effectively in previously unseen environments.<n>We propose UAV-VLN, a novel end-to-end Vision-Language Navigation framework for Unmanned Aerial Vehicles (UAVs)<n>Our system interprets free-form natural language instructions, grounds them into visual observations, and plans feasible aerial trajectories in diverse environments.
arXiv Detail & Related papers (2025-04-30T08:40:47Z)
More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV [58.89234732689013]
CODrone is a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions.<n>It also serves as a new benchmark designed to align with downstream task requirements.<n>We conduct a series of experiments based on 22 classical or SOTA methods to rigorously evaluate CODrone.
arXiv Detail & Related papers (2025-04-28T17:56:02Z)
UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility [33.73170899086857]
Low-altitude mobility, exemplified by unmanned aerial vehicles (UAVs), has introduced transformative advancements across various domains.<n>This paper explores the integration of large language models (LLMs) and UAVs.<n>It categorizes and analyzes key tasks and application scenarios where UAVs and LLMs converge.
arXiv Detail & Related papers (2025-01-04T17:32:12Z)
Integrating Large Language Models for UAV Control in Simulated Environments: A Modular Interaction Approach [0.3495246564946556]
This study explores the application of Large Language Models in UAV control. By enabling UAVs to interpret and respond to natural language commands, LLMs simplify the UAV control and usage. The paper discusses several key areas where LLMs can impact UAV technology, including autonomous decision-making, dynamic mission planning, enhanced situational awareness, and improved safety protocols.
arXiv Detail & Related papers (2024-10-23T06:56:53Z)
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology [38.2096731046639]
Recent efforts in UAV vision-language navigation predominantly adopt ground-based VLN settings. We propose solutions from three perspectives: platform, benchmark, and methodology.
arXiv Detail & Related papers (2024-10-09T17:29:01Z)
A Multi-UAV System for Exploration and Target Finding in Cluttered and GPS-Denied Environments [68.31522961125589]
We propose a framework for a team of UAVs to cooperatively explore and find a target in complex GPS-denied environments with obstacles. The team of UAVs autonomously navigates, explores, detects, and finds the target in a cluttered environment with a known map. Results indicate that the proposed multi-UAV system has improvements in terms of time-cost, the proportion of search area surveyed, as well as successful rates for search and rescue missions.
arXiv Detail & Related papers (2021-07-19T12:54:04Z)
3D UAV Trajectory and Data Collection Optimisation via Deep Reinforcement Learning [75.78929539923749]
Unmanned aerial vehicles (UAVs) are now beginning to be deployed for enhancing the network performance and coverage in wireless communication. It is challenging to obtain an optimal resource allocation scheme for the UAV-assisted Internet of Things (IoT) In this paper, we design a new UAV-assisted IoT systems relying on the shortest flight path of the UAVs while maximising the amount of data collected from IoT devices.
arXiv Detail & Related papers (2021-06-06T14:08:41Z)
Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking [59.06167734555191]
Unmanned Aerial Vehicle (UAV) offers lots of applications in both commerce and recreation. We consider the task of tracking UAVs, providing rich information such as location and trajectory. We propose a dataset, Anti-UAV, with more than 300 video pairs containing over 580k manually annotated bounding boxes.
arXiv Detail & Related papers (2021-01-21T07:00:15Z)
Federated Learning in the Sky: Joint Power Allocation and Scheduling with UAV Swarms [98.78553146823829]
Unmanned aerial vehicle (UAV) swarms must exploit machine learning (ML) in order to execute various tasks. In this paper, a novel framework is proposed to implement distributed learning (FL) algorithms within a UAV swarm.
arXiv Detail & Related papers (2020-02-19T14:04:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.