VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments
- URL: http://arxiv.org/abs/2512.15258v2
- Date: Fri, 19 Dec 2025 11:22:57 GMT
- Title: VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments
- Authors: Yuze Wu, Mo Zhu, Xingxing Li, Yuheng Du, Yuxin Fan, Wenjun Li, Zhichao Han, Xin Zhou, Fei Gao,
- Abstract summary: VLA-AN is a framework dedicated to autonomous drone navigation in complex environments. It addresses four major limitations of existing large aerial navigation models. It achieves a maximum single-task success rate of 98.1%.
- Score: 12.689250855332569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high-fidelity dataset utilizing 3D Gaussian Splatting (3D-GS) to effectively bridge the domain gap. Second, we introduce a progressive three-stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real-time action module coupled with geometric safety correction. This module ensures fast, collision-free, and stable command generation, mitigating the safety risks inherent in stochastic generative policies. Finally, through deep optimization of the onboard deployment pipeline, VLA-AN achieves a robust real-time 8.3x improvement in inference throughput on resource-constrained UAVs. Extensive experiments demonstrate that VLA-AN significantly improves spatial grounding, scene reasoning, and long-horizon navigation, achieving a maximum single-task success rate of 98.1%, and providing an efficient, practical solution for realizing full-chain closed-loop autonomy in lightweight aerial robots.
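The abstract does not say how the geometric safety correction is computed. As a rough illustration only, and not the authors' method, the Python sketch below shows one generic kind of geometric filter that can sit after a stochastic action policy. It rests on several assumptions. The action module is assumed to output a 3D velocity command in the drone's body frame. Nearby obstacle points are assumed to be available in that same frame, for example back-projected from a depth image. `safety_correct`, `safe_radius`, and `max_speed` are hypothetical names and defaults.

```python
import numpy as np


def safety_correct(velocity, obstacles, safe_radius=1.0, max_speed=2.0, tol=1e-9):
    """Hypothetical geometric safety filter for a velocity command (illustrative only).

    velocity:  commanded 3D velocity in the body frame (drone at the origin).
    obstacles: (N, 3) obstacle points in the same frame.
    Returns a velocity that does not close distance to any obstacle point
    lying inside safe_radius.
    """
    v = np.asarray(velocity, dtype=float).copy()
    pts = np.asarray(obstacles, dtype=float).reshape(-1, 3)

    # 1) Enforce a speed limit by rescaling the command, keeping its heading.
    speed = np.linalg.norm(v)
    if speed > max_speed:
        v *= max_speed / speed

    # 2) Keep only obstacle points strictly inside the safety radius.
    dists = np.linalg.norm(pts, axis=1)
    near = (dists > 0) & (dists < safe_radius)
    normals = pts[near] / dists[near][:, None]  # unit vectors drone -> obstacle

    # 3) Project out any velocity component pointing toward a near obstacle.
    for n in normals:
        approach = v @ n
        if approach > 0:
            v -= approach * n

    # 4) Sequential projections can re-introduce an approach toward an earlier
    #    obstacle. If that happens, fall back to hovering (zero velocity).
    if normals.size and np.any(normals @ v > tol):
        return np.zeros(3)
    return v


# Flying at an obstacle 0.5 m ahead: the forward component is removed and
# the lateral component is kept.
print(safety_correct([1.0, 1.0, 0.0], [[0.5, 0.0, 0.0]]))
```

This rule is deterministic, so the same command and scene always produce the same output. That property is the general motivation for pairing a generative policy with a geometric check. A real system would also need to account for vehicle dynamics, sensor noise, and latency.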
Related papers
- HiST-VLA: A Hierarchical Spatio-Temporal Vision-Language-Action Model for End-to-End Autonomous Driving [20.266736153749417]
Vision-Language-Action (VLA) models offer promising capabilities for autonomous driving through multimodal understanding. Their utilization in safety-critical scenarios is constrained by inherent limitations, including numerical reasoning, weak 3D spatial awareness, and high sensitivity to context. We propose HiST-VLA, a novel Hierarchical Spatio-Temporal VLA model designed for reliable trajectory generation.
(arXiv, 2026-02-11T07:08:33Z)
- DRL-Enabled Trajectory Planing for UAV-Assisted VLC: Optimal Altitude and Reward Design [35.154994099093244]
Integration of unmanned aerial vehicle (UAV) and visible light communication (VLC) technologies has emerged as a promising solution to offer efficient lighting. This letter investigates three-dimensional trajectory planning in a UAV-assisted VLC system.
(arXiv, 2026-01-30T03:44:14Z)
- Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space [48.19308247102762]
We propose ANWM, an aerial navigation world model that predicts future visual observations conditioned on past frames and actions. ANWM is trained on 4-DoF UAV trajectories and introduces a physics-inspired module: Future Frame Projection. Empirical results demonstrate that ANWM significantly outperforms existing world models in long-distance visual forecasting and improves UAV navigation success rates in large-scale environments.
(arXiv, 2025-12-26T06:22:39Z)
- Trajectory Design for UAV-Based Low-Altitude Wireless Networks in Unknown Environments: A Digital Twin-Assisted TD3 Approach [62.11847362756054]
Unmanned aerial vehicles (UAVs) are emerging as key enablers for the low-altitude wireless network (LAWN). We propose a digital twin (DT)-assisted training and deployment framework. In this framework, the UAV transmits integrated sensing and communication signals to provide communication services to ground users. At the same time, it collects echoes that are uploaded to the DT server to progressively construct virtual environments (VEs). These VEs accelerate model training and are continuously updated with real-time UAV sensing data during deployment, supporting decision-making and enhancing flight safety.
(arXiv, 2025-10-28T10:05:53Z)
- DAgger Diffusion Navigation: DAgger Boosted Diffusion Policy for Vision-Language Navigation [73.80968452950854]
Vision-Language Navigation in Continuous Environments (VLN-CE) requires agents to follow natural language instructions through free-form 3D spaces. Existing VLN-CE approaches typically use a two-stage waypoint planning framework. We propose DAgger Diffusion Navigation (DifNav) as an end-to-end optimized VLN-CE policy.
(arXiv, 2025-08-13T02:51:43Z)
- NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
(arXiv, 2025-06-23T14:28:30Z)
- Task Assignment and Exploration Optimization for Low Altitude UAV Rescue via Generative AI Enhanced Multi-agent Reinforcement Learning [44.02103029265148]
This paper proposes a cooperation framework involving UAVs, GERs, and airships. The framework enables resource pooling through UAV-to-GER (U2G) and UAV-to-airship (U2A) links, offering computing services for offloaded tasks.
(arXiv, 2025-04-18T08:44:06Z)
- SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
(arXiv, 2025-03-25T17:59:57Z)
- RAPID: Robust and Agile Planner Using Inverse Reinforcement Learning for Vision-Based Drone Navigation [9.25068777307471]
This paper introduces a learning-based visual planner for agile drone flight in cluttered environments. The proposed planner generates collision-free waypoints in milliseconds, enabling drones to perform agile maneuvers in complex environments without building separate perception, mapping, and planning modules.
(arXiv, 2025-02-04T06:42:08Z)
- Monocular Obstacle Avoidance Based on Inverse PPO for Fixed-wing UAVs [29.207513994002202]
Fixed-wing Unmanned Aerial Vehicles (UAVs) are one of the most commonly used platforms for the Low-altitude Economy (LAE) and Urban Air Mobility (UAM). Classical obstacle avoidance systems, which rely on prior maps or sophisticated sensors, face limitations in unknown low-altitude environments and on small UAV platforms. This paper proposes a lightweight deep reinforcement learning (DRL) based UAV collision avoidance system.
(arXiv, 2024-11-27T03:03:37Z)
- Navigation in a simplified Urban Flow through Deep Reinforcement Learning [0.9217021281095907]
Unmanned aerial vehicles (UAVs) in urban environments require a strategy to minimize their environmental impact.
Our goal is to develop DRL algorithms capable of enabling the autonomous navigation of UAVs in urban environments.
(arXiv, 2024-09-26T15:05:15Z)
- Data Freshness and Energy-Efficient UAV Navigation Optimization: A Deep Reinforcement Learning Approach [88.45509934702913]
We design a navigation policy for multiple unmanned aerial vehicles (UAVs) where mobile base stations (BSs) are deployed.
We incorporate different contextual information such as energy and age of information (AoI) constraints to ensure the data freshness at the ground BS.
Applying the trained model yields an effective real-time trajectory policy for the UAV-BSs that captures the observable network states over time.
(arXiv, 2020-02-21T07:29:15Z)