VLM-RRT: Vision Language Model Guided RRT Search for Autonomous UAV Navigation
- URL: http://arxiv.org/abs/2505.23267v1
- Date: Thu, 29 May 2025 09:15:44 GMT
- Title: VLM-RRT: Vision Language Model Guided RRT Search for Autonomous UAV Navigation
- Authors: Jianlin Ye, Savvas Papaioannou, Panayiotis Kolios
- Abstract summary: We propose Vision Language Model RRT (VLM-RRT), a hybrid approach that integrates the pattern recognition capabilities of Vision Language Models (VLMs) with the path-planning strengths of Rapidly-exploring Random Trees (RRT). Our method biases sampling toward regions more likely to contain feasible paths, significantly improving sampling efficiency and path quality.
- Score: 4.022717732460524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Path planning is a fundamental capability of autonomous Unmanned Aerial Vehicles (UAVs), enabling them to efficiently navigate toward a target region or explore complex environments while avoiding obstacles. Traditional path-planning methods, such as Rapidly-exploring Random Trees (RRT), have proven effective but often encounter significant challenges. These include high search space complexity, suboptimal path quality, and slow convergence, issues that are particularly problematic in high-stakes applications like disaster response, where rapid and efficient planning is critical. To address these limitations and enhance path-planning efficiency, we propose Vision Language Model RRT (VLM-RRT), a hybrid approach that integrates the pattern recognition capabilities of Vision Language Models (VLMs) with the path-planning strengths of RRT. By leveraging VLMs to provide initial directional guidance based on environmental snapshots, our method biases sampling toward regions more likely to contain feasible paths, significantly improving sampling efficiency and path quality. Extensive quantitative and qualitative experiments with various state-of-the-art VLMs demonstrate the effectiveness of this proposed approach.
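The sampling bias described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 2D workspace and that the VLM's directional guidance has already been reduced to a single heading angle. The function name `sample_biased` and the parameters `bias_prob` and `half_angle_rad` are hypothetical, introduced only for illustration. With probability `bias_prob` the sampler draws a point inside a cone around the suggested heading; otherwise it falls back to uniform sampling, which keeps the exploratory behavior of plain RRT.

```python
import math
import random


def sample_biased(bounds, origin, heading_rad, bias_prob=0.7,
                  half_angle_rad=math.pi / 6, rng=random):
    """Draw one 2D sample for an RRT-style planner.

    bounds         -- ((xmin, xmax), (ymin, ymax)) of the workspace
    origin         -- (x, y) apex of the guidance cone, e.g. the start position
    heading_rad    -- suggested direction in radians (assumed to come from a VLM)
    bias_prob      -- hypothetical probability of sampling inside the cone
    half_angle_rad -- hypothetical half-width of the cone
    """
    (xmin, xmax), (ymin, ymax) = bounds
    if rng.random() < bias_prob:
        # The workspace diagonal bounds the distance between any two points in it.
        max_r = math.hypot(xmax - xmin, ymax - ymin)
        # Rejection sampling: retry until the cone sample lands inside the bounds.
        for _ in range(100):
            theta = heading_rad + rng.uniform(-half_angle_rad, half_angle_rad)
            # sqrt makes the samples uniform over the sector's area.
            r = max_r * math.sqrt(rng.random())
            x = origin[0] + r * math.cos(theta)
            y = origin[1] + r * math.sin(theta)
            if xmin <= x <= xmax and ymin <= y <= ymax:
                return (x, y)
    # Uniform fallback: used when the bias is not applied or every retry failed.
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))


if __name__ == "__main__":
    random.seed(0)
    workspace = ((0.0, 10.0), (0.0, 10.0))
    # Suppose the guidance says "head toward the upper right" (45 degrees).
    for _ in range(3):
        print(sample_biased(workspace, (0.0, 0.0), math.pi / 4))
```

In a full planner, each returned point would play the role of the random sample that the tree is extended toward; collision checking and tree extension are unchanged and omitted here.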
Related papers
- dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning [69.36145467833498]
We introduce dVLM-AD, a diffusion-based vision-language model that unifies perception, structured reasoning, and low-level planning for end-to-end driving. Evaluated on nuScenes and WOD-E2E, dVLM-AD yields more consistent reasoning-action pairs and achieves planning performance comparable to existing driving VLM/VLA systems.
arXiv Detail & Related papers (2025-12-04T05:05:41Z) - AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios [64.51320327698231]
We introduce AerialMind, the first large-scale RMOT benchmark in UAV scenarios. We develop an innovative semi-automated collaborative agent-based labeling assistant framework. We also propose HawkEyeTrack, a novel method that collaboratively enhances vision-language representation learning.
arXiv Detail & Related papers (2025-11-26T04:44:27Z) - Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization [61.55616421408666]
Low-Altitude Economy Networks (LAENets) have enabled a variety of applications, including aerial surveillance, environmental sensing, and semantic data collection. Onboard vision-language models (VLMs) offer inference for real-time inference but limited onboard dynamic network conditions. We propose a UAV-enabled LAENet system that improves communication efficiency under dynamic LAENet conditions.
arXiv Detail & Related papers (2025-10-11T05:11:21Z) - Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z) - Safe and Economical UAV Trajectory Planning in Low-Altitude Airspace: A Hybrid DRL-LLM Approach with Compliance Awareness [3.9471658054053806]
We propose a novel UAV trajectory planning framework that combines deep reinforcement learning (DRL) with large language model (LLM) reasoning to enable safe, compliant, and economically viable path planning. Experimental results demonstrate that our method significantly outperforms existing baselines across multiple metrics, including data collection rate, collision avoidance, successful landing, regulatory compliance, and energy efficiency.
arXiv Detail & Related papers (2025-06-10T07:51:29Z) - DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving [15.776506097490252]
We propose a novel hybrid sparse-dense diffusion policy, empowered by a Vision-Language Model (VLM). Our method shows superior performance in the Autonomous Grand Challenge 2025, which contains challenging real and reactive synthetic scenarios.
arXiv Detail & Related papers (2025-05-26T00:49:35Z) - Application of YOLOv8 in monocular downward multiple Car Target detection [0.0]
This paper presents an improved autonomous target detection network based on YOLOv8. The proposed approach achieves highly efficient and precise detection of multi-scale, small, and remote objects. Experimental results demonstrate that the enhanced model can effectively detect both large and small objects with a detection accuracy of 65%.
arXiv Detail & Related papers (2025-05-15T06:58:45Z) - Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models [62.12822290276912]
Auto-RT is a reinforcement learning framework that automatically explores and optimizes complex attack strategies. By significantly improving exploration efficiency and automatically optimizing attack strategies, Auto-RT detects a broader range of vulnerabilities, achieving a faster detection speed and 16.63% higher success rates compared to existing methods.
arXiv Detail & Related papers (2025-01-03T14:30:14Z) - SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins [78.53885607559958]
We propose SCoTT, a wireless-aware path planning framework. We show that SCoTT achieves path gains within 2% of DP-WA* while consistently generating shorter trajectories. We also show the practical viability of our approach by deploying SCoTT as a ROS node within Gazebo simulations.
arXiv Detail & Related papers (2024-11-27T10:45:49Z) - LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning [91.95362946266577]
Path planning is a fundamental scientific problem in robotics and autonomous navigation. Traditional algorithms like A* and its variants are capable of ensuring path validity but suffer from significant computational and memory inefficiencies as the state space grows. We propose a new LLM-based route planning method that synergistically combines the precise pathfinding capabilities of A* with the global reasoning capability of LLMs. This hybrid approach aims to enhance pathfinding efficiency in terms of time and space complexity while maintaining the integrity of path validity, especially in large-scale scenarios.
arXiv Detail & Related papers (2024-06-20T01:24:30Z) - An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z) - Vertex-based Networks to Accelerate Path Planning Algorithms [3.684936338492373]
We propose the utilization of vertices-based networks to enhance the sampling process of RRT*, leading to more efficient path planning.
We employ focal loss to address the associated data imbalance issue, and explore different masking configurations to determine practical tradeoffs in system performance.
arXiv Detail & Related papers (2023-07-13T20:56:46Z) - Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration [83.96729205383501]
We introduce prompt-based learning to achieve fast adaptation for language embeddings.
Our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE.
arXiv Detail & Related papers (2022-03-08T11:01:24Z) - Trajectory Planning for Autonomous Vehicles Using Hierarchical Reinforcement Learning [21.500697097095408]
Planning safe trajectories under uncertain and dynamic conditions makes the autonomous driving problem significantly complex.
Current sampling-based methods such as Rapidly Exploring Random Trees (RRTs) are not ideal for this problem because of the high computational cost.
We propose a Hierarchical Reinforcement Learning structure combined with a Proportional-Integral-Derivative (PID) controller for trajectory planning.
arXiv Detail & Related papers (2020-11-09T20:49:54Z) - Data Freshness and Energy-Efficient UAV Navigation Optimization: A Deep Reinforcement Learning Approach [88.45509934702913]
We design a navigation policy for multiple unmanned aerial vehicles (UAVs) where mobile base stations (BSs) are deployed.
We incorporate different contextual information such as energy and age of information (AoI) constraints to ensure the data freshness at the ground BS.
By applying the proposed trained model, an effective real-time trajectory policy for the UAV-BSs captures the observable network states over time.
arXiv Detail & Related papers (2020-02-21T07:29:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.