LLM-Driven Self-Refinement for Embodied Drone Task Planning
- URL: http://arxiv.org/abs/2508.15501v1
- Date: Thu, 21 Aug 2025 12:29:01 GMT
- Title: LLM-Driven Self-Refinement for Embodied Drone Task Planning
- Authors: Deyu Zhang, Xicheng Zhang, Jiahao Li, Tingting Long, Xunhua Dai, Yongjian Fu, Jinrui Zhang, Ju Ren, Yaoxue Zhang
- Abstract summary: SRDrone is a novel system designed for self-refinement task planning in industrial-grade embodied drones. It incorporates a continuous state evaluation methodology to robustly and accurately determine task outcomes. It also implements a hierarchical Behavior Tree (BT) modification model.
- Score: 29.164725771562473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SRDrone, a novel system designed for self-refinement task planning in industrial-grade embodied drones. SRDrone incorporates two key technical contributions: First, it employs a continuous state evaluation methodology to robustly and accurately determine task outcomes and provide explanatory feedback. This approach supersedes conventional reliance on single-frame final-state assessment for continuous, dynamic drone operations. Second, SRDrone implements a hierarchical Behavior Tree (BT) modification model. This model integrates multi-level BT plan analysis with a constrained strategy space to enable structured reflective learning from experience. Experimental results demonstrate that SRDrone achieves a 44.87% improvement in Success Rate (SR) over baseline methods. Furthermore, real-world deployment utilizing an experience base optimized through iterative self-refinement attains a 96.25% SR. By embedding adaptive task refinement capabilities within an industrial-grade BT planning framework, SRDrone effectively integrates the general reasoning intelligence of Large Language Models (LLMs) with the stringent physical execution constraints inherent to embodied drones. Code is available at https://github.com/ZXiiiC/SRDrone.
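The paper does not spell out its BT representation here; as a minimal illustrative sketch (all node names and the mission are hypothetical), a behavior tree with Sequence and Fallback composites shows the kind of structure an LLM-driven refiner would modify, e.g. by inserting a fallback branch after a failed action:

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2

class Action:
    """Leaf node wrapping a condition or actuator call."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self):
        return Status.SUCCESS if self.fn() else Status.FAILURE

class Sequence:
    """Composite: runs children in order, fails on the first failure."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for c in self.children:
            if c.tick() is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS

class Fallback:
    """Composite: tries children in order, succeeds on the first success."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for c in self.children:
            if c.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE

# Hypothetical mission: take off, then reach a waypoint or return home.
# The Fallback branch stands in for a refinement added after a failure.
tree = Sequence([
    Action("take_off", lambda: True),
    Fallback([
        Action("fly_to_waypoint", lambda: False),  # fails in this run
        Action("return_home", lambda: True),       # recovery branch
    ]),
])
print(tree.tick())  # Status.SUCCESS, via the recovery branch
```

A refiner operating on such a tree can edit it structurally (swap a leaf, wrap a subtree in a Fallback) within a constrained strategy space rather than regenerating a free-form plan.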
Related papers
- Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons [69.87766750714945]
General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations. We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision. Robometer is trained with a dual objective: a frame-level progress loss that anchors reward magnitude on expert data, and a trajectory-comparison preference loss that imposes global ordering constraints.
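The dual objective described above can be sketched in plain Python (the loss forms and the 0.5 weighting are illustrative assumptions, not Robometer's actual hyperparameters): a frame-level MSE term on per-frame progress predictions, plus a Bradley-Terry-style pairwise term that orders trajectory-level scores.

```python
import math

def progress_loss(pred, target):
    """Frame-level MSE anchoring reward magnitude on expert progress labels."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def preference_loss(score_better, score_worse):
    """Bradley-Terry-style loss: push the preferred trajectory's score higher."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_better - score_worse))))

# Hypothetical reward-model outputs: per-frame progress predictions for one
# expert trajectory, and scalar scores for a preferred/dispreferred pair.
pred = [0.1, 0.45, 0.9]
target = [0.0, 0.5, 1.0]
traj_a, traj_b = 2.0, 0.5  # trajectory A is labeled better

total = progress_loss(pred, target) + 0.5 * preference_loss(traj_a, traj_b)
```

The preference term only constrains the *ordering* of trajectory scores, which is why the progress term is needed to pin down absolute reward magnitudes.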
arXiv Detail & Related papers (2026-03-02T17:38:58Z) - ManeuverNet: A Soft Actor-Critic Framework for Precise Maneuvering of Double-Ackermann-Steering Robots with Optimized Reward Functions [0.7322887425853787]
ManeuverNet is a DRL framework tailored for double-Ackermann systems, combining Soft Actor-Critic with CrossQ. We extensively evaluate ManeuverNet against both state-of-the-art DRL baselines and the Timed Elastic Band planner. Experimental results demonstrate that our framework substantially improves maneuverability and success rates. In real-world trials, ManeuverNet achieved up to a 90% increase in maneuvering trajectory efficiency, highlighting its robustness and practical applicability.
arXiv Detail & Related papers (2026-02-16T13:19:04Z) - ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation [54.071574153853994]
ProRAG is a process-supervised reinforcement learning framework designed to integrate learned step-level supervision into the online optimization loop. Our framework consists of four stages: (1) Supervised Policy Warmup to initialize the model with a structured reasoning format; (2) construction of an MCTS-based Process Reward Model (PRM) to quantify intermediate reasoning quality; (3) PRM-Guided Reasoning Refinement to align the policy with fine-grained process preferences; and (4) Process-Supervised Reinforcement Learning with a dual-granularity advantage mechanism.
arXiv Detail & Related papers (2026-01-29T16:04:59Z) - CASTER: Breaking the Cost-Performance Barrier in Multi-Agent Orchestration via Context-Aware Strategy for Task Efficient Routing [25.48759875572515]
CASTER (Context-Aware Strategy for Task Efficient Routing) is a lightweight router for dynamic model selection in graph-based MAS. CASTER reduces inference cost by up to 72.4% compared to strong-model baselines.
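The core idea of such a router can be sketched as follows (the model table, complexity proxy, and thresholds are all hypothetical placeholders, not CASTER's learned policy): estimate task difficulty from context features, then dispatch to the cheapest model whose capability covers it.

```python
# Hypothetical model pool: capability in [0, 1], relative cost per call.
MODELS = [
    {"name": "small-llm", "capability": 0.4, "cost": 1.0},
    {"name": "mid-llm",   "capability": 0.7, "cost": 4.0},
    {"name": "large-llm", "capability": 1.0, "cost": 20.0},
]

def complexity(task: str, n_deps: int) -> float:
    """Crude stand-in for a learned scorer: longer prompts and more
    upstream graph dependencies imply a harder task."""
    return min(1.0, len(task) / 500 + 0.1 * n_deps)

def route(task: str, n_deps: int) -> str:
    """Pick the cheapest model whose capability covers the task."""
    c = complexity(task, n_deps)
    for m in sorted(MODELS, key=lambda m: m["cost"]):
        if m["capability"] >= c:
            return m["name"]
    return MODELS[-1]["name"]  # fall back to the strongest model

print(route("summarize this note", 0))  # small-llm
print(route("x" * 400, 5))              # large-llm
```

The cost savings come from the fact that most nodes in a task graph are easy and never need the strong model.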
arXiv Detail & Related papers (2026-01-27T16:52:47Z) - Integrating Symbolic RL Planning into a BDI-based Autonomous UAV Framework: System Integration and SIL Validation [3.5966087153300057]
We propose an extended version of the Autonomous Mission Agents for Drones (AMAD) cognitive multi-agent architecture, enhanced with symbolic reinforcement learning for dynamic mission planning and execution. We validated our framework in a Software-in-the-Loop (SIL) environment structured identically to an intended Hardware-In-the-Loop Simulation (HILS) platform. Experimental results demonstrate stable integration and interoperability of modules, successful transitions between BDI-driven and symbolic RL-driven planning phases, and consistent mission performance.
arXiv Detail & Related papers (2025-08-16T03:27:26Z) - KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. KAT dynamically switches between reasoning and non-reasoning modes based on task complexity. We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z) - RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation [59.9896841079005]
We introduce RoboCerebra, a benchmark for evaluating high-level reasoning in long-horizon robotic manipulation. The dataset is constructed via a top-down pipeline, where GPT generates task instructions and decomposes them into subtask sequences. Compared to prior benchmarks, RoboCerebra features significantly longer action sequences and denser annotations.
arXiv Detail & Related papers (2025-06-07T06:15:49Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit another attention based policy network in our lower-level DRL model to construct the route for each UAV, with the objective to maximize the number of executed tasks.
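The two-level decomposition can be illustrated with simple greedy stand-ins for the learned policies (the greedy rules below are illustrative substitutes, not the paper's encoder-decoder or attention networks): the upper level assigns tasks to UAVs, and the lower level orders each UAV's assigned tasks into a route.

```python
import math

def allocate(tasks, uav_starts):
    """Upper level (stand-in for the encoder-decoder policy): greedily
    assign each task to the UAV whose route currently ends closest."""
    routes = {u: [start] for u, start in uav_starts.items()}
    for t in tasks:
        u = min(routes, key=lambda u: math.dist(routes[u][-1], t))
        routes[u].append(t)
    return routes

def order_route(start, assigned):
    """Lower level (stand-in for the attention policy): nearest-neighbor
    ordering of one UAV's assigned tasks."""
    route, rest, cur = [start], list(assigned), start
    while rest:
        cur = min(rest, key=lambda t: math.dist(cur, t))
        rest.remove(cur)
        route.append(cur)
    return route

uav_starts = {"uav1": (0.0, 0.0), "uav2": (10.0, 10.0)}
tasks = [(1.0, 1.0), (9.0, 9.0), (2.0, 0.0)]
routes = allocate(tasks, uav_starts)
# uav1 picks up the tasks near the origin; uav2 takes the far one.
```

Splitting allocation from routing is what makes the approach scale: each sub-problem stays small even as the total task count grows.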
arXiv Detail & Related papers (2022-08-04T04:35:53Z) - Reinforcement Learning for Robust Missile Autopilot Design [0.0]
This work pioneers the use of Reinforcement Learning as a framework for flight control.
Under TRPO's methodology, the collected experience is augmented according to HER, stored in a replay buffer and sampled according to its significance.
Results show that it is possible both to achieve the optimal performance and to improve the agent's robustness to uncertainties.
arXiv Detail & Related papers (2020-11-26T09:30:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.