Chat with UAV -- Human-UAV Interaction Based on Large Language Models
- URL: http://arxiv.org/abs/2512.08145v1
- Date: Tue, 09 Dec 2025 00:55:40 GMT
- Title: Chat with UAV -- Human-UAV Interaction Based on Large Language Models
- Authors: Haoran Wang, Zhuohang Chen, Guang Li, Bo Ma, Chuanghuang Li, et al.
- Abstract summary: The future of UAV interaction systems is evolving from engineer-driven to user-driven. This paper proposes a novel dual-agent Human-UAV Interaction framework.
- Score: 6.041434126017702
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The future of UAV interaction systems is evolving from engineer-driven to user-driven, aiming to replace traditional predefined Human-UAV Interaction designs. This shift focuses on enabling more personalized task planning and design, thereby achieving a higher-quality interaction experience and greater flexibility, with applications in many fields such as agriculture, aerial photography, logistics, and environmental monitoring. However, due to the lack of a common language between users and UAVs, such interactions are often difficult to achieve. Large Language Models can understand natural language and robots' (UAVs') behaviors, opening the possibility of personalized Human-UAV Interaction. Recently, some HUI frameworks based on LLMs have been proposed, but they commonly struggle with mixed task planning and execution, leading to low adaptability in complex scenarios. In this paper, we propose a novel dual-agent HUI framework. The framework constructs two independent LLM agents (a task planning agent and an execution agent) and applies different prompt engineering to each, separately handling the understanding, planning, and execution of tasks. To verify the effectiveness and performance of the framework, we built a task database covering four typical UAV application scenarios and quantified the framework's performance using three independent metrics. Different LLMs are also selected to control the UAVs and their performance is compared. Our user study results demonstrate that the framework improves the smoothness of HUI and the flexibility of task execution in the task scenarios we set up, effectively meeting users' personalized needs.
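The abstract's core idea is the separation into a planning agent and an execution agent, each steered by its own prompt. The paper does not publish code, so the sketch below is only an illustration of how such a dual-agent pipeline could be wired; every identifier (query_llm, PLANNER_PROMPT, EXECUTOR_PROMPT, UAV_COMMANDS) is an assumption, not the authors' API.

```python
# Minimal sketch of a dual-agent Human-UAV Interaction loop.
# All names here are illustrative assumptions, not the paper's code.
import json

PLANNER_PROMPT = (
    "You are a UAV task planner. Decompose the user's request into an "
    'ordered JSON list of subtasks, e.g. '
    '[{"action": "takeoff", "args": {"altitude_m": 10}}].'
)

EXECUTOR_PROMPT = (
    "You are a UAV executor. Map one subtask to exactly one command from "
    'the allowed set and return it as JSON: {"cmd": ..., "args": ...}.'
)

# Hypothetical low-level command set exposed by the UAV SDK.
UAV_COMMANDS = {"takeoff", "land", "goto_waypoint", "capture_image", "hover"}


def query_llm(system_prompt: str, user_message: str) -> str:
    """Placeholder for any chat-completion backend (GPT, open-weight, ...)."""
    raise NotImplementedError("plug in your LLM client here")


def run_task(user_request: str) -> None:
    # Agent 1: understanding + planning, guided by its own prompt.
    plan = json.loads(query_llm(PLANNER_PROMPT, user_request))

    # Agent 2: execution, one subtask at a time, with a separate prompt.
    for subtask in plan:
        cmd = json.loads(query_llm(EXECUTOR_PROMPT, json.dumps(subtask)))
        if cmd["cmd"] not in UAV_COMMANDS:
            raise ValueError(f"executor proposed unsupported command: {cmd}")
        print(f"dispatching {cmd['cmd']} with {cmd['args']}")  # send to UAV here
```

One plausible benefit of this split, consistent with the abstract's claim, is that each agent's prompt can be tuned (or its backing LLM swapped) independently, without retuning the whole pipeline.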
Related papers
- VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting [66.90028121194636]
Current Vision-Language-Action (VLA) models are often constrained by a rigid, static interaction paradigm. VITA-E is a novel embodied interaction framework designed for both behavioral concurrency and nearly real-time interruption.
arXiv Detail & Related papers (2025-10-21T17:59:56Z) - TACOS: Task Agnostic COordinator of a multi-drone System [41.99844472131922]
TACOS (Task-Agnostic COordinator of a multi-drone System) is a unified framework that enables high-level natural language control of multi-UAV systems. It integrates three key capabilities into a single architecture: a one-to-many natural language interface for intuitive user interaction, an intelligent coordinator that translates user intent into structured task plans, and an autonomous agent that executes plans while interacting with the real world.
arXiv Detail & Related papers (2025-10-02T10:21:35Z) - Generative Interfaces for Language Models [70.25765232527762]
We propose a paradigm in which large language models (LLMs) respond to user queries by proactively generating user interfaces (UIs). Our framework leverages structured interface-specific representations and iterative refinements to translate user queries into task-specific UIs. Results show that generative interfaces consistently outperform conversational ones, with up to a 72% improvement in human preference.
arXiv Detail & Related papers (2025-08-26T17:43:20Z) - AppAgent v2: Advanced Agent for Flexible Mobile Interactions [57.98933460388985]
This work introduces a novel LLM-based multimodal agent framework for mobile devices. Our agent constructs a flexible action space that enhances adaptability across various applications. Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv Detail & Related papers (2024-08-05T06:31:39Z) - MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments [82.67236400004826]
We introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions.
Its Multimodal Environment Memory (MEM) module enables MEIA to generate executable action plans based on diverse requirements and the robot's capabilities.
arXiv Detail & Related papers (2024-02-01T02:43:20Z) - LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation,
Generation and Editing [99.80742991922992]
The system can have multi-turn dialogues with human users by taking multimodal user inputs and generating multimodal responses.
LLaVA-Interactive goes beyond language prompt, where visual prompt is enabled to align human intents in the interaction.
arXiv Detail & Related papers (2023-11-01T15:13:43Z) - Making Small Language Models Better Multi-task Learners with
Mixture-of-Task-Adapters [13.6682552098234]
Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks.
We present ALTER, a system that effectively builds the multi-tAsk learners with mixTure-of-task-adaptERs upon small language models.
A two-stage training method is proposed to optimize the collaboration between adapters at a small computational cost.
arXiv Detail & Related papers (2023-09-20T03:39:56Z) - Unified Human-Scene Interaction via Prompted Chain-of-Contacts [61.87652569413429]
Human-Scene Interaction (HSI) is a vital component of fields like embodied AI and virtual reality.
This paper presents a unified HSI framework, UniHSI, which supports unified control of diverse interactions through language commands.
arXiv Detail & Related papers (2023-09-14T17:59:49Z)