AirVista-II: An Agentic System for Embodied UAVs Toward Dynamic Scene Semantic Understanding
- URL: http://arxiv.org/abs/2504.09583v1
- Date: Sun, 13 Apr 2025 14:06:50 GMT
- Title: AirVista-II: An Agentic System for Embodied UAVs Toward Dynamic Scene Semantic Understanding
- Authors: Fei Lin, Yonglin Tian, Tengchao Zhang, Jun Huang, Sangtian Guan, Fei-Yue Wang
- Abstract summary: AirVista-II is an end-to-end agentic system for embodied UAVs. The system integrates agent-based task identification and scheduling, multimodal perception mechanisms, and differentiated keyframe extraction strategies.
- Score: 16.405658563770757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly important in dynamic environments such as logistics transportation and disaster response. However, current tasks often rely on human operators to monitor aerial videos and make operational decisions. This mode of human-machine collaboration suffers from significant limitations in efficiency and adaptability. In this paper, we present AirVista-II -- an end-to-end agentic system for embodied UAVs, designed to enable general-purpose semantic understanding and reasoning in dynamic scenes. The system integrates agent-based task identification and scheduling, multimodal perception mechanisms, and differentiated keyframe extraction strategies tailored for various temporal scenarios, enabling the efficient capture of critical scene information. Experimental results demonstrate that the proposed system achieves high-quality semantic understanding across diverse UAV-based dynamic scenarios under a zero-shot setting.
Related papers
- CoordField: Coordination Field for Agentic UAV Task Allocation In Low-altitude Urban Scenarios [17.511081563758875]
This paper proposes a coordination field agentic system for coordinating heterogeneous UAV swarms in complex urban scenarios.
A coordination field mechanism is proposed to guide UAV motion and task selection, enabling decentralized and adaptive allocation of emergent tasks.
Experimental results demonstrate that the proposed system achieves superior performance in terms of task coverage, response time, and adaptability to dynamic changes.
Date: 2025-04-30T18:02:45Z
- More Clear, More Flexible, More Precise: A Comprehensive Oriented Object Detection benchmark for UAV [58.89234732689013]
CODrone is a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions.
It also serves as a new benchmark designed to align with downstream task requirements.
We conduct a series of experiments based on 22 classical or SOTA methods to rigorously evaluate CODrone.
Date: 2025-04-28T17:56:02Z
- REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation [57.628771707989166]
We propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution. REMAC incorporates two key modules: a self-reflection module that performs pre-condition and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module that dynamically adapts plans based on scene-specific reasoning.
Date: 2025-03-28T03:51:40Z
- CSAOT: Cooperative Multi-Agent System for Active Object Tracking [1.384468678066823]
Active Object Tracking (AOT) requires a controller agent to actively adjust its viewpoint to maintain visual contact with a moving target in complex environments. Existing AOT solutions are predominantly single-agent systems, which struggle in dynamic and complex scenarios. We introduce the Collaborative System for Active Object Tracking (CSAOT) to enable multiple agents to operate on a single device.
Date: 2025-01-23T10:44:35Z
- Task Delay and Energy Consumption Minimization for Low-altitude MEC via Evolutionary Multi-objective Deep Reinforcement Learning [52.64813150003228]
The low-altitude economy (LAE), driven by unmanned aerial vehicles (UAVs) and other aircraft, has revolutionized fields such as transportation, agriculture, and environmental monitoring. In the upcoming sixth-generation (6G) era, UAV-assisted mobile edge computing (MEC) is particularly crucial in challenging environments such as mountainous or disaster-stricken areas. The task offloading problem is one of the key issues in UAV-assisted MEC, primarily addressing the trade-off between minimizing task delay and minimizing the UAV's energy consumption.
Date: 2025-01-11T02:32:42Z
- UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility [33.73170899086857]
Low-altitude mobility, exemplified by unmanned aerial vehicles (UAVs), has introduced transformative advancements across various domains. This paper explores the integration of large language models (LLMs) and UAVs. It categorizes and analyzes key tasks and application scenarios where UAVs and LLMs converge.
Date: 2025-01-04T17:32:12Z
- A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations. We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT. We also propose a reinforcement learning-based drone tracking method called R-VAT.
Date: 2024-12-01T09:37:46Z
- VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use [74.39058448757645]
We present VipAct, an agent framework that enhances vision-language models (VLMs).
VipAct consists of an orchestrator agent, which manages task requirement analysis, planning, and coordination, along with specialized agents that handle specific tasks.
We evaluate VipAct on benchmarks featuring a diverse set of visual perception tasks, with experimental results demonstrating significant performance improvements.
Date: 2024-10-21T18:10:26Z
- Cooperative Cognitive Dynamic System in UAV Swarms: Reconfigurable Mechanism and Framework [80.39138462246034]
We propose the cooperative cognitive dynamic system (CCDS) to optimize the management of UAV swarms.
CCDS is a hierarchical and cooperative control structure that enables real-time data processing and decision-making.
In addition, CCDS can be integrated with the biomimetic mechanism to efficiently allocate tasks for UAV swarms.
Date: 2024-05-18T12:45:00Z
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
Date: 2023-12-31T04:14:43Z
- SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking [12.447854608181833]
This work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking.
The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation.
A lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information.
Date: 2023-03-08T05:01:00Z
- Advanced Algorithms of Collision Free Navigation and Flocking for Autonomous UAVs [0.0]
This report contributes to the state of the art in UAV control for safe autonomous navigation and motion coordination of multi-UAV systems.
The first part of this report deals with single-UAV systems. The complex problem of three-dimensional (3D) collision-free navigation in unknown/dynamic environments is addressed.
The second part of this report addresses safe navigation for multi-UAV systems. Distributed motion coordination methods of multi-UAV systems for flocking and 3D area coverage are developed.
Date: 2021-10-30T03:51:40Z
This list is automatically generated from the titles and abstracts of the papers on this site.