Related papers: InfraMind: A Novel Exploration-based GUI Agentic Framework for Mission-critical Industrial Management

URL: http://arxiv.org/abs/2509.13704v1
Date: Wed, 17 Sep 2025 05:14:11 GMT
Title: InfraMind: A Novel Exploration-based GUI Agentic Framework for Mission-critical Industrial Management
Authors: Liangtao Lin, Zhaomeng Zhu, Tianwei Zhang, Yonggang Wen
Abstract summary: InfraMind is a novel exploration-based GUI agentic framework specifically tailored for industrial management systems. Our approach consistently outperforms existing frameworks in terms of task success rate and operational efficiency.
Score: 15.42553917257021
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mission-critical industrial infrastructure, such as data centers, increasingly depends on complex management software. Its operations, however, pose significant challenges due to the escalating system complexity, multi-vendor integration, and a shortage of expert operators. While Robotic Process Automation (RPA) offers partial automation through handcrafted scripts, it suffers from limited flexibility and high maintenance costs. Recent advances in Large Language Model (LLM)-based graphical user interface (GUI) agents have enabled more flexible automation, yet these general-purpose agents face five critical challenges when applied to industrial management, including unfamiliar element understanding, precision and efficiency, state localization, deployment constraints, and safety requirements. To address these issues, we propose InfraMind, a novel exploration-based GUI agentic framework specifically tailored for industrial management systems. InfraMind integrates five innovative modules to systematically resolve different challenges in industrial management: (1) systematic search-based exploration with virtual machine snapshots for autonomous understanding of complex GUIs; (2) memory-driven planning to ensure high-precision and efficient task execution; (3) advanced state identification for robust localization in hierarchical interfaces; (4) structured knowledge distillation for efficient deployment with lightweight models; and (5) comprehensive, multi-layered safety mechanisms to safeguard sensitive operations. Extensive experiments on both open-source and commercial DCIM platforms demonstrate that our approach consistently outperforms existing frameworks in terms of task success rate and operational efficiency, providing a rigorous and scalable solution for industrial management automation.

Related papers

Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents [63.03252293761656]
This paper systematically reviews the technologies, applications, and evaluation methods of industry agents based on large language models (LLMs). We examine the three key technological pillars that support the advancement of agent capabilities: Memory, Planning, and Tool Use. We provide an overview of the application of industry agents in real-world domains such as digital engineering, scientific discovery, embodied intelligence, collaborative business execution, and complex system simulation.
arXiv Detail & Related papers (2025-10-20T12:46:55Z)
Higher Satisfaction, Lower Cost: A Technical Report on How LLMs Revolutionize Meituan's Intelligent Interaction Systems [67.18731675163589]
We introduce WOWService, an intelligent interaction system tailored for industrial applications. With the integration of LLMs and multi-agent architectures, WOWService enables autonomous task management and collaborative problem-solving. WOWService is deployed on the Meituan App, achieving significant gains in key metrics.
arXiv Detail & Related papers (2025-10-15T08:35:51Z)
MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning [83.81404871748438]
MagicGUI is a foundational mobile GUI agent designed to address critical challenges in perception, grounding, and reasoning within real-world mobile GUI environments. The framework is underpinned by six key components, including a comprehensive and accurate dataset, enhanced perception and grounding capabilities, a comprehensive and unified action space, and planning-oriented reasoning mechanisms.
arXiv Detail & Related papers (2025-07-19T12:33:43Z)
Agent-based Condition Monitoring Assistance with Multimodal Industrial Database Retrieval Augmented Generation [3.8451399765175016]
Condition monitoring (CM) plays a crucial role in ensuring reliability and efficiency in the process industry. This work integrates large language model (LLM)-based reasoning agents with CM to address analyst and industry needs. We propose MindRAG, a modular framework combining multimodal retrieval-augmented generation (RAG) with novel vector store structures designed specifically for CM data.
arXiv Detail & Related papers (2025-06-10T21:04:18Z)
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance [7.110126223593506]
This paper envisions a future where AI agents autonomously manage tasks that previously required distinct expertise and manual coordination. We introduce AssetOpsBench, a unified framework and environment designed to guide the development, orchestration, and evaluation of domain-specific agents. We outline the key requirements for such holistic systems and provide actionable insights into building agents that integrate perception, reasoning, and control for real-world industrial operations.
arXiv Detail & Related papers (2025-06-04T10:57:35Z)
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [58.50944604905037]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems. This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
An LLM-enabled Multi-Agent Autonomous Mechatronics Design Framework [49.633199780510864]
This work proposes a multi-agent autonomous mechatronics design framework, integrating expertise across mechanical design, optimization, electronics, and software engineering. Operating primarily through a language-driven workflow, the framework incorporates structured human feedback to ensure robust performance under real-world constraints. A fully functional autonomous vessel was developed with optimized propulsion, cost-effective electronics, and advanced control.
arXiv Detail & Related papers (2025-04-20T16:57:45Z)
A Survey on (M)LLM-Based GUI Agents [62.57899977018417]
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction. Recent advances in large language models and multimodal learning have revolutionized GUI automation across desktop, mobile, and web platforms. This survey identifies key technical challenges, including accurate element localization, effective knowledge retrieval, long-horizon planning, and safety-aware execution control.
arXiv Detail & Related papers (2025-03-27T17:58:31Z)
Autonomous Deep Agent [0.7489814067742621]
Deep Agent is an advanced autonomous AI system designed to manage complex multi-phase tasks. The system's foundation is built on our Hierarchical Task DAG framework. Deep Agent establishes a novel paradigm in self-governing AI systems.
arXiv Detail & Related papers (2025-02-10T21:46:54Z)
An Integrated Artificial Intelligence Operating System for Advanced Low-Altitude Aviation Applications [4.62967829580797]
This paper introduces a high-performance artificial intelligence operating system tailored for low-altitude aviation. It addresses key challenges such as real-time task execution, computational efficiency, and seamless modular collaboration.
arXiv Detail & Related papers (2024-11-28T01:24:16Z)
BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration [0.0]
We focus on designing a flexible agent engineering framework capable of handling complex use case applications across various domains. The proposed framework provides reliability in industrial applications and presents techniques to ensure a scalable, flexible, and collaborative workflow for multiple autonomous agents.
arXiv Detail & Related papers (2024-06-28T16:39:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.