Autonomic Microservice Management via Agentic AI and MAPE-K Integration
- URL: http://arxiv.org/abs/2506.22185v1
- Date: Fri, 27 Jun 2025 12:46:12 GMT
- Title: Autonomic Microservice Management via Agentic AI and MAPE-K Integration
- Authors: Matteo Esposito, Alexander Bakhtin, Noman Ahmad, Mikel Robredo, Ruoyu Su, Valentina Lenarduzzi, Davide Taibi
- Abstract summary: We propose a framework based on MAPE-K, which leverages agentic AI, for autonomous anomaly detection and remediation. Our framework offers practical, industry-ready solutions for maintaining robust and secure system stability.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While microservices are revolutionizing cloud computing by offering unparalleled scalability and independent deployment, their decentralized nature poses significant security and management challenges that can threaten system stability. We propose a framework based on MAPE-K, which leverages agentic AI, for autonomous anomaly detection and remediation to address the daunting task of highly distributed system management. Our framework offers practical, industry-ready solutions for maintaining robust and secure microservices. Practitioners and researchers can customize the framework to enhance system stability, reduce downtime, and monitor broader system quality attributes such as system performance level, resilience, security, and anomaly management, among others.
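The abstract describes the framework at the level of the MAPE-K loop: Monitor, Analyze, Plan, and Execute phases that share a Knowledge base. The following is a minimal sketch of one loop iteration, assuming per-service latency samples as input. The service names (`orders`, `payments`), the z-score detector, and the restart-only remediation policy are hypothetical stand-ins chosen for illustration, not the authors' method; the agentic-AI components the paper proposes are replaced here by simple rules.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Knowledge:
    """Shared knowledge base (the 'K' in MAPE-K): metric history and executed actions."""
    history: dict[str, list[float]] = field(default_factory=dict)
    actions: list[tuple[str, str]] = field(default_factory=list)


def monitor(kb: Knowledge, samples: dict[str, float]) -> None:
    """Monitor: append the latest latency sample (ms) of each service to the history."""
    for service, latency_ms in samples.items():
        kb.history.setdefault(service, []).append(latency_ms)


def analyze(kb: Knowledge, z_threshold: float = 3.0, min_history: int = 5) -> list[str]:
    """Analyze: flag services whose latest sample is a high z-score outlier.

    Hypothetical detector: the latest value is compared against all earlier
    samples; only latency increases are flagged.
    """
    anomalies = []
    for service, values in kb.history.items():
        baseline, latest = values[:-1], values[-1]
        if len(baseline) < min_history:
            continue  # not enough history to estimate a baseline yet
        sigma = max(pstdev(baseline), 1e-9)  # avoid division by zero
        z = (latest - mean(baseline)) / sigma
        if z > z_threshold:
            anomalies.append(service)
    return anomalies


def plan(anomalies: list[str]) -> list[tuple[str, str]]:
    """Plan: pick one remediation per anomalous service (hypothetical restart-only policy)."""
    return [(service, "restart") for service in anomalies]


def execute(kb: Knowledge, actions: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Execute: apply the actions; this sketch only records them in the knowledge base."""
    kb.actions.extend(actions)
    return actions


def mape_k_iteration(kb: Knowledge, samples: dict[str, float]) -> list[tuple[str, str]]:
    """Run one Monitor -> Analyze -> Plan -> Execute pass over the shared knowledge."""
    monitor(kb, samples)
    return execute(kb, plan(analyze(kb)))


# Usage: five normal readings build a baseline, then 'orders' spikes.
kb = Knowledge()
for orders, payments in [(100.0, 80.0), (102.0, 82.0), (98.0, 79.0), (101.0, 81.0), (99.0, 80.0)]:
    mape_k_iteration(kb, {"orders": orders, "payments": payments})
actions = mape_k_iteration(kb, {"orders": 450.0, "payments": 81.0})
print(actions)  # → [('orders', 'restart')]
```

Keeping each phase as a separate function over a shared `Knowledge` object mirrors the MAPE-K structure, so a rule-based Analyze or Plan step could in principle be swapped for an agent-driven one without changing the loop itself.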