Cloud Infrastructure Management in the Age of AI Agents
- URL: http://arxiv.org/abs/2506.12270v1
- Date: Fri, 13 Jun 2025 22:50:12 GMT
- Title: Cloud Infrastructure Management in the Age of AI Agents
- Authors: Zhenning Yang, Archit Bhatnagar, Yiming Qiu, Tongyuan Miao, Patrick Tser Jern Kon, Yunming Xiao, Yibo Huang, Martin Casado, Ang Chen
- Abstract summary: We make a case for developing AI agents powered by large language models (LLMs) to automate cloud infrastructure management tasks. In a preliminary study, we investigate the potential for AI agents to use different cloud/user interfaces. We report takeaways on their effectiveness on different management tasks, and identify research challenges and potential solutions.
- Score: 8.243598669679354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cloud infrastructure is the cornerstone of the modern IT industry. However, managing this infrastructure effectively requires considerable manual effort from the DevOps engineering team. We make a case for developing AI agents powered by large language models (LLMs) to automate cloud infrastructure management tasks. In a preliminary study, we investigate the potential for AI agents to use different cloud/user interfaces such as software development kits (SDK), command line interfaces (CLI), Infrastructure-as-Code (IaC) platforms, and web portals. We report takeaways on their effectiveness on different management tasks, and identify research challenges and potential solutions.
Related papers
- OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use [101.57043903478257]
The dream to create AI assistants as capable and versatile as the fictional J.A.R.V.I.S from Iron Man has long captivated imaginations. With the evolution of (multi-modal) large language models ((M)LLMs), this dream is closer to reality. This survey aims to consolidate the state of OS Agents research, providing insights to guide both academic inquiry and industrial development.
arXiv Detail & Related papers (2025-08-06T14:33:45Z)
- Building Scalable AI-Powered Applications with Cloud Databases: Architectures, Best Practices and Performance Considerations [0.0]
The rapid adoption of AI-powered applications demands high-performance, scalable, and efficient cloud database solutions. This paper explores how cloud-native databases enable AI-driven applications by leveraging purpose-built technologies. Performance benchmarks, scalability considerations, and cost-efficient strategies are evaluated to guide the design of AI-enabled applications.
arXiv Detail & Related papers (2025-04-26T04:17:46Z)
- Towards Agentic AI Networking in 6G: A Generative Foundation Model-as-Agent Approach [35.05793485239977]
We propose AgentNet, a novel framework for supporting interaction, collaborative learning, and knowledge transfer among AI agents. We consider two application scenarios, digital-twin-based industrial automation and metaverse-based infotainment system, to describe how to apply AgentNet.
arXiv Detail & Related papers (2025-03-20T00:48:44Z)
- Engineering LLM Powered Multi-agent Framework for Autonomous CloudOps [0.0]
We leveraged GenAI to develop a GenAI-based solution for autonomous CloudOps for the existing MontyCloud system. We developed MOYA, a multi-agent framework that balances autonomy with the necessary human control. This framework integrates various internal and external systems and is optimized for factors like task orchestration, security, and error mitigation.
arXiv Detail & Related papers (2025-01-14T16:30:10Z)
- Cloud Platforms for Developing Generative AI Solutions: A Scoping Review of Tools and Services [0.27649989102029926]
Generative AI is transforming enterprise application development by enabling machines to create content, code, and designs. Cloud computing addresses these needs by offering infrastructure to train, deploy, and scale generative AI models. This review examines cloud services for generative AI, focusing on key providers like Amazon Web Services (AWS), Microsoft Azure, Google Cloud, IBM Cloud, Oracle Cloud, and Alibaba Cloud.
arXiv Detail & Related papers (2024-12-08T19:49:07Z)
- Building AI Agents for Autonomous Clouds: Challenges and Design Principles [17.03870042416836]
AI for IT Operations (AIOps) aims to automate complex operational tasks, like fault localization and root cause analysis, thereby reducing human intervention and customer impact.
This vision paper lays the groundwork for such a framework by first framing the requirements and then discussing design decisions.
We propose AIOpsLab, a prototype implementation leveraging agent-cloud-interface that orchestrates an application, injects real-time faults using chaos engineering, and interfaces with an agent to localize and resolve the faults.
arXiv Detail & Related papers (2024-07-16T20:40:43Z)
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents.
We propose the Internet of Agents (IoA), a novel framework that addresses these limitations.
IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z)
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [79.07755560048388]
SWE-agent is a system that facilitates LM agents to autonomously use computers to solve software engineering tasks.
SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs.
We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively.
arXiv Detail & Related papers (2024-05-06T17:41:33Z)
- AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges [60.56413461109281]
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big data generated by IT Operations processes.
We discuss in depth the key types of data emitted by IT Operations activities, the scale and challenges in analyzing them, and where they can be helpful.
We categorize the key AIOps tasks as incident detection, failure prediction, root cause analysis, and automated actions.
arXiv Detail & Related papers (2023-04-10T15:38:12Z)
- Edge-Cloud Polarization and Collaboration: A Comprehensive Survey [61.05059817550049]
We conduct a systematic review for both cloud and edge AI.
We are the first to set up the collaborative learning mechanism for cloud and edge modeling.
We discuss potentials and practical experiences of some on-going advanced edge AI topics.
arXiv Detail & Related papers (2021-11-11T05:58:23Z)
- A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It preserves the user's sensitive data while providing cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)