Related papers: Building AI Agents for Autonomous Clouds: Challenges and Design Principles

URL: http://arxiv.org/abs/2407.12165v2
Date: Wed, 31 Jul 2024 06:01:15 GMT
Title: Building AI Agents for Autonomous Clouds: Challenges and Design Principles
Authors: Manish Shetty, Yinfang Chen, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, Suman Nath, Chetan Bansal, Saravan Rajmohan
Abstract summary: AI for IT Operations (AIOps) aims to automate complex operational tasks, like fault localization and root cause analysis, thereby reducing human intervention and customer impact. This vision paper lays the groundwork for such a framework by first framing the requirements and then discussing design decisions. We propose AIOpsLab, a prototype implementation leveraging agent-cloud-interface that orchestrates an application, injects real-time faults using chaos engineering, and interfaces with an agent to localize and resolve the faults.
Score: 17.03870042416836
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The rapid growth in the use of Large Language Models (LLMs) and AI Agents as part of software development and deployment is revolutionizing the information technology landscape. While code generation receives significant attention, a higher-impact application lies in using AI agents for operational resilience of cloud services, which currently require significant human effort and domain knowledge. There is a growing interest in AI for IT Operations (AIOps) which aims to automate complex operational tasks, like fault localization and root cause analysis, thereby reducing human intervention and customer impact. However, achieving the vision of autonomous and self-healing clouds through AIOps is hampered by the lack of standardized frameworks for building, evaluating, and improving AIOps agents. This vision paper lays the groundwork for such a framework by first framing the requirements and then discussing design decisions that satisfy them. We also propose AIOpsLab, a prototype implementation leveraging agent-cloud-interface that orchestrates an application, injects real-time faults using chaos engineering, and interfaces with an agent to localize and resolve the faults. We report promising results and lay the groundwork to build a modular and robust framework for building, evaluating, and improving agents for autonomous clouds.

Related papers

AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents [15.802600809497097]
This paper introduces AI2Agent, an end-to-end framework that automates AI project deployment through guideline-driven execution. We conducted experiments on 30 AI deployment cases, covering TTS, text-to-image generation, image editing, and other AI applications. Results show that AI2Agent significantly reduces deployment time and improves success rates.
arXiv Detail & Related papers (2025-03-31T10:58:34Z)
Towards Agentic AI Networking in 6G: A Generative Foundation Model-as-Agent Approach [35.05793485239977]
We propose AgentNet, a novel framework for supporting interaction, collaborative learning, and knowledge transfer among AI agents. We consider two application scenarios, digital-twin-based industrial automation and metaverse-based infotainment system, to describe how to apply AgentNet.
arXiv Detail & Related papers (2025-03-20T00:48:44Z)
Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning [55.641299901038316]
AI-generated content can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content for resource-constrained users. Such a paradigm faces two significant challenges: 1) raw prompts often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources. We develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation.
arXiv Detail & Related papers (2025-02-17T03:05:20Z)
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds [12.464941027105306]
AI for IT Operations (AIOps) aims to automate complex operational tasks, such as fault localization and root cause analysis, to reduce human workload and minimize customer impact. Recent advances in Large Language Models (LLMs) and AI agents are revolutionizing AIOps by enabling end-to-end and multitask automation. We present AIOPSLAB, a framework that not only deploys microservice cloud environments, injects faults, generates workloads, and exports telemetry data but also orchestrates these components and provides interfaces for interacting with and evaluating agents.
arXiv Detail & Related papers (2025-01-12T04:17:39Z)
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [52.46737975742287]
We build a self-contained environment with data that mimics a small software company environment. We find that with the most competitive agent, 24% of the tasks can be completed autonomously. This paints a nuanced picture on task automation with LM agents.
arXiv Detail & Related papers (2024-12-18T18:55:40Z)
Cloud Platforms for Developing Generative AI Solutions: A Scoping Review of Tools and Services [0.27649989102029926]
Generative AI is transforming enterprise application development by enabling machines to create content, code, and designs. Cloud computing addresses these needs by offering infrastructure to train, deploy, and scale generative AI models. This review examines cloud services for generative AI, focusing on key providers like Amazon Web Services (AWS), Microsoft Azure, Google Cloud, IBM Cloud, Oracle Cloud, and Alibaba Cloud.
arXiv Detail & Related papers (2024-12-08T19:49:07Z)
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents. We propose the Internet of Agents (IoA), a novel framework that addresses these limitations. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z)
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [79.07755560048388]
SWE-agent is a system that enables LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with pass@1 rates of 12.5% and 87.7%, respectively.
arXiv Detail & Related papers (2024-05-06T17:41:33Z)
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey [0.0]
This paper examines the recent advancements in AI agent implementations. It focuses on their ability to achieve complex goals that require enhanced reasoning, planning, and tool execution capabilities.
arXiv Detail & Related papers (2024-04-17T17:32:41Z)
CACA Agent: Capability Collaboration based AI Agent [18.84686313298908]
We propose CACA Agent (Capability Collaboration based AI Agent) using an open architecture inspired by service computing. CACA Agent integrates a set of collaborative capabilities to implement AI Agents, reducing the dependence on a single LLM. We present a demo to illustrate the operation and the application scenario extension of CACA Agent.
arXiv Detail & Related papers (2024-03-22T11:42:47Z)
AgentScope: A Flexible yet Robust Multi-Agent Platform [66.64116117163755]
AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism. The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment.
arXiv Detail & Related papers (2024-02-21T04:11:28Z)
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI). We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency [0.0]
In recent years, the integration of artificial intelligence (AI) and cloud computing has emerged as a promising avenue for addressing the growing computational demands of AI applications. This paper presents a comprehensive study of scalable, distributed AI frameworks leveraging cloud computing for enhanced deep learning performance and efficiency.
arXiv Detail & Related papers (2023-04-26T15:38:00Z)
AI for IT Operations (AIOps) on Cloud Platforms: Reviews, Opportunities and Challenges [60.56413461109281]
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big data generated by IT Operations processes. We discuss in depth the key types of data emitted by IT Operations activities, the scale and challenges in analyzing them, and where they can be helpful. We categorize the key AIOps tasks as incident detection, failure prediction, root cause analysis, and automated actions.
arXiv Detail & Related papers (2023-04-10T15:38:12Z)
Performance, Opaqueness, Consequences, and Assumptions: Simple questions for responsible planning of machine learning solutions [5.802346990263708]
We propose a quick and simple framework to support planning of AI solutions. The POCA framework is based on four pillars: Performance, Opaqueness, Consequences and Assumptions.
arXiv Detail & Related papers (2022-08-21T21:24:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.