Related papers: SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs

SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs

URL: http://arxiv.org/abs/2501.09316v1
Date: Thu, 16 Jan 2025 06:14:58 GMT
Title: SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs
Authors: Anbang Ye, Qianran Ma, Jia Chen, Muqi Li, Tong Li, Fujiao Liu, Siqi Mai, Meichen Lu, Haitao Bao, Yang You,
Abstract summary: General-purpose AI agents struggle to efficiently utilize domain-specific knowledge and human expertise.<n>We introduce the Standard Operational Procedure-guided Agent ( SOP-agent), a novel framework for constructing domain-specific agents.<n> SOP-agent demonstrates excellent versatility, achieving performance superior to general-purpose agent frameworks.
Score: 9.117180930298813
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite significant advancements in general-purpose AI agents, several challenges still hinder their practical application in real-world scenarios. First, the limited planning capabilities of Large Language Models (LLM) restrict AI agents from effectively solving complex tasks that require long-horizon planning. Second, general-purpose AI agents struggle to efficiently utilize domain-specific knowledge and human expertise. In this paper, we introduce the Standard Operational Procedure-guided Agent (SOP-agent), a novel framework for constructing domain-specific agents through pseudocode-style Standard Operational Procedures (SOPs) written in natural language. Formally, we represent a SOP as a decision graph, which is traversed to guide the agent in completing tasks specified by the SOP. We conduct extensive experiments across tasks in multiple domains, including decision-making, search and reasoning, code generation, data cleaning, and grounded customer service. The SOP-agent demonstrates excellent versatility, achieving performance superior to general-purpose agent frameworks and comparable to domain-specific agent systems. Additionally, we introduce the Grounded Customer Service Benchmark, the first benchmark designed to evaluate the grounded decision-making capabilities of AI agents in customer service scenarios based on SOPs.

Related papers

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use [101.57043903478257]
The dream to create AI assistants as capable and versatile as the fictional J.A.R.V.I.S from Iron Man has long captivated imaginations.<n>With the evolution of (multi-modal) large language models ((M)LLMs), this dream is closer to reality.<n>This survey aims to consolidate the state of OS Agents research, providing insights to guide both academic inquiry and industrial development.
arXiv Detail & Related papers (2025-08-06T14:33:45Z)
TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems [2.462408812529728]
This review presents a structured analysis of textbfTrust, Risk, and Security Management (TRiSM) in the context of LLM-based Agentic Multi-Agent Systems (AMAS)<n>We begin by examining the conceptual foundations of Agentic AI and highlight its architectural distinctions from traditional AI agents.<n>We then adapt and extend the AI TRiSM framework for Agentic AI, structured around four key pillars: Explainability, ModelOps, Security, Privacy and Governance.
arXiv Detail & Related papers (2025-06-04T16:26:11Z)
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents [12.052972947563424]
Existing benchmarks predominantly evaluate agent performance based on task completion as a proxy for overall effectiveness.<n>We propose PIPA, a unified evaluation protocol that conceptualizes the behavioral process of interactive task planning agents.<n>Our analyses show that agents excel in different behavioral stages, with user satisfaction shaped by both outcomes and intermediate behaviors.
arXiv Detail & Related papers (2025-05-02T21:27:10Z)
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents [64.75036903373712]
Proposer-Agent-Evaluator is a learning system that enables foundation model agents to autonomously discover and practice skills in the wild.<n>At the heart of PAE is a context-aware task proposer that autonomously proposes tasks for the agent to practice with context information.<n>The success evaluation serves as the reward signal for the agent to refine its policies through RL.
arXiv Detail & Related papers (2024-12-17T18:59:50Z)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
Agent S: An Open Agentic Framework that Uses Computers Like a Human [31.16046798529319]
We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI) Agent S aims to address three key challenges in automating computer tasks: acquiring domain-specific knowledge, planning over long task horizons, and handling dynamic, non-uniform interfaces.
arXiv Detail & Related papers (2024-10-10T17:43:51Z)
Agent-Oriented Planning in Multi-Agent Systems [54.429028104022066]
We propose AOP, a novel framework for agent-oriented planning in multi-agent systems. In this study, we identify three critical design principles of agent-oriented planning, including solvability, completeness, and non-redundancy. Extensive experiments demonstrate the advancement of AOP in solving real-world problems compared to both single-agent systems and existing planning strategies for multi-agent systems.
arXiv Detail & Related papers (2024-10-03T04:07:51Z)
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents. We propose the Internet of Agents (IoA), a novel framework that addresses these limitations. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z)
CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only [21.054681757006385]
We propose an agent that perceives its environment solely through screenshot images.<n>By leveraging the reasoning capability of the Large Language Models, we eliminate the need for large-scale human demonstration data.<n>Agent achieves an average success rate of 94.5% on MiniWoB++ and an average task score of 62.3 on WebShop.
arXiv Detail & Related papers (2024-06-11T05:21:20Z)
CACA Agent: Capability Collaboration based AI Agent [18.84686313298908]
We propose CACA Agent (Capability Collaboration based AI Agent) using an open architecture inspired by service computing. CACA Agent integrates a set of collaborative capabilities to implement AI Agents, not only reducing the dependence on a single LLM. We present a demo to illustrate the operation and the application scenario extension of CACA Agent.
arXiv Detail & Related papers (2024-03-22T11:42:47Z)
The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI) We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)
Toward Policy Explanations for Multi-Agent Reinforcement Learning [18.33682005623418]
We present novel methods to generate two types of policy explanations for MARL. Experimental results on three MARL domains demonstrate the scalability of our methods. A user study shows that the generated explanations significantly improve user performance and increase subjective ratings on metrics such as user satisfaction.
arXiv Detail & Related papers (2022-04-26T20:07:08Z)
Modelling Multi-Agent Epistemic Planning in ASP [66.76082318001976]
This paper presents an implementation of a multi-shot Answer Set Programming-based planner that can reason in multi-agent epistemic settings. The paper shows how the planner, exploiting an ad-hoc epistemic state representation and the efficiency of ASP solvers, has competitive performance results on benchmarks collected from the literature.
arXiv Detail & Related papers (2020-08-07T06:35:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.