Open Agent Specification (Agent Spec): A Unified Representation for AI Agents
- URL: http://arxiv.org/abs/2510.04173v4
- Date: Fri, 07 Nov 2025 14:02:33 GMT
- Title: Open Agent Specification (Agent Spec): A Unified Representation for AI Agents
- Authors: Soufiane Amini, Yassine Benajiba, Cesare Bernardis, Paul Cayet, Hassan Chafi, Abderrahim Fathan, Louis Faucon, Damien Hilloulin, Sungpack Hong, Ingo Kossyk, Tran Minh Son Le, Rhicheek Patra, Sujith Ravi, Jonas Schweizer, Jyotika Singh, Shailender Singh, Weiyi Sun, Kartik Talamadupula, Jerry Xu,
- Abstract summary: We introduce Open Agent Specification (Agent Spec), a declarative language that defines AI agents and agentic.<n>Agent Spec defines a common set of components, control and data flow semantics, and schemas that allow an agent to be defined once and executed across different runtimes.
- Score: 10.685555728094338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of agent frameworks has led to fragmentation in how agents are defined, executed, and evaluated. Existing systems differ in their abstractions, data flow semantics, and tool integrations, making it difficult to share or reproduce workflows. We introduce Open Agent Specification (Agent Spec), a declarative language that defines AI agents and agentic workflows in a way that is compatible across frameworks, promoting reusability, portability and interoperability of AI agents. Agent Spec defines a common set of components, control and data flow semantics, and schemas that allow an agent to be defined once and executed across different runtimes. Agent Spec also introduces a standardized Evaluation harness to assess agent behavior and agentic workflows across runtimes - analogous to how HELM and related harnesses standardized LLM evaluation - so that performance, robustness, and efficiency can be compared consistently across frameworks. We demonstrate this using four distinct runtimes (LangGraph, CrewAI, AutoGen, and WayFlow) evaluated over three different benchmarks (SimpleQA Verified, $\tau^2$-Bench and BIRD-SQL). We provide accompanying toolsets: a Python SDK (PyAgentSpec), a reference runtime (WayFlow), and adapters for popular frameworks (e.g., LangGraph, AutoGen, CrewAI). Agent Spec bridges the gap between model-centric and agent-centric standardization & evaluation, laying the groundwork for reliable, reusable, and portable agentic systems.
Related papers
- AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation [39.61543921719145]
AgentSelect is a benchmark that reframes agent selection as narrative query-to-agent recommendation.<n>It converts heterogeneous evaluation artifacts into unified, positive-only interaction data.<n>AgentSelect provides the first unified data and evaluation infrastructure for agent recommendation.
arXiv Detail & Related papers (2026-03-04T06:17:51Z) - ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads? [0.8749675983608171]
We introduce ISO-Bench, a benchmark for coding agents to test their capabilities on real-world inference tasks.<n>We curated 54 tasks from merged pull requests with measurable performance improvements.
arXiv Detail & Related papers (2026-02-23T08:37:53Z) - AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios [49.90735676070039]
The capacity of AI agents to effectively handle tasks of increasing duration and complexity continues to grow.<n>We argue that current evaluations prioritize increasing task difficulty without sufficiently addressing the diversity of agentic tasks.<n>We propose AgentIF-OneDay, aimed at determining whether general users can utilize natural language instructions and AI agents to complete a diverse array of daily tasks.
arXiv Detail & Related papers (2026-01-28T13:49:18Z) - Beyond Rule-Based Workflows: An Information-Flow-Orchestrated Multi-Agents Paradigm via Agent-to-Agent Communication from CORAL [0.15199492741752027]
We propose an Information-Flow-Orchestrated Multi-Agent Paradigm via Agent-to-Agent (A2A) Communication.<n>We evaluate our approach on the general-purpose benchmark GAIA, using the representative workflow-based MAS as the baseline.<n>Our method achieves 63.64% accuracy, outperforming OWL's 55.15% by 8.49 percentage points with comparable token consumption.
arXiv Detail & Related papers (2026-01-14T21:35:51Z) - Monadic Context Engineering [59.95390010097654]
This paper introduces Monadic Context Engineering (MCE) to provide a formal foundation for agent design.<n>We demonstrate how Monads enable robust composition, how Applicatives provide a principled structure for parallel execution, and crucially, how Monad Transformers allow for the systematic composition of these capabilities.<n>This layered approach enables developers to construct complex, resilient, and efficient AI agents from simple, independently verifiable components.
arXiv Detail & Related papers (2025-12-27T01:52:06Z) - AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
tAgent is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals.<n>By leveraging soft supervision and weighted aggregation of agent outputs, Agent learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z) - AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications [95.42093979627703]
AgentScope supports flexible and efficient tool-based agent-environment interactions.<n>We ground agent behaviors in the ReAct paradigm and offer advanced agent-level infrastructure.<n>AgentScope also includes robust engineering support for developer-friendly experiences.
arXiv Detail & Related papers (2025-08-22T10:35:56Z) - Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling [83.78874399606379]
We propose MACT, a Multi-Agent Collaboration framework with Test-Time scaling.<n>It comprises four distinct small-scale agents, with clearly defined roles and effective collaboration.<n>It shows superior performance with a smaller parameter scale without sacrificing the ability of general and mathematical tasks.
arXiv Detail & Related papers (2025-08-05T12:52:09Z) - Agent-as-a-Service based on Agent Network [9.5094423572869]
We propose Agent-as-a-Service based on Agent Network (A-AN), a service-oriented paradigm grounded in the Role-Goal-Process-Service (RGPS) standard.<n>A-AN unifies the entire agent lifecycle, including construction, integration, interoperability, and networked collaboration.<n>We release a dataset containing 10,000 long-horizon multi-agent to facilitate future research on long-chain collaboration in MAS.
arXiv Detail & Related papers (2025-05-13T11:15:19Z) - PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC [98.82146219495792]
In this paper, we propose a hierarchical agent framework named PC-Agent.<n>From the perception perspective, we devise an Active Perception Module (APM) to overcome the inadequate abilities of current MLLMs in perceiving screenshot content.<n>From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture.
arXiv Detail & Related papers (2025-02-20T05:41:55Z) - Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z) - DAWN: Designing Distributed Agents in a Worldwide Network [0.38447712214412116]
DAWN enables distributed agents worldwide to register and be easily discovered through Gateway Agents.<n>No-LLM Mode for deterministic tasks, Copilot for augmented decision-making, and LLM Agent for autonomous operations.<n>DAWN ensures the safety and security of agent collaborations globally through a dedicated safety, security, and compliance layer.
arXiv Detail & Related papers (2024-10-11T18:47:04Z) - Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [112.04307762405669]
G"odel Agent is a self-evolving framework inspired by the G"odel machine.<n>G"odel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
arXiv Detail & Related papers (2024-10-06T10:49:40Z) - Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence [79.5316642687565]
Existing multi-agent frameworks often struggle with integrating diverse capable third-party agents.
We propose the Internet of Agents (IoA), a novel framework that addresses these limitations.
IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control.
arXiv Detail & Related papers (2024-07-09T17:33:24Z) - AgentScope: A Flexible yet Robust Multi-Agent Platform [66.64116117163755]
AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism.
The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment.
arXiv Detail & Related papers (2024-02-21T04:11:28Z) - AutoAgents: A Framework for Automatic Agent Generation [27.74332323317923]
AutoAgents is an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks.
Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods.
arXiv Detail & Related papers (2023-09-29T14:46:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.