Related papers: DDL2PropBank Agent: Benchmarking Multi-Agent Frameworks' Developer Experience Through a Novel Relational Schema Mapping Task

URL: http://arxiv.org/abs/2602.11198v1
Date: Tue, 03 Feb 2026 01:10:59 GMT
Title: DDL2PropBank Agent: Benchmarking Multi-Agent Frameworks' Developer Experience Through a Novel Relational Schema Mapping Task
Authors: Shafiuddin Rehan Ahmed, Wei Wei
Abstract summary: DDL2PropBank is a novel benchmark task that maps relational database schemas to PropBank rolesets. We implement identical agent logic across 10 frameworks and evaluate along two dimensions: (i) code complexity via static analysis, and (ii) AI-assistability. Our results reveal a threefold complexity spectrum, with Pydantic AI and Agno requiring the least implementation overhead.
Score: 9.51787137194505
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Multi-agent frameworks promise to simplify LLM-driven software development, yet there is no principled way to evaluate their developer experience in a controlled setting. We introduce DDL2PropBank, a novel benchmark task that maps relational database schemas to PropBank rolesets, requiring autonomous retrieval of candidate frames and fine-grained linguistic reasoning over table names, columns, and relations. Using the Agent-as-a-Tool pattern, we implement identical agent logic across 10 frameworks and evaluate along two dimensions: (i) code complexity via static analysis, and (ii) AI-assistability -- the extent to which LLMs can autonomously generate correct, framework-specific code. Our results reveal a threefold complexity spectrum, with Pydantic AI and Agno requiring the least implementation overhead. For AI-assistability, structural alignment scores reliably proxy runtime success for frameworks with single canonical patterns, but overestimate correctness for multi-pattern frameworks. Agno emerges as the strongest overall performer, combining lowest complexity with highest structural alignment and 83% pass@1.

Related papers

Relatron: Automating Relational Machine Learning over Relational Databases [50.94254514286021]
We present a study that unifies RDL and DFS in a shared design space and conducts architecture-centric searches across diverse RDB tasks. Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and (3) accuracy is an unreliable guide for architecture choice.
arXiv Detail & Related papers (2026-02-26T02:45:22Z)
ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development [49.63491095660809]
ProjDevBench is an end-to-end benchmark that provides project requirements to coding agents and evaluates the resulting repositories. We curate 20 programming problems across 8 categories, covering both concept-oriented tasks and real-world application scenarios. Our evaluation reports an overall acceptance rate of 27.38%: agents handle basic functionality but struggle with complex system design, time optimization, and resource management.
arXiv Detail & Related papers (2026-02-02T05:17:23Z)
A Lightweight Modular Framework for Constructing Autonomous Agents Driven by Large Language Models: Design, Implementation, and Applications in AgentForge [1.932555230783329]
AgentForge is a lightweight, open-source Python framework designed to democratize the construction of LLM-driven autonomous agents. AgentForge introduces three key innovations: (1) a composable skill abstraction that enables fine-grained task decomposition with formally defined input-output contracts, (2) a unified backend interface supporting seamless switching between cloud-based APIs and local inference engines, and (3) a declarative YAML-based configuration system that separates agent logic from implementation details.
arXiv Detail & Related papers (2026-01-19T20:33:26Z)
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem [90.17610617854247]
We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories.
arXiv Detail & Related papers (2025-12-31T14:03:39Z)
LLM-based Multi-Agent Blackboard System for Information Discovery in Data Science [69.1690891731311]
We propose a novel multi-agent communication paradigm inspired by the blackboard architecture for traditional AI models. In this framework, a central agent posts requests to a shared blackboard, and autonomous subordinate agents respond based on their capabilities. We evaluate our method on three benchmarks that require explicit data discovery.
arXiv Detail & Related papers (2025-09-30T22:34:23Z)
RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z)
CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs [16.234259194402163]
We introduce CodeAgents, a prompting framework that codifies multi-agent reasoning and enables structured, token-efficient planning in multi-agent systems. Results show consistent improvements in planning performance, with absolute gains of 3-36 percentage points over natural language prompting baselines.
arXiv Detail & Related papers (2025-07-04T02:20:19Z)
AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search [58.98450205734779]
Large language model (LLM) agents have demonstrated strong capabilities across diverse domains. Existing agent search methods suffer from three major limitations. We introduce a comprehensive framework to address these challenges.
arXiv Detail & Related papers (2025-06-06T12:07:23Z)
Comparative Analysis of AI Agent Architectures for Entity Relationship Classification [1.6887793771613606]
In this study, we conduct a comparative analysis of three distinct AI agent architectures to perform relation classification. The agentic architectures explored include (1) reflective self-evaluation, (2) hierarchical task decomposition, and (3) a novel multi-agent dynamic example generation mechanism. Our experiments demonstrate that multi-agent coordination consistently outperforms standard few-shot prompting.
arXiv Detail & Related papers (2025-06-03T04:19:47Z)
AdaCoder: An Adaptive Planning and Multi-Agent Framework for Function-Level Code Generation [17.020112052995334]
A typical multi-agent framework consists of Large Language Model (LLM)-based agents. AdaCoder is a novel adaptive planning, multi-agent framework for function-level code generation.
arXiv Detail & Related papers (2025-04-05T16:14:01Z)
Text2Schema: Filling the Gap in Designing Database Table Structures based on Natural Language [22.15408079332362]
People without a database background usually rely on file systems or tools such as Excel for data management. Database systems possess strong management capabilities, but require a high level of professional expertise from users.
arXiv Detail & Related papers (2025-03-31T09:39:19Z)
EpiCoder: Encompassing Diversity and Complexity in Code Generation [66.43738008739555]
Existing methods for code generation use code snippets as seed data. We introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features. Our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios.
arXiv Detail & Related papers (2025-01-08T18:58:15Z)
AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction [10.65417796726349]
Relation extraction (RE) in complex scenarios faces challenges such as diverse relation types and ambiguous relations between entities within a single sentence. We propose an agent-based RE framework, namely AgentRE, which fully leverages the potential of large language models to achieve RE in complex scenarios.
arXiv Detail & Related papers (2024-09-03T12:53:05Z)
CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
Cross-modal retrieval aims to search for instances that are semantically related to the query through the interaction of different modal data. Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates. We propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.