AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation
- URL: http://arxiv.org/abs/2603.03761v1
- Date: Wed, 04 Mar 2026 06:17:51 GMT
- Title: AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation
- Authors: Yunxiao Shi, Wujiang Xu, Tingwei Chen, Haoning Shang, Ling Yang, Yunfeng Wan, Zhuo Cao, Xing Zi, Dimitris N. Metaxas, Min Xu,
- Abstract summary: AgentSelect is a benchmark that reframes agent selection as narrative query-to-agent recommendation.<n>It converts heterogeneous evaluation artifacts into unified, positive-only interaction data.<n>AgentSelect provides the first unified data and evaluation infrastructure for agent recommendation.
- Score: 39.61543921719145
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: LLM agents are rapidly becoming the practical interface for task automation, yet the ecosystem lacks a principled way to choose among an exploding space of deployable configurations. Existing LLM leaderboards and tool/agent benchmarks evaluate components in isolation and remain fragmented across tasks, metrics, and candidate pools, leaving a critical research gap: there is little query-conditioned supervision for learning to recommend end-to-end agent configurations that couple a backbone model with a toolkit. We address this gap with AgentSelect, a benchmark that reframes agent selection as narrative query-to-agent recommendation over capability profiles and systematically converts heterogeneous evaluation artifacts into unified, positive-only interaction data. AgentSelectcomprises 111,179 queries, 107,721 deployable agents, and 251,103 interaction records aggregated from 40+ sources, spanning LLM-only, toolkit-only, and compositional agents. Our analyses reveal a regime shift from dense head reuse to long-tail, near one-off supervision, where popularity-based CF/GNN methods become fragile and content-aware capability matching is essential. We further show that Part~III synthesized compositional interactions are learnable, induce capability-sensitive behavior under controlled counterfactual edits, and improve coverage over realistic compositions; models trained on AgentSelect also transfer to a public agent marketplace (MuleRun), yielding consistent gains on an unseen catalog. Overall, AgentSelect provides the first unified data and evaluation infrastructure for agent recommendation, which establishes a reproducible foundation to study and accelerate the emerging agent ecosystem.
Related papers
- Learning to Recommend Multi-Agent Subgraphs from Calling Trees [6.247621896325622]
Multi-agent systems (MAS) increasingly solve complex tasks by orchestrating agents and tools selected from rapidly growing marketplaces.<n>We introduce a generic textbfconstrained recommendation framework that first uses retrieval to build a compact candidate set conditioned on the current subtask and context.<n>We ground both the formulation and learning signals in textbfhistorical calling trees, which capture the execution structure of MAS.
arXiv Detail & Related papers (2026-01-29T18:26:12Z) - AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress [71.02263260394261]
Large language models (LLMs) still encounter challenges in multi-turn decision-making tasks.<n>We build process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process.<n>AgentPRM captures both the interdependence between sequential decisions and their contribution to the final goal.
arXiv Detail & Related papers (2025-11-11T14:57:54Z) - AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
tAgent is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals.<n>By leveraging soft supervision and weighted aggregation of agent outputs, Agent learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z) - Emergent Coordination in Multi-Agent Language Models [2.504366738288215]
We introduce an information-theoretic framework to test whether multi-agent systems show signs of higher-order structure.<n>This information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems.<n>We apply our framework to experiments using a simple guessing game without direct agent communication.
arXiv Detail & Related papers (2025-10-05T11:26:41Z) - Stochastic Self-Organization in Multi-Agent Systems [28.70691568233268]
Multi-agent systems (MAS) based on Large Language Models (LLMs) have the potential to solve tasks that are beyond the reach of any single LLM.<n>We introduce a response-conditioned framework that adapts communication on-the-fly.
arXiv Detail & Related papers (2025-10-01T09:08:04Z) - MCP-Orchestrated Multi-Agent System for Automated Disinformation Detection [84.75972919995398]
This paper presents a multi-agent system that uses relation extraction to detect disinformation in news articles.<n>The proposed Agentic AI system combines four agents: (i) a machine learning agent (logistic regression), (ii) a Wikipedia knowledge check agent, and (iv) a web-scraped data analyzer.<n>Results demonstrate that the multi-agent ensemble achieves 95.3% accuracy with an F1 score of 0.964, significantly outperforming individual agents and traditional approaches.
arXiv Detail & Related papers (2025-08-13T19:14:48Z) - AgentOrchestra: Orchestrating Hierarchical Multi-Agent Intelligence with the Tool-Environment-Agent(TEA) Protocol [22.406849007798858]
We propose the Tool-Environment-Agent Protocol to integrate environments, agents, and tools into an unified system.<n>We introduce AgentOrchestra, a hierarchical multi-agent framework with a central planning agent that decomposes complex objectives and coordinates specialized agents.
arXiv Detail & Related papers (2025-06-14T13:45:37Z) - MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification [5.666070277424383]
MAG-V is a framework to generate a dataset of questions that mimic customer queries.<n>Our synthetic data can improve agent performance on actual customer queries.
arXiv Detail & Related papers (2024-11-28T19:36:11Z) - On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator in recommendation based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z) - AgentCF: Collaborative Learning with Autonomous Language Agents for
Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering.
We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimize both kinds of agents together.
Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.