Related papers: Gaming Tool Preferences in Agentic LLMs

Gaming Tool Preferences in Agentic LLMs

URL: http://arxiv.org/abs/2505.18135v1
Date: Fri, 23 May 2025 17:43:48 GMT
Title: Gaming Tool Preferences in Agentic LLMs
Authors: Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, Soheil Feizi,
Abstract summary: Large language models (LLMs) can now access a wide range of external tools.<n>LLMs rely entirely on the text descriptions of tools to decide which ones to use.
Score: 43.391777105480486
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) can now access a wide range of external tools, thanks to the Model Context Protocol (MCP). This greatly expands their abilities as various agents. However, LLMs rely entirely on the text descriptions of tools to decide which ones to use--a process that is surprisingly fragile. In this work, we expose a vulnerability in prevalent tool/function-calling protocols by investigating a series of edits to tool descriptions, some of which can drastically increase a tool's usage from LLMs when competing with alternatives. Through controlled experiments, we show that tools with properly edited descriptions receive over 10 times more usage from GPT-4.1 and Qwen2.5-7B than tools with original descriptions. We further evaluate how various edits to tool descriptions perform when competing directly with one another and how these trends generalize or differ across a broader set of 10 different models. These phenomenons, while giving developers a powerful way to promote their tools, underscore the need for a more reliable foundation for agentic LLMs to select and utilize tools and resources.

Related papers

RefTool: Enhancing Model Reasoning with Reference-Guided Tool Creation [44.128974924517465]
RefTool is a reference-guided framework for automatic tool creation.<n>It generates executable tools from reference content, validate them using illustrative examples, and organize them hierarchically into a toolbox.<n> Experiments on causality, physics, and chemistry benchmarks demonstrate that RefTool outperforms existing tool-creation and domain-specific reasoning methods.
arXiv Detail & Related papers (2025-05-27T16:41:19Z)
PTR: Precision-Driven Tool Recommendation for Large Language Models [43.53494041932615]
We propose a Precision-driven Tool Recommendation (PTR) approach for Large Language Models (LLMs) PTR captures an initial, concise set of tools by leveraging historical tool bundle usage and dynamically adjusts the tool set by performing tool matching. We present a new dataset, RecTools, and a metric, TRACC, designed to evaluate the effectiveness of tool recommendation for LLMs.
arXiv Detail & Related papers (2024-11-14T17:33:36Z)
Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents [56.822238860147024]
Augmenting large language models with external tools has emerged as a promising approach to extend their utility.<n>Previous methods manually parse tool documentation and create in-context demonstrations, transforming tools into structured formats for LLMs to use in their step-by-step reasoning.<n>We propose AutoTools, a framework that enables LLMs to automate the tool-use workflow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z)
Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning [57.523454568002144]
Large language models (LLMs) have shown capabilities in commonsense reasoning and leveraging external tools. We introduce ToolRec, a framework for LLM-empowered recommendations via tool learning. We formulate the recommendation process as a process aimed at exploring user interests in attribute granularity. We consider two types of attribute-oriented tools: rank tools and retrieval tools.
arXiv Detail & Related papers (2024-05-24T00:06:54Z)
What Are Tools Anyway? A Survey from the Language Model Perspective [67.18843218893416]
Language models (LMs) are powerful yet mostly for text generation tasks. We provide a unified definition of tools as external programs used by LMs. We empirically study the efficiency of various tooling methods.
arXiv Detail & Related papers (2024-03-18T17:20:07Z)
Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [26.28459880766842]
We propose a decision-aware and generalizable tool-usage framework (DEER) Specifically, we first construct the tool-usage samples with multiple decision branches via an automatic generation pipeline. Our proposed DEER is effective and significantly outperforms baselines across various datasets.
arXiv Detail & Related papers (2024-02-26T16:11:03Z)
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction. It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z)
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use [79.87054552116443]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.<n>We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.<n>We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.