ToolSpectrum : Towards Personalized Tool Utilization for Large Language Models
- URL: http://arxiv.org/abs/2505.13176v2
- Date: Thu, 22 May 2025 14:08:07 GMT
- Title: ToolSpectrum : Towards Personalized Tool Utilization for Large Language Models
- Authors: Zihao Cheng, Hongru Wang, Zeming Liu, Yuhang Guo, Yuanfang Guo, Yunhong Wang, Haifeng Wang
- Abstract summary: We introduce ToolSpectrum, a benchmark designed to evaluate large language models' capabilities in personalized tool utilization. We formalize two key dimensions of personalization, user profile and environmental factors, and analyze their individual and synergistic impacts on tool utilization. Our findings underscore the necessity of context-aware personalization in tool-augmented LLMs and reveal critical limitations of current models.
- Score: 48.276461194773354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While integrating external tools into large language models (LLMs) enhances their ability to access real-time information and domain-specific services, existing approaches focus narrowly on functional tool selection following user instructions, overlooking context-aware personalization in tool selection. This oversight leads to suboptimal user satisfaction and inefficient tool utilization, particularly when overlapping toolsets require nuanced selection based on contextual factors. To bridge this gap, we introduce ToolSpectrum, a benchmark designed to evaluate LLMs' capabilities in personalized tool utilization. Specifically, we formalize two key dimensions of personalization, user profile and environmental factors, and analyze their individual and synergistic impacts on tool utilization. Through extensive experiments on ToolSpectrum, we demonstrate that personalized tool utilization significantly improves user experience across diverse scenarios. However, even state-of-the-art LLMs exhibit limited ability to reason jointly about user profiles and environmental factors, often prioritizing one dimension at the expense of the other. Our findings underscore the necessity of context-aware personalization in tool-augmented LLMs and reveal critical limitations of current models. Our data and code are available at https://github.com/Chengziha0/ToolSpectrum.
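- Illustrative example: a minimal, hypothetical sketch of how one personalized tool-selection case along the two formalized dimensions (user profile and environmental factors) might be represented and scored. The class, field names, and example values below are assumptions for illustration only, not the benchmark's actual schema; see the linked repository for the real data format.

```python
# Hypothetical sketch of a personalized tool-selection evaluation case.
# All names (PersonalizedToolCase, user_profile, environment, candidate_tools,
# expected_tool) are illustrative assumptions, not ToolSpectrum's real schema.
from dataclasses import dataclass


@dataclass
class PersonalizedToolCase:
    """One evaluation case: the same instruction, disambiguated by context."""
    instruction: str        # the user's functional request
    user_profile: dict      # personalization dimension 1: e.g. age, preferences
    environment: dict       # personalization dimension 2: e.g. time, device, network
    candidate_tools: list   # overlapping tools that all satisfy the instruction
    expected_tool: str      # the contextually appropriate choice


def is_correct(case: PersonalizedToolCase, model_choice: str) -> bool:
    """Score a model's tool selection against the expected personalized choice."""
    return model_choice.strip() == case.expected_tool


# Every candidate tool works functionally; only the profile and environment
# make one of them the better selection.
case = PersonalizedToolCase(
    instruction="Plan a route to the nearest pharmacy.",
    user_profile={"age": 70, "prefers": "large-font, simple interfaces"},
    environment={"time": "22:30", "network": "offline maps available"},
    candidate_tools=["MapApp-Lite", "MapApp-Pro", "TransitPlanner"],
    expected_tool="MapApp-Lite",
)

print(is_correct(case, "MapApp-Lite"))  # True
print(is_correct(case, "MapApp-Pro"))   # False
```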
Related papers
- TAPS: Tool-Augmented Personalisation via Structured Tagging [0.7007504690449126]
This work investigates how user preferences can be effectively integrated into goal-oriented dialogue agents. We introduce TAPS, a novel solution that enhances personalised tool use by leveraging a structured tagging tool and an uncertainty-based tool detector.
arXiv Detail & Related papers (2025-06-25T13:24:46Z) - Advancing and Benchmarking Personalized Tool Invocation for LLMs [66.39214525683425]
We introduce the concept of Personalized Tool Invocation and define two key tasks: Tool Preference and Profile-dependent Query.<n>To tackle these challenges, we propose PTool, a data synthesis framework designed for personalized tool invocation.<n>We construct textbfPTBench, the first benchmark for evaluating personalized tool invocation.
arXiv Detail & Related papers (2025-05-07T02:25:20Z) - PEToolLLM: Towards Personalized Tool Learning in Large Language Models [21.800332388883465]
We formulate the task of personalized tool learning, which integrates user's interaction history towards personalized tool usage.<n>We construct PEToolBench, featuring diverse user preferences reflected in interaction history under three distinct personalized settings.<n>We propose a framework PEToolLLaMA to adapt LLMs to the personalized tool learning task.
arXiv Detail & Related papers (2025-02-26T09:43:08Z) - Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.<n>MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.<n>Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z) - Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [26.28459880766842]
We propose a decision-aware and generalizable tool-usage framework (DEER)
Specifically, we first construct the tool-usage samples with multiple decision branches via an automatic generation pipeline.
Our proposed DEER is effective and significantly outperforms baselines across various datasets.
arXiv Detail & Related papers (2024-02-26T16:11:03Z) - Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [93.68764280953624]
UltraTool is a novel benchmark designed to improve and evaluate Large Language Models' ability in tool utilization.
It emphasizes real-world complexities, demanding accurate, multi-step planning for effective problem-solving.
A key feature of UltraTool is its independent evaluation of planning with natural language, which happens before tool usage.
arXiv Detail & Related papers (2024-01-30T16:52:56Z) - ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios [49.33633818046644]
We propose ToolEyes, a fine-grained system tailored for the evaluation of large language models' tool learning capabilities in authentic scenarios.<n>The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning.<n>ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world.
arXiv Detail & Related papers (2024-01-01T12:49:36Z) - MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use [79.87054552116443]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.<n>We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.<n>We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.