LLM Agents Making Agent Tools
- URL: http://arxiv.org/abs/2502.11705v1
- Date: Mon, 17 Feb 2025 11:44:11 GMT
- Title: LLM Agents Making Agent Tools
- Authors: Georg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelović, Jakob Nikolas Kather,
- Abstract summary: Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks.
We propose ToolMaker, a novel agentic framework that autonomously transforms papers with code into LLM-compatible tools.
Given a short task description and a repository URL, ToolMaker autonomously installs required dependencies and generates code to perform the task.
- Score: 2.5529148902034637
- License:
- Abstract: Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers, hindering the applicability of LLM agents in domains which demand large numbers of highly specialised tools, like in life sciences and medicine. Motivated by the growing trend of scientific studies accompanied by public code repositories, we propose ToolMaker, a novel agentic framework that autonomously transforms papers with code into LLM-compatible tools. Given a short task description and a repository URL, ToolMaker autonomously installs required dependencies and generates code to perform the task, using a closed-loop self-correction mechanism to iteratively diagnose and rectify errors. To evaluate our approach, we introduce a benchmark comprising 15 diverse and complex computational tasks spanning both medical and non-medical domains with over 100 unit tests to objectively assess tool correctness and robustness. ToolMaker correctly implements 80% of the tasks, substantially outperforming current state-of-the-art software engineering agents. ToolMaker therefore is a step towards fully autonomous agent-based scientific workflows.
Related papers
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.
MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.
Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z) - ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models [49.04652315815501]
Tool learning has emerged as a crucial capability for large language models (LLMs) to solve complex real-world tasks through interaction with external tools.
We propose ToolCoder, a novel framework that reformulates tool learning as a code generation task.
arXiv Detail & Related papers (2025-02-17T03:42:28Z) - ToolGen: Unified Tool Retrieval and Calling via Generation [34.34787641393914]
We introduce ToolGen, a paradigm shift that integrates tool knowledge directly into the large language models' parameters.
We show that ToolGen achieves superior results in both tool retrieval and autonomous task completion.
ToolGen paves the way for more versatile, efficient, and autonomous AI systems.
arXiv Detail & Related papers (2024-10-04T13:52:32Z) - Agentless: Demystifying LLM-based Software Engineering Agents [12.19683999553113]
We build Agentless -- an agentless approach to automatically solve software development problems.
Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic three-phase process of localization, repair, and patch validation.
Our results on the popular SWE-bench Lite benchmark show that surprisingly the simplistic Agentless is able to achieve both the highest performance and low cost.
arXiv Detail & Related papers (2024-07-01T17:24:45Z) - Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables the large language models (LLMs) to act as a multi-tool user.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z) - From Language Models to Practical Self-Improving Computer Agents [0.8547032097715571]
We develop a methodology to create AI computer agents that can carry out diverse computer tasks and self-improve.
We prompt an LLM agent to augment itself with retrieval, internet search, web navigation, and text editor capabilities.
The agent effectively uses these various tools to solve problems including automated software development and web-based tasks.
arXiv Detail & Related papers (2024-04-18T07:50:10Z) - SciAgent: Tool-augmented Language Models for Scientific Reasoning [129.51442677710452]
We introduce a new task setting named tool-augmented scientific reasoning.
This setting supplements Large Language Models with scalable toolsets.
We construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools.
Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving.
arXiv Detail & Related papers (2024-02-18T04:19:44Z) - Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [93.68764280953624]
UltraTool is a novel benchmark designed to improve and evaluate Large Language Models' ability in tool utilization.
It emphasizes real-world complexities, demanding accurate, multi-step planning for effective problem-solving.
A key feature of UltraTool is its independent evaluation of planning with natural language, which happens before tool usage.
arXiv Detail & Related papers (2024-01-30T16:52:56Z) - EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction.
It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.