OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
- URL: http://arxiv.org/abs/2502.11271v1
- Date: Sun, 16 Feb 2025 21:18:47 GMT
- Title: OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
- Authors: Pan Lu, Bowen Chen, Sheng Liu, Rahul Thapa, Joseph Boen, James Zou
- Abstract summary: OctoTools is a training-free, user-friendly, and easily extensible open-source agentic framework designed to tackle complex reasoning across diverse domains.
We validate OctoTools' generality across 16 diverse tasks, achieving substantial average accuracy gains of 9.3% over GPT-4o.
OctoTools outperforms AutoGen, GPT-Functions and LangChain by up to 10.6% when given the same set of tools.
- Score: 47.51937366171448
- License:
- Abstract: Solving complex reasoning tasks may involve visual understanding, domain knowledge retrieval, numerical calculation, and multi-step reasoning. Existing methods augment large language models (LLMs) with external tools but are restricted to specialized domains, limited tool types, or require additional training data. In this paper, we introduce OctoTools, a training-free, user-friendly, and easily extensible open-source agentic framework designed to tackle complex reasoning across diverse domains. OctoTools introduces standardized tool cards to encapsulate tool functionality, a planner for both high-level and low-level planning, and an executor to carry out tool usage. We validate OctoTools' generality across 16 diverse tasks (including MathVista, MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains of 9.3% over GPT-4o. Furthermore, OctoTools outperforms AutoGen, GPT-Functions and LangChain by up to 10.6% when given the same set of tools. Through comprehensive analysis and ablations, OctoTools demonstrates advantages in task planning, effective tool usage, and multi-step problem solving.
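The tool-card / planner / executor design described in the abstract can be pictured with a minimal sketch. The class and function names below (ToolCard, solve, the llm callable) are illustrative assumptions for exposition only, not the actual OctoTools API.

```python
# Illustrative sketch only: names and prompts are assumptions, not the OctoTools API.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCard:
    """Standardized wrapper describing a tool's purpose, inputs, and behavior."""
    name: str
    description: str
    run: Callable[[str], str]
    input_spec: dict = field(default_factory=dict)


def solve(query: str, cards: list[ToolCard], llm: Callable[[str], str], max_steps: int = 5) -> str:
    """High-level plan first, then iterate low-level tool selection and execution."""
    plan = llm(f"Outline sub-goals for: {query}")            # high-level planning
    context = [f"Plan: {plan}"]
    for _ in range(max_steps):
        catalog = "\n".join(f"{c.name}: {c.description}" for c in cards)
        choice = llm(
            f"Query: {query}\nContext: {context}\nTools:\n{catalog}\n"
            "Name the next tool to call, or say DONE."
        )                                                     # low-level planning
        if "DONE" in choice:
            break
        card = next((c for c in cards if c.name in choice), None)
        if card is None:
            break
        context.append(f"{card.name} -> {card.run(query)}")   # executor step
    return llm(f"Answer {query} using: {context}")            # final answer
```

Keeping each tool's description inside its card is what lets the planner choose among tools without any retraining, which is the property the abstract emphasizes.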
Related papers
- ToolGen: Unified Tool Retrieval and Calling via Generation [34.34787641393914]
We introduce ToolGen, a paradigm shift that integrates tool knowledge directly into the large language models' parameters.
We show that ToolGen achieves superior results in both tool retrieval and autonomous task completion.
ToolGen paves the way for more versatile, efficient, and autonomous AI systems.
arXiv Detail & Related papers (2024-10-04T13:52:32Z) - MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation [25.360660222418183]
We present MetaTool, a novel tool learning methodology designed to generalize across any reusable toolset.
By incorporating meta-task data into task-oriented training, our method significantly enhances the performance of open-source Large Language Models.
arXiv Detail & Related papers (2024-07-15T10:15:41Z) - Tool-Planner: Task Planning with Clusters across Multiple Tools [29.278169900986434]
We propose Tool-Planner, a task-processing framework based on toolkits.
Tool-Planner groups tools whose API functions serve the same purpose into a toolkit and allows LLMs to plan across the various toolkits.
arXiv Detail & Related papers (2024-06-06T07:30:14Z) - SciAgent: Tool-augmented Language Models for Scientific Reasoning [129.51442677710452]
We introduce a new task setting named tool-augmented scientific reasoning.
This setting supplements Large Language Models with scalable toolsets.
We construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools.
Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving.
arXiv Detail & Related papers (2024-02-18T04:19:44Z) - Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [93.68764280953624]
UltraTool is a novel benchmark designed to improve and evaluate Large Language Models' ability in tool utilization.
It emphasizes real-world complexities, demanding accurate, multi-step planning for effective problem-solving.
A key feature of UltraTool is its independent evaluation of planning with natural language, which happens before tool usage.
arXiv Detail & Related papers (2024-01-30T16:52:56Z) - EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework that transforms diverse and lengthy tool documentation into a unified and concise tool instruction.
It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z) - ToolTalk: Evaluating Tool-Usage in a Conversational Setting [6.792842055445584]
This paper introduces ToolTalk, a benchmark consisting of complex user intents requiring multi-step tool usage specified through dialogue.
ToolTalk contains 28 tools grouped into 7 plugins, and includes a complete simulated implementation of each tool.
We evaluate GPT-3.5 and GPT-4 on ToolTalk resulting in success rates of 26% and 50% respectively.
arXiv Detail & Related papers (2023-11-15T23:50:31Z) - ControlLLM: Augment Language Models with Tools by Searching on Graphs [97.62758830255002]
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving real-world tasks.
Our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches the optimal solution path on a pre-built tool graph; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools.
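A rough sense of what searching a pre-built tool graph looks like, in the spirit of the ToG paradigm rather than ControlLLM's actual implementation, is sketched below; the toy graph, tool names, and resource types are made up for illustration.

```python
# Toy sketch: breadth-first search over a hand-written tool graph.
from collections import deque

# Edges map an input resource type to (tool, output resource type) pairs.
TOOL_GRAPH = {
    "image": [("image_captioner", "text"), ("object_detector", "boxes")],
    "text": [("translator", "text_fr"), ("summarizer", "summary")],
    "boxes": [("box_counter", "count")],
}


def find_tool_path(start: str, goal: str) -> list[str] | None:
    """Return a chain of tools turning the `start` resource into `goal`, if any."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        resource, path = queue.popleft()
        if resource == goal:
            return path
        for tool, produced in TOOL_GRAPH.get(resource, []):
            if produced not in seen:
                seen.add(produced)
                queue.append((produced, path + [tool]))
    return None


print(find_tool_path("image", "count"))  # ['object_detector', 'box_counter']
```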
arXiv Detail & Related papers (2023-10-26T21:57:21Z) - MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use [79.87054552116443]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.
We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.
We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z) - ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings [25.5476046472217]
Augmenting large language models with external tools has emerged as a promising approach to solving complex problems.
The recent in-context learning paradigm avoids costly tool-specific fine-tuning, but its limited context length allows only a few demonstrations.
We propose an alternative approach, ToolkenGPT, which combines the benefits of both sides.
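The core idea here is to represent each tool as an extra output token whose embedding is learned while the base model stays frozen. The minimal PyTorch sketch below uses made-up names and tiny dimensions and is not ToolkenGPT's actual code.

```python
# Sketch of the "toolken" idea: append one learnable output embedding per tool
# to a frozen LM head, so tools compete with ordinary words at decoding time.
import torch
import torch.nn as nn

vocab_size, hidden, num_tools = 32_000, 768, 4

frozen_head = nn.Linear(hidden, vocab_size, bias=False)
frozen_head.weight.requires_grad_(False)                  # base LM head stays frozen

tool_embeddings = nn.Parameter(torch.randn(num_tools, hidden) * 0.02)  # only trainable part


def logits_with_toolkens(hidden_states: torch.Tensor) -> torch.Tensor:
    """Concatenate word logits with tool logits over the last hidden state."""
    word_logits = frozen_head(hidden_states)               # (batch, vocab)
    tool_logits = hidden_states @ tool_embeddings.T        # (batch, num_tools)
    return torch.cat([word_logits, tool_logits], dim=-1)   # (batch, vocab + tools)


h = torch.randn(2, hidden)
next_token = logits_with_toolkens(h).argmax(dim=-1)        # ids >= vocab_size signal a tool call
print(next_token)
```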
arXiv Detail & Related papers (2023-05-19T09:54:21Z)