TroVE: Inducing Verifiable and Efficient Toolboxes for Solving
Programmatic Tasks
- URL: http://arxiv.org/abs/2401.12869v1
- Date: Tue, 23 Jan 2024 16:03:17 GMT
- Title: TroVE: Inducing Verifiable and Efficient Toolboxes for Solving
Programmatic Tasks
- Authors: Zhiruo Wang, Daniel Fried, Graham Neubig
- Abstract summary: Language models (LMs) can solve tasks such as answering questions about tables or images by writing programs.
To enable better solutions without human labor, we ask code LMs to curate reusable high-level functions.
We present TROVE, a training-free method of inducing a verifiable and efficient toolbox of functions.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Language models (LMs) can solve tasks such as answering questions about
tables or images by writing programs. However, using primitive functions often
leads to verbose and error-prone programs, and higher-level functions require
expert design. To enable better solutions without human labor, we ask code LMs
to curate reusable high-level functions, and use them to write solutions. We
present TROVE, a training-free method of inducing a verifiable and efficient
toolbox of functions, by generating via using, growing, and periodically
trimming the toolbox. On 11 datasets from math, table question answering, and
image reasoning tasks, TROVE consistently yields simpler solutions with higher
accuracy than baselines using CODELLAMA and previous methods using GPT, while
using 79-98% smaller toolboxes. TROVE further enables 31% faster and 13% more
accurate human verification than baselines. With the same pipeline, it creates
diverse functions for varied tasks and datasets, providing insights into their
individual characteristics.
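The abstract describes inducing a toolbox by generating solutions with it, growing it with newly written functions, and periodically trimming rarely used ones. As a rough illustration only, the loop below sketches that generate/grow/trim cycle; the function names (`induce_toolbox`, `lm_generate`) and the usage-count trimming heuristic are hypothetical stand-ins, not the authors' actual implementation.

```python
# Toy sketch of a TroVE-style toolbox induction loop.
# All names and the trimming heuristic are illustrative assumptions.

def induce_toolbox(tasks, lm_generate, trim_every=100, keep_top=10):
    toolbox = {}      # function name -> (source code, usage count)
    solutions = []
    for i, task in enumerate(tasks, start=1):
        # "Generate via using": prompt the model with the current toolbox.
        program, new_funcs = lm_generate(task, list(toolbox))
        solutions.append(program)
        # "Grow": record newly defined helpers and count their reuse.
        for name, src in new_funcs.items():
            code, count = toolbox.get(name, (src, 0))
            toolbox[name] = (code, count + 1)
        # "Periodically trim": keep only frequently used functions.
        if i % trim_every == 0:
            ranked = sorted(toolbox.items(), key=lambda kv: -kv[1][1])
            toolbox = dict(ranked[:keep_top])
    return toolbox, solutions
```

Trimming by usage count is one plausible verifiability proxy; the paper's own criteria may differ.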
Related papers
- Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
This study introduces a novel framework for training smaller language models in function calling.
It focuses on specific logical and mathematical reasoning tasks.
The approach aims to improve the performance of small-scale models on these tasks through function calling.
arXiv Detail & Related papers (2024-10-24T16:27:35Z)
- Efficient and Scalable Estimation of Tool Representations in Vector Space
We present a framework for generating synthetic data for tool retrieval applications and an efficient data-driven tool retrieval strategy using small encoder models.
We create ToolBank, a new tool retrieval dataset that reflects real human usage.
With these new methods, we achieve improvements of up to 27.28 in Recall@K on the ToolBench dataset and 30.5 in Recall@K on ToolBank.
arXiv Detail & Related papers (2024-09-02T19:39:24Z)
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
We introduce BigCodeBench, a benchmark that challenges Large Language Models (LLMs) to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
We propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions only with essential information.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- Efficient Tool Use with Chain-of-Abstraction Reasoning
Large language models (LLMs) need to ground their reasoning to real-world knowledge.
Challenges remain in fine-tuning LLM agents to invoke tools in multi-step reasoning problems.
We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z)
- ControlLLM: Augment Language Models with Tools by Searching on Graphs
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving real-world tasks.
Our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches for the optimal solution path on a pre-built tool graph; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools.
arXiv Detail & Related papers (2023-10-26T21:57:21Z)
- CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs)
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z)
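Several of the entries above (the ToolBank work, CRAFT) evaluate tool retrieval with Recall@K: the fraction of queries whose correct tool appears among the top-K retrieved candidates. As a rough illustration of the metric only, the sketch below uses a toy bag-of-words "encoder"; the tool descriptions, helper names, and scoring are illustrative assumptions, not any paper's actual method.

```python
# Toy sketch of embedding-based tool retrieval and Recall@K.
# The bag-of-words encoder and all names are illustrative assumptions.
from collections import Counter

def encode(text):
    # Stand-in for a small encoder model: word-count vectors.
    return Counter(text.lower().split())

def similarity(a, b):
    # Dot product over shared words (missing keys count as zero).
    return sum(a[w] * b[w] for w in a)

def retrieve(query, tool_descriptions, k=2):
    # Rank tools by similarity between the query and each description.
    q = encode(query)
    ranked = sorted(tool_descriptions,
                    key=lambda name: -similarity(q, encode(tool_descriptions[name])))
    return ranked[:k]

def recall_at_k(queries, gold, tools, k=2):
    # Fraction of queries whose gold tool appears in the top-k results.
    hits = sum(gold[q] in retrieve(q, tools, k) for q in queries)
    return hits / len(queries)
```

Real systems would replace the word-count encoder with a trained embedding model, but the Recall@K computation itself is the same.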
This list is automatically generated from the titles and abstracts of the papers in this site.