An LLM-Tool Compiler for Fused Parallel Function Calling
- URL: http://arxiv.org/abs/2405.17438v1
- Date: Tue, 7 May 2024 18:55:50 GMT
- Title: An LLM-Tool Compiler for Fused Parallel Function Calling
- Authors: Simranjit Singh, Andreas Karatzas, Michael Fore, Iraklis Anagnostopoulos, Dimitrios Stamoulis,
- Abstract summary: State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling.
We propose LLM-Tool Compiler, which fuses similar types of tool operations under a single function at runtime, presenting them as a unified task to the LLM.
Benchmarked on a large-scale Copilot platform, LLM-Tool Compiler achieves up to four times more parallel calls than existing methods, reducing token costs and latency by up to 40% and 12%, respectively.
- Score: 1.990293258268139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling, managing thousands of API calls. However, the tendency of compositional prompting to segment tasks into multiple steps, each requiring a round-trip to the GPT APIs, leads to increased system latency and costs. Although recent advancements in parallel function calling have improved tool execution per API call, they may necessitate more detailed in-context instructions and task breakdown at the prompt level, resulting in higher engineering and production costs. Inspired by the hardware design principles of multiply-add (MAD) operations, which fuse multiple arithmetic operations into a single task from the compiler's perspective, we propose LLM-Tool Compiler, which selectively fuses similar types of tool operations under a single function at runtime, presenting them as a unified task to the LLM. This selective fusion inherently enhances parallelization and efficiency. Benchmarked on a large-scale Copilot platform, LLM-Tool Compiler achieves up to four times more parallel calls than existing methods, reducing token costs and latency by up to 40% and 12%, respectively.
Related papers
- Less is More: Optimizing Function Calling for LLM Execution on Edge Devices [0.44784055850794474]
Large Language Models (LLMs) struggle with function calling at the edge because they cannot handle complex inputs or manage multiple tools effectively.
We introduce Less-is-More, a novel fine-tuning-free function-calling scheme for dynamic tool selection.
Our approach is based on the key insight that selectively reducing the number of tools available to LLMs significantly improves their function-calling performance, execution time, and power efficiency on edge devices.
arXiv Detail & Related papers (2024-11-23T00:51:09Z) - Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making.
Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance.
We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z) - BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models (LLMs) to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
We propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions only with essential information.
arXiv Detail & Related papers (2024-06-22T15:52:04Z) - APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts [21.819126948549766]
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts.
APPL acts as a bridge between computer programs and LLMs, allowing seamless embedding of prompts into Python functions.
arXiv Detail & Related papers (2024-06-19T02:29:59Z) - Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z) - An LLM Compiler for Parallel Function Calling [68.04566807806071]
We introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls.
We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to 9% compared to ReAct.
arXiv Detail & Related papers (2023-12-07T18:32:04Z) - PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task
Completion [96.47420221442397]
We introduce the PowerPoint Task Completion benchmark to assess the ability of Large Language Models to finish multi-turn, multi-modal instructions.
We also propose the PPTX-Match Evaluation System that evaluates if LLMs finish the instruction based on the prediction file rather than the label API sequence.
The results show that GPT-4 outperforms other LLMs with 75.1% accuracy in single-turn dialogue testing but faces challenges in completing entire sessions, achieving just 6% session accuracy.
arXiv Detail & Related papers (2023-11-03T08:06:35Z) - Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning [8.96245399645571]
This paper introduces Reverse Chain'', a controllable, target-driven approach to empower Large Language Models with the capability to operate external APIs only via prompts.
To manage a controllable multi-function calling, Reverse Chain adopts a generic rule based on a backward reasoning process.
arXiv Detail & Related papers (2023-10-06T05:20:18Z) - CRAFT: Customizing LLMs by Creating and Retrieving from Specialized
Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs)
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.