An LLM Compiler for Parallel Function Calling
- URL: http://arxiv.org/abs/2312.04511v3
- Date: Wed, 5 Jun 2024 03:53:10 GMT
- Title: An LLM Compiler for Parallel Function Calling
- Authors: Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
- Abstract summary: We introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls.
We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to ~9% compared to ReAct.
- Score: 68.04566807806071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling often require sequential reasoning and acting for each function which can result in high latency, cost, and sometimes inaccurate behavior. To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls. Drawing inspiration from the principles of classical compilers, LLMCompiler enables parallel function calling with three components: (i) a Function Calling Planner, formulating execution plans for function calling; (ii) a Task Fetching Unit, dispatching function calling tasks; and (iii) an Executor, executing these tasks in parallel. LLMCompiler automatically generates an optimized orchestration for the function calls and can be used with both open-source and closed-source models. We have benchmarked LLMCompiler on a range of tasks with different patterns of function calling. We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to ~9% compared to ReAct. Our code is available at https://github.com/SqueezeAILab/LLMCompiler.
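The three components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Task` class, the `"$k"` placeholder convention, the `search` and `add` tools, and the hand-written `PLAN` are all hypothetical stand-ins, and the dispatch loop runs tasks in dependency "waves", which is a simplification. The sketch assumes the planner's output can be treated as a dependency graph of tool calls.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List, Tuple


@dataclass
class Task:
    """One node of the plan: a tool call plus the task ids it depends on."""
    task_id: int
    tool: Callable[..., Awaitable[Any]]
    args: Tuple[Any, ...]  # a string "$k" stands for the result of task k (assumed convention)
    deps: List[int] = field(default_factory=list)


async def search(query: str) -> int:
    """Hypothetical slow tool: pretends to look something up."""
    await asyncio.sleep(0.1)
    return len(query)


async def add(a: int, b: int) -> int:
    """Hypothetical cheap tool that combines two earlier results."""
    return a + b


def resolve(arg: Any, results: Dict[int, Any]) -> Any:
    """Replace a "$k" placeholder with the stored result of task k."""
    if isinstance(arg, str) and arg.startswith("$"):
        return results[int(arg[1:])]
    return arg


async def run_plan(plan: List[Task]) -> Dict[int, Any]:
    """Dispatch every task whose dependencies are done; run each wave concurrently."""
    results: Dict[int, Any] = {}
    pending = {t.task_id: t for t in plan}
    while pending:
        # Task-fetching step: pick the tasks whose dependencies are all resolved.
        ready = [t for t in pending.values() if all(d in results for d in t.deps)]
        if not ready:
            raise ValueError("plan has a cycle or a missing dependency")
        # Executor step: run all ready tasks in parallel.
        outputs = await asyncio.gather(
            *(t.tool(*(resolve(a, results) for a in t.args)) for t in ready)
        )
        for task, out in zip(ready, outputs):
            results[task.task_id] = out
            del pending[task.task_id]
    return results


# A hand-written plan standing in for the planner's output.
PLAN = [
    Task(1, search, ("alpha",)),
    Task(2, search, ("gamma ray",)),
    Task(3, add, ("$1", "$2"), deps=[1, 2]),
]

if __name__ == "__main__":
    print(asyncio.run(run_plan(PLAN)))  # {1: 5, 2: 9, 3: 14}
```

In this sketch the two independent `search` calls run concurrently, so the plan takes roughly one tool latency instead of two. A sequential reason-then-act loop would instead pay for each call in turn.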
Related papers
- Achieving Tool Calling Functionality in LLMs Using Only Prompt Engineering Without Fine-Tuning [0.0]
Currently, the vast majority of locally deployed open-source large language models (LLMs) and some commercial model interfaces do not support stable tool calling functionality.
This paper proposes a method that enables LLMs to achieve stable tool calling capabilities using only prompt engineering and some ingenious code design.
arXiv Detail & Related papers (2024-07-06T08:29:12Z)
- Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks [35.97890508648945]
We introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license.
The model is trained using a multi-task training approach on seven fundamental tasks.
We show that GRANITE-20B-FUNCTIONCALLING has better generalizability on multiple tasks in seven different evaluation datasets.
arXiv Detail & Related papers (2024-06-27T17:47:26Z)
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
- APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts [21.819126948549766]
Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts.
APPL acts as a bridge between computer programs and LLMs, allowing seamless embedding of prompts into Python functions.
arXiv Detail & Related papers (2024-06-19T02:29:59Z)
- An LLM-Tool Compiler for Fused Parallel Function Calling [1.990293258268139]
State-of-the-art sequential reasoning in Large Language Models (LLMs) has expanded the capabilities of Copilots beyond conversational tasks to complex function calling.
We propose LLM-Tool Compiler, which fuses similar types of tool operations under a single function at runtime, presenting them as a unified task to the LLM.
Benchmarked on a large-scale Copilot platform, LLM-Tool Compiler achieves up to four times more parallel calls than existing methods, reducing token costs and latency by up to 40% and 12%, respectively.
arXiv Detail & Related papers (2024-05-07T18:55:50Z)
- InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory [93.20588235940453]
In this paper, we introduce a training-free memory-based method, InfLLM.
InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to lookup token-relevant units for attention.
Even when the sequence length is scaled to 1,024K, InfLLM still effectively captures long-distance dependencies.
arXiv Detail & Related papers (2024-02-07T06:50:42Z)
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion [96.47420221442397]
We introduce the PowerPoint Task Completion benchmark to assess the ability of Large Language Models to finish multi-turn, multi-modal instructions.
We also propose the PPTX-Match Evaluation System that evaluates if LLMs finish the instruction based on the prediction file rather than the label API sequence.
The results show that GPT-4 outperforms other LLMs with 75.1% accuracy in single-turn dialogue testing but faces challenges in completing entire sessions, achieving just 6% session accuracy.
arXiv Detail & Related papers (2023-11-03T08:06:35Z)
- Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning [8.96245399645571]
This paper introduces "Reverse Chain", a controllable, target-driven approach to empower Large Language Models with the capability to operate external APIs only via prompts.
To manage a controllable multi-function calling, Reverse Chain adopts a generic rule based on a backward reasoning process.
arXiv Detail & Related papers (2023-10-06T05:20:18Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM.
It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses.
We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.